1
Sehsah AI, Mousa A, Farouk G. A hybrid variational autoencoder and WGAN with gradient penalty for tertiary protein structure generation. Sci Rep 2025; 15:14191. [PMID: 40268976 DOI: 10.1038/s41598-025-94747-y] [Received: 12/03/2024] [Accepted: 03/17/2025] [Indexed: 04/25/2025] Open
Abstract
Elucidating the tertiary structure of proteins is important for understanding their functions and interactions. While deep neural networks have advanced the prediction of a protein's native structure from its amino acid sequence, the focus on a single-structure view limits understanding of the dynamic nature of protein molecules. Acquiring a multi-structure view of protein molecules remains a broader challenge in computational structural biology. Alternative representations, such as distance matrices, offer a compact and effective way to explore and generate realistic tertiary protein structures. This paper presents TP-VWGAN, a hybrid model that generates more realistic distance matrix representations of tertiary protein structures. The model integrates the probabilistic representation learning of the Variational Autoencoder (VAE) with the realistic data generation strength of the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP). The main modification in TP-VWGAN is the incorporation of residual blocks into its VAE architecture to improve performance. The experimental results show that TP-VWGAN, with or without residual blocks, outperforms existing methods in generating realistic protein structures, and that incorporating residual blocks enhances its ability to capture key structural features. Benchmarking against existing methods also shows that the more accurately a model learns symmetry features in the generated distance matrices, the better it captures key structural features. This work moves us closer to more advanced deep generative models that can explore a broader range of protein structures and be applied to drug design and protein engineering. The code and data are available at https://github.com/aalaa-sehsah/tp-vwgan.
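The distance matrix representation and the symmetry property mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the TP-VWGAN repository: the function names and the four-point coordinates are hypothetical, and the assumption is that each residue is reduced to a single 3-D point (for example its C-alpha atom).

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between 3-D points (one point per residue)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def symmetry_error(matrix):
    """Largest |M[i][j] - M[j][i]|; 0.0 means the matrix is perfectly symmetric."""
    n = len(matrix)
    return max(abs(matrix[i][j] - matrix[j][i]) for i in range(n) for j in range(n))


# Hypothetical coordinates (in angstroms) for a four-residue fragment.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
dm = distance_matrix(coords)

# A matrix computed from real coordinates is symmetric with a zero diagonal.
print(symmetry_error(dm))

# A generated matrix need not be; this toy example is off by 1.0.
print(symmetry_error([[0.0, 1.0], [2.0, 0.0]]))
```

A score such as `symmetry_error` is one simple way to quantify how far a generated matrix departs from a valid distance matrix; the metric used in the paper may differ.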
Affiliation(s)
- Aalaa I Sehsah
  - Department of Computer Science, Faculty of Computers and Information, Kafrelsheikh University, Kafr El Sheikh, 33516, Egypt
- Afaf Mousa
  - Department of Computer Science, Faculty of Computers and Information, Menoufia University, Shebin El Kom, 32511, Egypt
- Gamal Farouk
  - Department of Computer Science, Faculty of Computers and Information, Menoufia University, Shebin El Kom, 32511, Egypt
2
Halma MTJ, Kumar S, van Eck J, Abeln S, Gates A, Wuite GJL. FAIR data for optical tweezers experiments. Biophys J 2025; 124:1255-1272. [PMID: 40083158 DOI: 10.1016/j.bpj.2025.03.005] [Received: 05/16/2024] [Revised: 01/11/2025] [Accepted: 03/07/2025] [Indexed: 03/16/2025] Open
Abstract
The single-molecule biophysics community has delivered significant impacts to our understanding of fundamental biological processes, yet the field is also siloed and has fragmented data structures, which impede data sharing and limit the ability to conduct comprehensive meta-analyses. To advance the field of optical tweezers in single-molecule biophysics, it is important that the field adopt open and collaborative data sharing that facilitates meta-analyses combining diverse resources and supports more advanced analyses, akin to those seen in projects such as the Protein Data Bank and the 1000 Genomes Project. Here, we assess the state of data findability, accessibility, interoperability, and reusability (the FAIR principles) within the single-molecule optical tweezers field. By combining a qualitative review with quantitative tools from bibliometrics, our analysis suggests that the field has significant room for improvement in terms of FAIR adherence. Finally, we discuss the potential of compulsory data deposition and a minimal set of metadata standards to ensure reproducibility and interoperability between systems. While implementing these measures may not be straightforward, they are key steps that will enhance the integration of optical tweezers biophysics with the broader biomedical literature.
Affiliation(s)
- Matthew T J Halma
  - Department of Physics and Astronomy, Vrije Universiteit Amsterdam, Amsterdam, North Holland, the Netherlands; Lumicks B.V., Amsterdam, North Holland, the Netherlands
- Sowmiyaa Kumar
  - Department of Computer Science, Vrije Universiteit, Amsterdam, North Holland, the Netherlands
- Jan van Eck
  - Department of Computer Science, Vrije Universiteit, Amsterdam, North Holland, the Netherlands
- Sanne Abeln
  - Department of Computer Science, Vrije Universiteit, Amsterdam, North Holland, the Netherlands
- Alexander Gates
  - School of Data Science, University of Virginia, Charlottesville, Virginia
- Gijs J L Wuite
  - Department of Physics and Astronomy, Vrije Universiteit Amsterdam, Amsterdam, North Holland, the Netherlands; Lumicks B.V., Amsterdam, North Holland, the Netherlands
3
Chung J, Hahn H, Flores-Espinoza E, Thomsen ARB. Artificial Intelligence: A New Tool for Structure-Based G Protein-Coupled Receptor Drug Discovery. Biomolecules 2025; 15:423. [PMID: 40149959 PMCID: PMC11940138 DOI: 10.3390/biom15030423] [Received: 02/03/2025] [Revised: 03/10/2025] [Accepted: 03/11/2025] [Indexed: 03/29/2025] Open
Abstract
Understanding protein structures can facilitate the development of therapeutic drugs. Traditionally, protein structures have been determined through experimental approaches such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy. While these methods are effective and are considered the gold standard, they are very resource-intensive and time-consuming, ultimately limiting their scalability. However, recent developments in computational biology and artificial intelligence (AI) have revolutionized the field of protein structure prediction. Innovations like AlphaFold and RoseTTAFold enable protein structure predictions to be made directly from amino acid sequences with remarkable speed and accuracy. Despite the enormous enthusiasm associated with these newly developed AI approaches, their true potential in structure-based drug discovery remains uncertain. In fact, although these algorithms generally predict overall protein structures well, essential details for computational ligand docking, such as the exact location of amino acid side chains within the binding pocket, are not predicted with the necessary accuracy. Additionally, docking methodologies are considered hypothesis generators rather than precise predictors of ligand-target interactions and thus usually identify many false-positive hits among only a few correctly predicted interactions. In this paper, we review the latest developments in this cutting-edge field, with emphasis on the GPCR target class, to assess the potential role of AI approaches in structure-based drug discovery.
Affiliation(s)
- Jason Chung
  - Department of Molecular Pathobiology, New York University College of Dentistry, New York, NY 10010, USA
  - NYU Pain Research Center, New York University College of Dentistry, New York, NY 10010, USA
- Hyunggu Hahn
  - Department of Molecular Pathobiology, New York University College of Dentistry, New York, NY 10010, USA
  - NYU Pain Research Center, New York University College of Dentistry, New York, NY 10010, USA
- Emmanuel Flores-Espinoza
  - Department of Molecular Pathobiology, New York University College of Dentistry, New York, NY 10010, USA
  - NYU Pain Research Center, New York University College of Dentistry, New York, NY 10010, USA
- Alex R. B. Thomsen
  - Department of Molecular Pathobiology, New York University College of Dentistry, New York, NY 10010, USA
  - NYU Pain Research Center, New York University College of Dentistry, New York, NY 10010, USA
4
Zhong J, Zou Z, Qiu J, Wang S. ScFold: a GNN-based model for efficient inverse folding of short-chain proteins via spatial reduction. Brief Bioinform 2025; 26:bbaf156. [PMID: 40205854 PMCID: PMC11982017 DOI: 10.1093/bib/bbaf156] [Received: 11/04/2024] [Revised: 02/24/2025] [Accepted: 03/19/2025] [Indexed: 04/11/2025] Open
Abstract
In the realm of protein design, the efficient construction of protein sequences that accurately fold into predefined structures has become an important area of research. Although advancements have been made in the study of long-chain proteins, the design of short-chain proteins requires equal consideration. The structural information inherent in short and single chains is typically less comprehensive than that of full-length chains, which can negatively impact model performance on them. To address this challenge, we introduce ScFold, a novel model that incorporates an innovative node module. This module utilizes spatial dimensionality reduction and positional encoding mechanisms to enhance the extraction of structural features. Experimental results indicate that ScFold achieves a recovery rate of 52.22% on the CATH4.2 dataset, demonstrating notable efficacy for short-chain proteins, with a recovery rate of 41.6%. ScFold also achieves recovery rates of 59.32% and 61.59% on the TS50 and TS500 datasets, respectively, demonstrating its effectiveness across diverse protein types. Additionally, we performed protein length stratification on the TS500 and CATH4.2 datasets and tested ScFold on length-specific sub-datasets. The results confirm the model's superiority in handling short-chain proteins. Finally, we selected several protein sequence groups from the CATH4.2 dataset for structural visualization analysis and provided comparisons between the model-generated sequences and the target sequences.
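The abstract does not define "recovery rate". A minimal sketch follows, assuming the definition commonly used for inverse folding: the percentage of positions at which the designed residue matches the native one. The function name and both sequences are hypothetical and are not taken from the ScFold code or data.

```python
def recovery_rate(designed: str, native: str) -> float:
    """Percentage of aligned positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return 100.0 * matches / len(native)


# Hypothetical ten-residue example with two mismatches (positions 3 and 6).
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
rate = recovery_rate(designed, native)
print(f"{rate:.1f}%")  # 8 of 10 positions match
```

Dataset-level figures such as those quoted in the abstract are typically aggregated over many proteins; whether ScFold reports a mean or a median across proteins is not stated here.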
Affiliation(s)
- Jiancheng Zhong
  - College of Information Science and Engineering, Hunan Normal University, 36 Lushan Road, Yuelu District, Changsha 410081, Hunan, China
- Zhiwei Zou
  - College of Information Science and Engineering, Hunan Normal University, 36 Lushan Road, Yuelu District, Changsha 410081, Hunan, China
- Jie Qiu
  - College of Information Science and Engineering, Hunan Normal University, 36 Lushan Road, Yuelu District, Changsha 410081, Hunan, China
- Shaokai Wang
  - Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
5
Alshammry N. Developing a method for predicting DNA nucleosomal sequences using deep learning. Technol Health Care 2025; 33:989-999. [PMID: 40105177 DOI: 10.1177/09287329241297900] [Indexed: 03/20/2025]
Abstract
Background: Deep learning excels at processing raw data because it automatically extracts and classifies high-level features. Despite biology's low popularity in data analysis, incorporating computer technology can improve biological research. Objective: To create a deep learning model that can identify nucleosomes from nucleotide sequences and to show that simpler models outperform more complicated ones in solving biological challenges. Methods: A classifier was created utilising deep learning and machine learning approaches. The final model consists of two convolutional layers, one max pooling layer, two fully connected layers, and a dropout regularisation layer. This structure was chosen on the basis of the 'less is frequently more' approach, which emphasises simple design without large hidden layers. Results: Experimental results show that deep learning methods, specifically deep neural networks, outperform typical machine learning algorithms for recognising nucleosomes. The simplified network architecture proved suitable without the requirement for numerous hidden neurons, resulting in effective network performance. Conclusion: This study demonstrates that machine learning and other computational techniques may streamline and expedite the resolution of biological issues. The model helps identify nucleosomes and can be used in future research or labs. This study discusses the challenges of understanding and addressing simple biological problems with sophisticated computer technology and offers practical solutions for academic and economic sectors.
Affiliation(s)
- Nizal Alshammry
  - Department of Computer Sciences, Faculty of Computing and Information Technology, Northern Border University, Rafha, Saudi Arabia
6
Nuthakki VK, Barik R, Gangashetty SB, Srikanth G. Advanced molecular modeling of proteins: Methods, breakthroughs, and future prospects. Advances in Pharmacology (San Diego, Calif.) 2025; 103:23-41. [PMID: 40175043 DOI: 10.1016/bs.apha.2025.02.005] [Indexed: 04/04/2025]
Abstract
The contemporary advancements in molecular modeling of proteins have significantly enhanced our comprehension of biological processes and the functional roles of proteins on a global scale. The application of advanced methodologies, including homology modeling, molecular dynamics simulations, and quantum mechanics/molecular mechanics strategies, has empowered numerous researchers to forecast the behavior of protein macromolecules, elucidate drug-protein interactions, and develop drugs with enhanced precision. This chapter elucidates the advent of deep learning algorithms such as AlphaFold, a notable advancement that has significantly improved the precision of intricate protein structure predictions. The recent advancements have significantly enhanced the precision of protein predictions and expedited drug discovery and development processes. Integrating approaches like multi-scale modeling and hybrid methods incorporating reliable experimental data is anticipated to revolutionize and offer more significant implications for precision medicine and targeted treatments.
Affiliation(s)
- Vijay Kumar Nuthakki
  - Department of Pharmaceutical Chemistry, GITAM School of Pharmacy, GITAM Deemed to be University, Hyderabad, Telangana, India
- Rakesh Barik
  - Department of Pharmacognosy and Phytochemistry, GITAM School of Pharmacy, GITAM Deemed to be University, Hyderabad, Telangana, India
- Gatadi Srikanth
  - Department of Pharmaceutical Chemistry, GITAM School of Pharmacy, GITAM Deemed to be University, Hyderabad, Telangana, India
7
Li Y, Duan Z, Li Z, Xue W. Data and AI-driven synthetic binding protein discovery. Trends Pharmacol Sci 2025; 46:132-144. [PMID: 39755458 DOI: 10.1016/j.tips.2024.12.002] [Received: 10/12/2024] [Revised: 12/02/2024] [Accepted: 12/06/2024] [Indexed: 01/06/2025]
Abstract
Synthetic binding proteins (SBPs) are a class of protein binders that are artificially created and do not exist naturally. Their broad applications in tackling challenges of research, diagnostics, and therapeutics have garnered significant interest. Traditional protein engineering is pivotal to the discovery of SBPs. Recently, this discovery has been significantly accelerated by computational approaches, such as molecular modeling and artificial intelligence (AI). Furthermore, while numerous bioinformatics databases offer a wealth of resources that fuel SBP discovery, the potential of these data has not yet been fully exploited. In this review, we present a comprehensive overview of the SBP data ecosystem and of methodologies in SBP discovery, highlighting the critical role of high-quality data and AI technologies in accelerating the discovery of innovative SBPs with promising applications in pharmacological sciences.
Affiliation(s)
- Yanlin Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Zixin Duan
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Zhenwen Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China; Western (Chongqing) Collaborative Innovation Center for Intelligent Diagnostics and Digital Medicine, Chongqing National Biomedicine Industry Park, Chongqing 401329, China
8
Soleymani F, Paquet E, Viktor HL, Michalowski W. Structure-based protein and small molecule generation using EGNN and diffusion models: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2779-2797. [PMID: 39050782 PMCID: PMC11268121 DOI: 10.1016/j.csbj.2024.06.021] [Received: 04/19/2024] [Revised: 06/13/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024] Open
Abstract
Recent breakthroughs in deep learning have revolutionized protein sequence and structure prediction. These advancements are built on decades of protein design efforts, and are overcoming traditional time and cost limitations. Diffusion models, at the forefront of these innovations, significantly enhance design efficiency by automating knowledge acquisition. In the field of de novo protein design, the goal is to create entirely novel proteins with predetermined structures. Given the arbitrary positions of proteins in 3-D space, graph representations and their properties are widely used in protein generation studies. A critical requirement in protein modelling is maintaining spatial relationships under transformations (rotations, translations, and reflections). This property, known as equivariance, ensures that predicted protein characteristics adapt seamlessly to changes in orientation or position. Equivariant graph neural networks offer a solution to this challenge. By incorporating equivariant graph neural networks to learn the score of the probability density function in diffusion models, one can generate proteins with robust 3-D structural representations. This review examines the latest deep learning advancements, specifically focusing on frameworks that combine diffusion models with equivariant graph neural networks for protein generation.
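The equivariance property described in this abstract can be checked numerically on a toy example. The sketch below is an illustration only, not code from the review: the helper names and the coordinates are hypothetical, and it covers just one rotation about the z-axis plus a translation rather than the full group of rotations, translations, and reflections. It shows two related facts: pairwise distances are invariant under the transformation, while the centroid is equivariant (transforming the points and then taking the centroid gives the same result as transforming the centroid).

```python
import math


def transform(points, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in points]


def centroid(points):
    """Mean position of a list of 3-D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def pairwise_distances(points):
    """Distances for every unordered pair of points, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


# Hypothetical coordinates (in angstroms) and an arbitrary rigid motion.
points = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.1, 3.5, 0.4), (2.0, 5.9, 1.7)]
angle, shift = math.pi / 3, (10.0, -4.0, 2.5)
moved = transform(points, angle, shift)

# Invariance: distances do not change when the whole structure is moved.
invariant = all(math.isclose(a, b, abs_tol=1e-9)
                for a, b in zip(pairwise_distances(points), pairwise_distances(moved)))

# Equivariance: the centroid moves exactly as the points do.
equivariant = all(math.isclose(a, b, abs_tol=1e-9)
                  for a, b in zip(centroid(moved),
                                  transform([centroid(points)], angle, shift)[0]))

print(invariant, equivariant)
```

An equivariant graph neural network is built so that its coordinate outputs satisfy the second kind of check by construction, for every rotation and translation rather than the single one tested here.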
Affiliation(s)
- Farzan Soleymani
  - Telfer School of Management, University of Ottawa, ON, K1N 6N5, Canada
- Eric Paquet
  - National Research Council, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
- Herna Lydia Viktor
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
9
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins lack experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its function, the functioning of the cell, and its possible interactions with other proteins. Fast, reliable, and accurate predictors provide a way to harness the abundance of sequence data to predict subcellular locations. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories, followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by taxonomies of prediction tools that highlight their relevant areas, with examples for easy categorization and understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools and can serve as a quick guide for researchers to identify and explore recent advances in the literature.
Affiliation(s)
- Maryam Gillani
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
10
Garg P, Singhal G, Kulkarni P, Horne D, Salgia R, Singhal SS. Artificial Intelligence-Driven Computational Approaches in the Development of Anticancer Drugs. Cancers (Basel) 2024; 16:3884. [PMID: 39594838 PMCID: PMC11593155 DOI: 10.3390/cancers16223884] [Received: 10/21/2024] [Revised: 11/13/2024] [Accepted: 11/16/2024] [Indexed: 11/28/2024] Open
Abstract
The integration of AI has revolutionized cancer drug development, transforming the landscape of drug discovery through sophisticated computational techniques. AI-powered models and algorithms have enhanced computer-aided drug design (CADD), offering unprecedented precision in identifying potential anticancer compounds. Traditionally, cancer drug design has been a complex, resource-intensive process, but AI introduces new opportunities to accelerate discovery, reduce costs, and optimize efficiency. This manuscript delves into the transformative applications of AI-driven methodologies in predicting and developing anticancer drugs, critically evaluating their potential to reshape the future of cancer therapeutics while addressing their challenges and limitations.
Affiliation(s)
- Pankaj Garg
  - Department of Chemistry, GLA University, Mathura 281406, Uttar Pradesh, India
- Gargi Singhal
  - Department of Medical Sciences, S.N. Medical College, Agra 282002, Uttar Pradesh, India
- Prakash Kulkarni
  - Department of Medical Oncology & Therapeutics Research, Beckman Research Institute of City of Hope, Comprehensive Cancer Center and National Medical Center, Duarte, CA 91010, USA
- David Horne
  - Department of Molecular Medicine, Beckman Research Institute of City of Hope, Comprehensive Cancer Center and National Medical Center, Duarte, CA 91010, USA
- Ravi Salgia
  - Department of Medical Oncology & Therapeutics Research, Beckman Research Institute of City of Hope, Comprehensive Cancer Center and National Medical Center, Duarte, CA 91010, USA
- Sharad S. Singhal
  - Department of Medical Oncology & Therapeutics Research, Beckman Research Institute of City of Hope, Comprehensive Cancer Center and National Medical Center, Duarte, CA 91010, USA
11
Sherpa P, Chong KT, Tayara H. FvFold: A model to predict antibody Fv structure using protein language model with residual network and Rosetta minimization. Comput Biol Med 2024; 182:109128. [PMID: 39270460 DOI: 10.1016/j.compbiomed.2024.109128] [Received: 05/24/2024] [Revised: 08/22/2024] [Accepted: 09/05/2024] [Indexed: 09/15/2024]
Abstract
The immune system depends on antibodies (Abs) to recognize and attach to a wide range of antigens, playing a pivotal role in immunity. The precise prediction of the variable fragment (Fv) region of antibodies is vital for the progress of therapeutic and commercial applications, particularly in the treatment of diseases such as cancer. Although deep learning models exist for accurate antibody structure prediction, challenges persist, particularly in modeling complementarity-determining regions (CDRs) and the overall antibody Fv structures. We introduce FvFold, a deep learning approach that harnesses the ProtT5-XL-UniRef50 protein language model to predict accurate antibody Fv structures. In evaluations on various benchmarks, our model outperforms existing models, achieving lower Root Mean Square Deviation (RMSD) in almost all loops and lower Orientational Coordinate Distance (OCD) values than the previous top-performing model on the RosettaAntibody benchmark, the Therapeutic benchmark, and the IgFold benchmark.
Affiliation(s)
- Pasang Sherpa
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju, 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea
12
Perlinska AP, Sikora M, Sulkowska JI. Everything AlphaFold tells us about protein knots. J Mol Biol 2024; 436:168715. [PMID: 39029890 DOI: 10.1016/j.jmb.2024.168715] [Received: 05/06/2024] [Revised: 06/29/2024] [Accepted: 07/14/2024] [Indexed: 07/21/2024]
Abstract
Recent advances in machine learning methods in structural biology have opened up new perspectives for protein analysis. Utilizing these methods allows us to go beyond the limitations of empirical research and take advantage of the vast amount of generated data. We use a complete set of potentially knotted protein models identified in all high-quality predictions from the AlphaFold Database to search for any common trends that describe them. We show that the vast majority of knotted proteins have a 3₁ knot and that the presence of knots is not preferred in any of the Bacteria, Eukaryota, or Archaea domains. On the contrary, the percentage of knotted proteins in any given proteome is around 0.4%, regardless of the taxonomical group. We also verified that the organism's living conditions do not impact the number of knotted proteins in its proteome, as previously expected. We did not encounter an organism without a single knotted protein. What is more, we found four universally present families of knotted proteins in Bacteria, consisting of SAM synthase, and TrmD, TrmH, and RsmE methyltransferases.
Affiliation(s)
- Agata P Perlinska
  - Centre of New Technologies, University of Warsaw, Banacha 2c, Warsaw 02-097, Poland
- Maciej Sikora
  - Centre of New Technologies, University of Warsaw, Banacha 2c, Warsaw 02-097, Poland
- Joanna I Sulkowska
  - Centre of New Technologies, University of Warsaw, Banacha 2c, Warsaw 02-097, Poland
13
Moraes Dos Santos L, Gutembergue de Mendonça J, Jerônimo Gomes Lobo Y, Henrique Franca de Lima L, Bruno Rocha G, C de Melo-Minardi R. Deep learning for discriminating non-trivial conformational changes in molecular dynamics simulations of SARS-CoV-2 spike-ACE2. Sci Rep 2024; 14:22639. [PMID: 39349594 PMCID: PMC11443059 DOI: 10.1038/s41598-024-72842-w] [Received: 04/21/2024] [Accepted: 09/11/2024] [Indexed: 10/04/2024] Open
Abstract
Molecular dynamics (MD) simulations produce a substantial volume of high-dimensional data, and traditional methods for analyzing these data pose significant computational demands. Advances in MD simulation analysis combined with deep learning-based approaches have led to the understanding of specific structural changes observed in MD trajectories, including those induced by mutations. In this study, we model the trajectories resulting from MD simulations of the SARS-CoV-2 spike protein-ACE2, specifically the receptor-binding domain (RBD), as interresidue distance maps, and use deep convolutional neural networks to predict the functional impact of point mutations related to the virus's infectivity and immunogenicity. Our model was successful in predicting mutant types that increase the affinity of the S protein for human receptors and reduce its immunogenicity, both based on MD trajectories (precision = 0.718; recall = 0.800; F1 = 0.757; MCC = 0.488; AUC = 0.800) and their centroids. In an additional analysis, we also obtained a strong positive Pearson's correlation coefficient equal to 0.776, indicating a significant relationship between the average sigmoid probability for the MD trajectories and binding free energy (BFE) changes. Furthermore, we obtained a coefficient of determination of 0.602. Our 2D-RMSD analysis also corroborated predictions for more infectious and immune-evading mutants and revealed fluctuating regions within the receptor-binding motif (RBM), especially in the [Formula: see text] loop. This region presented a significant standard deviation for mutations that enable SARS-CoV-2 to evade the immune response, with RMSD values of 5 Å in the simulation. This methodology offers an efficient alternative for identifying potential strains of SARS-CoV-2 that may be linked to more infectious and immune-evading mutations.
Using clustering and deep learning techniques, our approach leverages information from the ensemble of MD trajectories to recognize a broad spectrum of multiple conformational patterns characteristic of mutant types. This represents a strategic advantage in identifying emerging variants, bypassing the need for long MD simulations. Furthermore, the present work tends to contribute substantially to the field of computational biology and virology, particularly to accelerate the design and optimization of new therapeutic agents and vaccines, offering a proactive stance against the constantly evolving threat of COVID-19 and potential future pandemics.
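As a consistency check on the classification metrics quoted in this abstract, the value 0.757 reported alongside precision and recall matches the F1 score, that is, their harmonic mean. The sketch below is illustrative only; the function name is hypothetical and the inputs are the two figures given in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.718
reported_recall = 0.800

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # 2 * 0.718 * 0.800 / 1.518, which rounds to 0.757
```

The MCC and AUC values cannot be recomputed in the same way, since they depend on the full confusion matrix and the ranked predictions, neither of which appears in the abstract.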
Affiliation(s)
- Lucas Moraes Dos Santos
  - Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Yan Jerônimo Gomes Lobo
  - Department of Exact and Biological Sciences, Federal University of São João Del Rei, São João del Rei, Minas Gerais, Brazil
- Gerd Bruno Rocha
  - Department of Chemistry, Federal University of Paraíba, João Pessoa, Paraíba, Brazil
- Raquel C de Melo-Minardi
  - Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
14
Son A, Park J, Kim W, Yoon Y, Lee S, Park Y, Kim H. Revolutionizing Molecular Design for Innovative Therapeutic Applications through Artificial Intelligence. Molecules 2024; 29:4626. [PMID: 39407556 PMCID: PMC11477718 DOI: 10.3390/molecules29194626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Revised: 09/19/2024] [Accepted: 09/27/2024] [Indexed: 10/20/2024] Open
Abstract
The field of computational protein engineering has been transformed by recent advancements in machine learning, artificial intelligence, and molecular modeling, enabling the design of proteins with unprecedented precision and functionality. Computational methods now play a crucial role in enhancing the stability, activity, and specificity of proteins for diverse applications in biotechnology and medicine. Techniques such as deep learning, reinforcement learning, and transfer learning have dramatically improved protein structure prediction, optimization of binding affinities, and enzyme design. These innovations have streamlined the process of protein engineering by allowing the rapid generation of targeted libraries, reducing experimental sampling, and enabling the rational design of proteins with tailored properties. Furthermore, the integration of computational approaches with high-throughput experimental techniques has facilitated the development of multifunctional proteins and novel therapeutics. However, challenges remain in bridging the gap between computational predictions and experimental validation and in addressing ethical concerns related to AI-driven protein design. This review provides a comprehensive overview of the current state and future directions of computational methods in protein engineering, emphasizing their transformative potential in creating next-generation biologics and advancing synthetic biology.
Affiliation(s)
- Ahrum Son
- Department of Molecular Medicine, Scripps Research, La Jolla, CA 92037, USA
- Jongham Park
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Woojin Kim
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Yoonki Yoon
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Sangwoon Lee
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Yongho Park
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Hyunsoo Kim
- Department of Bio-AI Convergence, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Department of Convergent Bioscience and Informatics, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- Protein AI Design Institute, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
- SCICS, Prove beyond AI, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea
15
Rahimzadeh F, Mohammad Khanli L, Salehpoor P, Golabi F, PourBahrami S. Unveiling the evolution of policies for enhancing protein structure predictions: A comprehensive analysis. Comput Biol Med 2024; 179:108815. [PMID: 38986287 DOI: 10.1016/j.compbiomed.2024.108815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 06/09/2024] [Accepted: 06/24/2024] [Indexed: 07/12/2024]
Abstract
Predicting protein structure is both fascinating and formidable, playing a crucial role in structure-based drug discovery and unraveling diseases with elusive origins. The Critical Assessment of Protein Structure Prediction (CASP) serves as a biennial battleground where global scientists converge to untangle the intricate relationships within amino acid chains. Two primary methods, Template-Based Modeling (TBM) and Template-Free (TF) strategies, dominate protein structure prediction. The trend has shifted towards Template-Free predictions due to their broader sequence coverage with fewer templates. The predictive process can be broadly classified into contact map, binned-distance, and real-valued distance predictions, each with distinctive strengths and limitations manifested through tailored loss functions. We also introduce end-to-end and all-atom diffusion-based techniques that have transformed protein structure prediction. Recent advancements in deep learning techniques have significantly improved prediction accuracy, although the effectiveness is contingent upon the quality of input features derived from natural bio-physicochemical attributes and Multiple Sequence Alignments (MSA). Hence, the generation of high-quality MSA data holds paramount importance in harnessing informative input features for enhanced prediction outcomes. Remarkable successes have been achieved in protein structure prediction accuracy, but they are not yet sufficient for the purposes that structural knowledge is meant to serve, which implies a need for development in other aspects of prediction. In this regard, scientists have opened other frontiers for protein structure prediction. The use of subsampling in multiple sequence alignment (MSA) and protein language modeling appears to be particularly promising for enhancing the accuracy and efficiency of predictions, ultimately aiding drug discovery efforts.
The exploration of predicting protein complex structure also opens up exciting opportunities to deepen our knowledge of molecular interactions and to design more effective therapeutics. In this article, we have discussed the path that researchers have followed to improve prediction accuracy, and examined effective prediction policies from different aspects, including the construction of high-quality MSA, the provision of informative input features, and progress in deep learning approaches. We have also briefly touched upon the transition from predicting single-chain protein structures to predicting protein complex structures. Our findings point towards promoting open research environments to support the objectives of protein structure prediction.
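The three prediction targets named in the abstract (contact map, binned-distance, and real-valued distance) can all be viewed as different encodings of the same pairwise distance matrix. The sketch below illustrates that relationship under stated assumptions: the 8 Å contact threshold is a commonly used convention, and the bin edges, function names, and toy matrix are hypothetical choices made for illustration rather than values from the reviewed methods.

```python
import bisect


def contact_map(dist, threshold=8.0):
    """Binary encoding: 1 where a residue pair is closer than `threshold` (in angstroms)."""
    return [[1 if d < threshold else 0 for d in row] for row in dist]


def binned_distances(dist, edges=(4.0, 8.0, 12.0, 16.0)):
    """Categorical encoding: index of the distance bin for each residue pair.

    Bin 0 covers distances below edges[0]; bin len(edges) covers
    everything at or beyond the last edge.
    """
    return [[bisect.bisect_right(edges, d) for d in row] for row in dist]


# Toy symmetric real-valued distance matrix in angstroms (hypothetical values).
dist = [
    [0.0, 3.8, 9.5],
    [3.8, 0.0, 6.1],
    [9.5, 6.1, 0.0],
]

contacts = contact_map(dist)   # target for a binary classification loss
bins = binned_distances(dist)  # target for a multi-class loss
print(contacts)
print(bins)
```

The real-valued matrix itself would be the regression target, which is why each formulation pairs naturally with a different loss function.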
Affiliation(s)
- Faezeh Rahimzadeh
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Pedram Salehpoor
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Faegheh Golabi
- Department of Biomedical Engineering, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Shahin PourBahrami
- Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
16
Williams CD, Kalayan J, Burton NA, Bryce RA. Stable and accurate atomistic simulations of flexible molecules using conformationally generalisable machine learned potentials. Chem Sci 2024; 15:12780-12795. [PMID: 39148799 PMCID: PMC11323334 DOI: 10.1039/d4sc01109k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 07/07/2024] [Indexed: 08/17/2024] Open
Abstract
Computational simulation methods based on machine learned potentials (MLPs) promise to revolutionise shape prediction of flexible molecules in solution, but their widespread adoption has been limited by the way in which training data is generated. Here, we present an approach which allows the key conformational degrees of freedom to be properly represented in reference molecular datasets. MLPs trained on these datasets using a global descriptor scheme are generalisable in conformational space, providing quantum chemical accuracy for all conformers. These MLPs are capable of propagating long, stable molecular dynamics trajectories, an attribute that has remained a challenge. We deploy the MLPs in obtaining converged conformational free energy surfaces for flexible molecules via well-tempered metadynamics simulations; this approach provides a hitherto inaccessible route to accurately computing the structural, dynamical and thermodynamical properties of a wide variety of flexible molecular systems. It is further demonstrated that MLPs must be trained on reference datasets with complete coverage of conformational space, including in barrier regions, to achieve stable molecular dynamics trajectories.
Affiliation(s)
- Christopher D Williams
- Division of Pharmacy and Optometry, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Jas Kalayan
- Science and Technologies Facilities Council (STFC), Daresbury Laboratory, Keckwick Lane, Daresbury, Warrington WA4 4AD, UK
- Neil A Burton
- Department of Chemistry, School of Natural Sciences, Faculty of Science and Engineering, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Richard A Bryce
- Division of Pharmacy and Optometry, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
17
Ghafarollahi A, Buehler MJ. ProtAgents: protein discovery via large language model multi-agent collaborations combining physics and machine learning. Digital Discovery 2024; 3:1389-1409. [PMID: 38993729 PMCID: PMC11235180 DOI: 10.1039/d4dd00013g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 05/13/2024] [Indexed: 07/13/2024]
Abstract
Designing de novo proteins beyond those found in nature holds significant promise for advancements in both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, limiting their flexibility when incorporating out-of-domain knowledge into the design process or comprehensive data analysis is required. In this study, we introduce ProtAgents, a platform for de novo protein design based on Large Language Models (LLMs), where multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures and obtaining new first-principles data - natural vibrational frequencies - via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of de novo proteins with targeted mechanical properties. The flexibility in designing the agents, on one hand, and their capacity in autonomous collaboration through the dynamic LLM-based multi-agent environment on the other hand, unleashes great potentials of LLMs in addressing multi-objective materials problems and opens up new avenues for autonomous materials discovery and design.
Affiliation(s)
- Alireza Ghafarollahi
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Markus J Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
18
Wang J, Watson JL, Lisanza SL. Protein Design Using Structure-Prediction Networks: AlphaFold and RoseTTAFold as Protein Structure Foundation Models. Cold Spring Harb Perspect Biol 2024; 16:a041472. [PMID: 38438190 PMCID: PMC11216169 DOI: 10.1101/cshperspect.a041472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2024]
Abstract
Designing proteins with tailored structures and functions is a long-standing goal in bioengineering. Recently, deep learning advances have enabled protein structure prediction at near-experimental accuracy, which has catalyzed progress in protein design as well. We review recent studies that use structure-prediction neural networks to design proteins, via approaches such as activation maximization, inpainting, or denoising diffusion. These methods have led to major improvements over previous methods in wet-lab success rates for designing protein binders, metalloproteins, enzymes, and oligomeric assemblies. These results show that structure-prediction models are a powerful foundation for developing protein-design tools and suggest that continued improvement of their accuracy and generality will be key to unlocking the full potential of protein design.
Affiliation(s)
- Jue Wang
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, Washington 98195, USA
- DeepMind, London EC4A 3BF, United Kingdom
- Joseph L Watson
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
- Sidney L Lisanza
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, Washington 98195, USA
19
Tang X, Dai H, Knight E, Wu F, Li Y, Li T, Gerstein M. A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation. Brief Bioinform 2024; 25:bbae338. [PMID: 39007594 PMCID: PMC11247410 DOI: 10.1093/bib/bbae338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 05/21/2024] [Accepted: 06/27/2024] [Indexed: 07/16/2024] Open
Abstract
Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. Rapid development in the field, combined with the inherent complexity of the drug design process, creates a difficult landscape for new researchers to enter. In this survey, we organize de novo drug design into two overarching themes: small molecule and protein generation. Within each theme, we identify a variety of subtasks and applications, highlighting important datasets, benchmarks, and model architectures and comparing the performance of top models. We take a broad approach to AI-driven drug design, allowing for both micro-level comparisons of various methods within each subtask and macro-level observations across different fields. We discuss parallel challenges and approaches between the two applications and highlight future directions for AI-driven de novo drug design as a whole. An organized repository of all covered sources is available at https://github.com/gersteinlab/GenAI4Drug.
Affiliation(s)
- Xiangru Tang
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Howard Dai
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Elizabeth Knight
- School of Medicine, Yale University, New Haven, CT 06520, United States
- Fang Wu
- Computer Science Department, Stanford University, CA 94305, United States
- Yunyang Li
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Tianxiao Li
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Mark Gerstein
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Department of Statistics & Data Science, Yale University, New Haven, CT 06520, United States
- Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, United States
20
Zheng T, Zhang C. Engineering strategies and challenges of endolysin as an antibacterial agent against Gram-negative bacteria. Microb Biotechnol 2024; 17:e14465. [PMID: 38593316 PMCID: PMC11003714 DOI: 10.1111/1751-7915.14465] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/22/2024] [Revised: 03/09/2024] [Accepted: 03/21/2024] [Indexed: 04/11/2024] Open
Abstract
Bacteriophage endolysin is a novel antibacterial agent that has attracted much attention in the prevention and control of drug-resistant bacteria due to its unique mechanism of hydrolysing peptidoglycans. Although endolysin exhibits excellent bactericidal effects on Gram-positive bacteria, the presence of the outer membrane of Gram-negative bacteria makes it difficult to lyse them extracellularly, thus limiting their application field. To enhance the extracellular activity of endolysin and facilitate its crossing through the outer membrane of Gram-negative bacteria, researchers have adopted physical, chemical, and molecular methods. This review summarizes the characterization of endolysin targeting Gram-negative bacteria, strategies for endolysin modification, and the challenges and future of engineering endolysin against Gram-negative bacteria in clinical applications, to promote the application of endolysin in the prevention and control of Gram-negative bacteria.
Affiliation(s)
- Tianyu Zheng
- Bathurst Future Agri-Tech Institute, Qingdao Agricultural University, Qingdao, China
- Can Zhang
- College of Veterinary Medicine, Qingdao Agricultural University, Qingdao, China
21
Reveguk I, Simonson T. Classifying protein kinase conformations with machine learning. Protein Sci 2024; 33:e4918. [PMID: 38501429 PMCID: PMC10962494 DOI: 10.1002/pro.4918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 01/02/2024] [Accepted: 01/22/2024] [Indexed: 03/20/2024]
Abstract
Protein kinases are key actors of signaling networks and important drug targets. They cycle between active and inactive conformations, distinguished by a few elements within the catalytic domain. One is the activation loop, whose conserved DFG motif can occupy DFG-in, DFG-out, and some rarer conformations. Annotation and classification of the structural kinome are important, as different conformations can be targeted by different inhibitors and activators. Valuable resources exist; however, large-scale applications will benefit from increased automation and interpretability of structural annotation. Interpretable machine learning models are described for this purpose, based on ensembles of decision trees. To train them, a set of catalytic domain sequences and structures was collected, somewhat larger and more diverse than existing resources. The structures were clustered based on the DFG conformation and manually annotated. They were then used as training input. Two main models were constructed, which distinguished active/inactive and in/out/other DFG conformations. They initially considered 1692 structural variables, spanning the whole catalytic domain, then identified ("learned") a small subset that sufficed for accurate classification. The first model correctly labeled all but 3 of 3289 structures as active or inactive, while the second assigned the correct DFG label to all but 17 of 8826 structures. The most potent classifying variables were all related to well-known structural elements in or near the activation loop and their ranking gives insights into the conformational preferences. The models were used to automatically annotate 3850 kinase structures predicted recently with the AlphaFold2 tool, showing that AlphaFold2 reproduced the active/inactive but not the DFG-in proportions seen in the Protein Data Bank. We expect the models will be useful for understanding and engineering kinases.
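The error counts quoted in the abstract translate directly into classification accuracies. The following small sketch does only that arithmetic on the stated numbers; the function name `accuracy` is an illustrative choice, and nothing here reproduces the authors' models.

```python
def accuracy(n_errors: int, n_total: int) -> float:
    """Fraction of structures that received the correct label."""
    return (n_total - n_errors) / n_total


# Counts as stated in the abstract.
active_inactive = accuracy(3, 3289)   # active vs. inactive model
dfg_label = accuracy(17, 8826)        # DFG in/out/other model

print(f"active/inactive accuracy: {active_inactive:.2%}")
print(f"DFG label accuracy:       {dfg_label:.2%}")
```

Both figures are accuracies on the annotated structure sets described in the abstract, so they say nothing by themselves about performance on the AlphaFold2 predictions.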
Affiliation(s)
- Ivan Reveguk
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
22
Wei J, Xiao J, Chen S, Zong L, Gao X, Li Y. ProNet DB: a proteome-wise database for protein surface property representations and RNA-binding profiles. Database (Oxford) 2024; 2024:baae012. [PMID: 38557634 PMCID: PMC10984565 DOI: 10.1093/database/baae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 01/08/2024] [Accepted: 02/17/2024] [Indexed: 04/04/2024]
Abstract
The rapid growth in the number of experimental and predicted protein structures and more complicated protein structures poses a significant challenge for computational biology in leveraging structural information and accurate representation of protein surface properties. Recently, AlphaFold2 released the comprehensive proteomes of various species, and protein surface property representation plays a crucial role in protein-molecule interaction predictions, including those involving proteins, nucleic acids and compounds. Here, we proposed the first extensive database, namely ProNet DB, that integrates multiple protein surface representations and RNA-binding landscape for 326 175 protein structures. This collection encompasses the 16 model organism proteomes from the AlphaFold Protein Structure Database and experimentally validated structures from the Protein Data Bank. For each protein, ProNet DB provides access to the original protein structures along with the detailed surface property representations encompassing hydrophobicity, charge distribution and hydrogen bonding potential as well as interactive features such as the interacting face and RNA-binding sites and preferences. To facilitate an intuitive interpretation of these properties and the RNA-binding landscape, ProNet DB incorporates visualization tools like Mol* and an Online 3D Viewer, allowing for the direct observation and analysis of these representations on protein surfaces. The availability of pre-computed features enables instantaneous access for users, significantly advancing computational biology research in areas such as molecular mechanism elucidation, geometry-based drug discovery and the development of novel therapeutic approaches. Database URL: https://proj.cse.cuhk.edu.hk/aihlab/pronet/.
Affiliation(s)
- Junkang Wei
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
- Jin Xiao
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
- Siyuan Chen
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal 23955, Kingdom of Saudi Arabia
- Licheng Zong
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal 23955, Kingdom of Saudi Arabia
- Yu Li
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Chung Chi Rd, Ma Liu Shui, Hong Kong SAR 999077, China
- The CUHK Shenzhen Research Institute, 4 Gaoxin Ave Nanshan, Shenzhen 518057, China
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 45 Carleton Street, Cambridge, MA 02142, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, 201 Brookline Avenue, Boston, MA 02215, USA
- Broad Institute of MIT and Harvard, Merkin Building, 415 Main Street, Cambridge, MA 02142, USA
23
Maiti S, Singh A, Maji T, Saibo NV, De S. Experimental methods to study the structure and dynamics of intrinsically disordered regions in proteins. Curr Res Struct Biol 2024; 7:100138. [PMID: 38707546 PMCID: PMC11068507 DOI: 10.1016/j.crstbi.2024.100138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 03/12/2024] [Accepted: 03/15/2024] [Indexed: 05/07/2024] Open
Abstract
Eukaryotic proteins often feature long stretches of amino acids that lack a well-defined three-dimensional structure and are referred to as intrinsically disordered proteins (IDPs) or regions (IDRs). Although these proteins challenge conventional structure-function paradigms, they play vital roles in cellular processes. Recent progress in experimental techniques, such as NMR spectroscopy, single-molecule FRET, high-speed AFM and SAXS, has provided valuable insights into the biophysical basis of IDP function. This review discusses the advancements made in these techniques, particularly for the study of disordered regions in proteins. In NMR spectroscopy, new strategies such as 13C detection, non-uniform sampling, segmental isotope labeling, and rapid data acquisition methods address the challenges posed by spectral overcrowding and low stability of IDPs. The importance of various NMR parameters, including chemical shifts, hydrogen exchange rates, and relaxation measurements, for revealing transient secondary structures within IDRs and IDPs is presented. Given the high flexibility of IDPs, the review outlines NMR methods for assessing their dynamics at both fast (ps-ns) and slow (μs-ms) timescales. IDPs exert their functions through interactions with other molecules such as proteins, DNA, or RNA. NMR-based titration experiments yield insights into the thermodynamics and kinetics of these interactions. Detailed study of IDPs requires multiple experimental techniques, and thus, several methods are described for studying disordered proteins, highlighting their respective advantages and limitations. The potential for integrating these complementary techniques, each offering unique perspectives, is explored to achieve a comprehensive understanding of IDPs.
Affiliation(s)
- Aakanksha Singh
- School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
- Tanisha Maji
- School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
- Nikita V. Saibo
- School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
- Soumya De
- School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
24
Corum MR, Venkannagari H, Hryc CF, Baker ML. Predictive modeling and cryo-EM: A synergistic approach to modeling macromolecular structure. Biophys J 2024; 123:435-450. [PMID: 38268190 PMCID: PMC10912932 DOI: 10.1016/j.bpj.2024.01.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 01/09/2024] [Accepted: 01/18/2024] [Indexed: 01/26/2024] Open
Abstract
Over the last 15 years, structural biology has seen unprecedented development and improvement in two areas: electron cryo-microscopy (cryo-EM) and predictive modeling. Once relegated to low resolutions, single-particle cryo-EM is now capable of achieving near-atomic resolutions of a wide variety of macromolecular complexes. Ushered in by AlphaFold, machine learning has powered the current generation of predictive modeling tools, which can accurately and reliably predict models for proteins and some complexes directly from the sequence alone. Although they offer new opportunities individually, there is an inherent synergy between these techniques, allowing for the construction of large, complex macromolecular models. Here, we give a brief overview of these approaches in addition to illustrating works that combine these techniques for model building. These examples provide insight into model building, assessment, and limitations when integrating predictive modeling with cryo-EM density maps. Together, these approaches offer the potential to greatly accelerate the generation of macromolecular structural insights, particularly when coupled with experimental data.
Collapse
Affiliation(s)
- Michael R Corum
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
- Harikanth Venkannagari
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
- Corey F Hryc
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
- Matthew L Baker
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
|
25
|
Pun MN, Ivanov A, Bellamy Q, Montague Z, LaMont C, Bradley P, Otwinowski J, Nourmohammad A. Learning the shape of protein microenvironments with a holographic convolutional neural network. Proc Natl Acad Sci U S A 2024; 121:e2300838121. [PMID: 38300863 PMCID: PMC10861886 DOI: 10.1073/pnas.2300838121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 11/29/2023] [Indexed: 02/03/2024] Open
Abstract
Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from its sequence or structure remains a major challenge. Here, we introduce holographic convolutional neural network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.
Affiliation(s)
- Michael N. Pun
  - Department of Physics, University of Washington, Seattle, WA 98195
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Andrew Ivanov
  - Department of Physics, University of Washington, Seattle, WA 98195
- Quinn Bellamy
  - Department of Physics, University of Washington, Seattle, WA 98195
- Zachary Montague
  - Department of Physics, University of Washington, Seattle, WA 98195
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Colin LaMont
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Philip Bradley
  - Fred Hutchinson Cancer Center, Seattle, WA 98102
  - Department of Biochemistry, University of Washington, Seattle, WA 98195
  - Institute for Protein Design, University of Washington, Seattle, WA 98195
- Jakub Otwinowski
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
  - Dyno Therapeutics, Watertown, MA 02472
- Armita Nourmohammad
  - Department of Physics, University of Washington, Seattle, WA 98195
  - The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
  - Fred Hutchinson Cancer Center, Seattle, WA 98102
  - Department of Applied Mathematics, University of Washington, Seattle, WA 98105
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195
|
26
|
Xu B, Chen Y, Xue W. Computational Protein Design - Where it goes? Curr Med Chem 2024; 31:2841-2854. [PMID: 37272467 DOI: 10.2174/0929867330666230602143700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Revised: 02/18/2023] [Accepted: 03/15/2023] [Indexed: 06/06/2023]
Abstract
Proteins play a critical role in regulating diverse biological processes related to human life. Demand for functional proteins is increasing, yet such proteins are sparse in the immense space of possible sequences. Therefore, protein design has become an important task in various fields, including medicine, food, energy, and materials. Directed evolution has recently led to significant achievements: molecular modification of proteins through directed evolution has significantly advanced enzyme engineering, metabolic engineering, medicine, and beyond. However, it is impossible to identify desirable sequences from a large number of synthetic sequences alone. As a result, computational methods, including data-driven machine learning and physics-based molecular modeling, have been introduced to protein engineering to produce more functional proteins. This review focuses on recent advances in computational protein design, highlighting the applicability of different approaches as well as their limitations.
Affiliation(s)
- Binbin Xu
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Yingjun Chen
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
|
27
|
Biswas A, Kumari A, Gaikwad DS, Pandey DK. Revolutionizing Biological Science: The Synergy of Genomics in Health, Bioinformatics, Agriculture, and Artificial Intelligence. OMICS: A Journal of Integrative Biology 2023; 27:550-569. [PMID: 38100404 DOI: 10.1089/omi.2023.0197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2023]
Abstract
With the climate emergency, COVID-19, and the rise of planetary health scholarship, the binary of human and ecosystem health has been deeply challenged. The interdependence of human and nonhuman animal health is increasingly acknowledged and is paving the way for new frontiers in integrative biology. The convergence of genomics in health, bioinformatics, agriculture, and artificial intelligence (AI) has ushered in a new era of possibilities and applications. However, the sheer volume of genomic/multiomics big data generated also presents formidable sociotechnical challenges in extracting meaningful biological, planetary health and ecological insights. Over the past few years, AI-guided bioinformatics has emerged as a powerful tool for managing, analyzing, and interpreting complex biological datasets. The advances in AI, particularly in machine learning and deep learning, have been transforming the fields of genomics, planetary health, and agriculture. This article aims to unpack and explore the formidable range of possibilities and challenges that result from such transdisciplinary integration, and emphasizes its radically transformative potential for human and ecosystem health. The integration of these disciplines is also driving significant advancements in precision medicine and personalized health care. This presents an unprecedented opportunity to deepen our understanding of complex biological systems and advance the well-being of all life in planetary ecosystems. Notwithstanding its sociotechnical, ethical, and critical policy challenges, the integration of genomics, multiomics, planetary health, and agriculture with AI-guided bioinformatics opens up vast opportunities for transnational collaborative efforts, data sharing, analysis, valorization, and interdisciplinary innovations in life sciences and integrative biology.
Affiliation(s)
- Aakanksha Biswas
  - Amity Institute of Biotechnology, Amity University Jharkhand, Ranchi, India
- Aditi Kumari
  - Amity Institute of Biotechnology, Amity University Jharkhand, Ranchi, India
- D S Gaikwad
  - Amity Institute of Organic Agriculture, Amity University, Noida, India
- Dhananjay K Pandey
  - Amity Institute of Biotechnology, Amity University Jharkhand, Ranchi, India
|
28
|
Ochoa R, Fox T. Assessing the fast prediction of peptide conformers and the impact of non-natural modifications. J Mol Graph Model 2023; 125:108608. [PMID: 37659134 DOI: 10.1016/j.jmgm.2023.108608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 09/04/2023]
Abstract
We present an assessment of different approaches to predict peptide structures using modeling tools. Several small molecule, protein, and peptide-focused methodologies were used for the fast prediction of conformers for peptides shorter than 30 amino acids. We assessed the effect of including restraints based on annotated or predicted secondary structure motifs. A number of peptides in bound conformations and in solution were collected to compare the tools. In addition, we studied the impact of changing single amino acids to non-natural residues using molecular dynamics simulations. Deep learning methods such as AlphaFold2, or the combination of physics-based approaches with secondary structure information, produce the most accurate results for natural sequences. In the case of peptides with non-natural modifications, modeling the peptide containing natural amino acids first and then modifying and simulating the peptide using benchmarked force fields is a recommended pipeline. The results can guide the modeling of oligopeptides for drug discovery projects.
Affiliation(s)
- Rodrigo Ochoa
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Thomas Fox
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
|
29
|
López-Luis MA, Soriano-Pérez EE, Parada-Fabián JC, Torres J, Maldonado-Rodríguez R, Méndez-Tenorio A. A Proposal for a Consolidated Structural Model of the CagY Protein of Helicobacter pylori. Int J Mol Sci 2023; 24:16781. [PMID: 38069104 PMCID: PMC10706595 DOI: 10.3390/ijms242316781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 12/18/2023] Open
Abstract
CagY is the largest and most complex protein from Helicobacter pylori's (Hp) type IV secretion system (T4SS), playing a critical role in the modulation of gastric inflammation and risk for gastric cancer. CagY spans from the inner to the outer membrane, forming a channel through which Hp molecules are injected into human gastric cells. Yet, a three-dimensional structure has been reported for only short segments of the protein. This intricate protein was modeled using different approaches, including homology modeling, ab initio, and deep learning techniques. The challengingly long middle repeat region (MRR) was modeled using deep learning and optimized using equilibrium molecular dynamics. The previously modeled segments were assembled into a 1595 aa chain and a 14-chain CagY multimer structure was assembled by structural alignment. The final structure correlated with published structures and allowed us to show how the multimer may form the T4SS channel through which CagA and other molecules are translocated to gastric cells. The model confirmed that MRR, the most polymorphic and complex region of CagY, presents numerous cysteine residues forming disulfide bonds that stabilize the protein, and suggested that this domain may function as a contractile region playing an essential role in the modulating activity of CagY on tissue inflammation.
Affiliation(s)
- Mario Angel López-Luis
  - Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Campus Lázaro Cárdenas, Mexico City 11340, Mexico
- Eva Elda Soriano-Pérez
  - Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Campus Lázaro Cárdenas, Mexico City 11340, Mexico
- José Carlos Parada-Fabián
  - Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Campus Lázaro Cárdenas, Mexico City 11340, Mexico
- Javier Torres
  - Unidad de Investigación en Enfermedades Infecciosas, UMAE Pediatría, Instituto Mexicano del Seguro Social, Mexico City 06720, Mexico
- Rogelio Maldonado-Rodríguez
  - Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Campus Lázaro Cárdenas, Mexico City 11340, Mexico
- Alfonso Méndez-Tenorio
  - Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Campus Lázaro Cárdenas, Mexico City 11340, Mexico
|
30
|
Zhou X, Chen G, Ye J, Wang E, Zhang J, Mao C, Li Z, Hao J, Huang X, Tang J, Heng PA. ProRefiner: an entropy-based refining strategy for inverse protein folding with global graph attention. Nat Commun 2023; 14:7434. [PMID: 37973874 PMCID: PMC10654420 DOI: 10.1038/s41467-023-43166-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 11/02/2023] [Indexed: 11/19/2023] Open
Abstract
Inverse Protein Folding (IPF) is an important task of protein design, which aims to design sequences compatible with a given backbone structure. Despite the prosperous development of algorithms for this task, existing methods tend to rely on noisy predicted residues located in the local neighborhood when generating sequences. To address this limitation, we propose an entropy-based residue selection method to remove noise in the input residue context. Additionally, we introduce ProRefiner, a memory-efficient global graph attention model to fully utilize the denoised context. Our proposed method achieves state-of-the-art performance on multiple sequence design benchmarks in different design settings. Furthermore, we demonstrate the applicability of ProRefiner in redesigning Transposon-associated transposase B, where six out of the 20 variants we propose exhibit improved gene editing activity.
Affiliation(s)
- Xinyi Zhou
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Central Ave, Hong Kong, China
- Junjie Ye
  - Noah's Ark Lab, Huawei, Shenzhen, China
- Ercheng Wang
  - Zhejiang Lab, Kechuang Avenue, Hangzhou, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Jun Zhang
  - State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
- Cong Mao
  - State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
- Zhanwei Li
  - Zhejiang Lab, Kechuang Avenue, Hangzhou, China
- Jin Tang
  - Zhejiang Lab, Kechuang Avenue, Hangzhou, China
- Pheng Ann Heng
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Central Ave, Hong Kong, China
  - Zhejiang Lab, Kechuang Avenue, Hangzhou, China
|
31
|
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network. Proc Natl Acad Sci U S A 2023; 120:e2303590120. [PMID: 37729196 PMCID: PMC10523478 DOI: 10.1073/pnas.2303590120] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/05/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023] Open
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease-substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Joseph H. Lubin
  - Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Vidur V. Sarma
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Guanyang Wang
  - Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sijian Wang
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
  - Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sagar D. Khare
  - Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
  - Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
|
32
|
Pandey A, Liu E, Graham J, Chen W, Keten S. B-factor prediction in proteins using a sequence-based deep learning model. Patterns (N Y) 2023; 4:100805. [PMID: 37720331 PMCID: PMC10499862 DOI: 10.1016/j.patter.2023.100805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/23/2023] [Accepted: 07/07/2023] [Indexed: 09/19/2023]
Abstract
B factors provide critical insight into protein dynamics. Predicting the B factor of an atom in a new protein remains challenging because it is affected by the atom's neighbors in Euclidean space. Previously developed learning methods have yielded low Pearson correlation coefficients beyond the training set due to their limited ability to capture the effect of neighboring atoms. With the advances in deep learning methods, we develop a sequence-based model that is tested on 2,442 proteins and outperforms the state-of-the-art models by 30%. We find that the model learns that the B factor of a site is prominently affected by atoms within a 12-15 Å radius, which is in excellent agreement with cutoffs from protein network models. The ablation study revealed that the B factor can largely be predicted from the primary sequence alone. Based on these findings, our model lays a foundation for predicting other properties that are correlated with the B factor.
Affiliation(s)
- Akash Pandey
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Elaine Liu
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Jacob Graham
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Wei Chen
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Sinan Keten
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
  - Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL, USA
|
33
|
Bauer J, Rajagopal N, Gupta P, Gupta P, Nixon AE, Kumar S. How can we discover developable antibody-based biotherapeutics? Front Mol Biosci 2023; 10:1221626. [PMID: 37609373 PMCID: PMC10441133 DOI: 10.3389/fmolb.2023.1221626] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/12/2023] [Accepted: 07/10/2023] [Indexed: 08/24/2023] Open
Abstract
Antibody-based biotherapeutics have emerged as a successful class of pharmaceuticals despite significant challenges and risks to their discovery and development. This review discusses the most frequently encountered hurdles in the research and development (R&D) of antibody-based biotherapeutics and proposes a conceptual framework called biopharmaceutical informatics. Our vision advocates for the syncretic use of computation and experimentation at every stage of biologic drug discovery, considering developability (manufacturability, safety, efficacy, and pharmacology) of potential drug candidates from the earliest stages of the drug discovery phase. The computational advances in recent years allow for more precise formulation of disease concepts, rapid identification, and validation of targets suitable for therapeutic intervention and discovery of potential biotherapeutics that can agonize or antagonize them. Furthermore, computational methods for de novo and epitope-specific antibody design are increasingly being developed, opening novel computationally driven opportunities for biologic drug discovery. Here, we review the opportunities and limitations of emerging computational approaches for optimizing antigens to generate robust immune responses, in silico generation of antibody sequences, discovery of potential antibody binders through virtual screening, assessment of hits, identification of lead drug candidates and their affinity maturation, and optimization for developability. The adoption of biopharmaceutical informatics across all aspects of drug discovery and development cycles should help bring affordable and effective biotherapeutics to patients more quickly.
Affiliation(s)
- Joschka Bauer
  - Early Stage Pharmaceutical Development Biologicals, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach/Riss, Germany
  - In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Nandhini Rajagopal
  - In Silico Team, Boehringer Ingelheim, Hannover, Germany
  - Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Priyanka Gupta
  - In Silico Team, Boehringer Ingelheim, Hannover, Germany
  - Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Pankaj Gupta
  - In Silico Team, Boehringer Ingelheim, Hannover, Germany
  - Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Andrew E. Nixon
  - Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Sandeep Kumar
  - In Silico Team, Boehringer Ingelheim, Hannover, Germany
  - Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
|
34
|
Casadevall G, Duran C, Osuna S. AlphaFold2 and Deep Learning for Elucidating Enzyme Conformational Flexibility and Its Application for Design. JACS Au 2023; 3:1554-1562. [PMID: 37388680 PMCID: PMC10302747 DOI: 10.1021/jacsau.3c00188] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 04/14/2023] [Revised: 05/22/2023] [Accepted: 05/22/2023] [Indexed: 07/01/2023]
Abstract
The recent success of AlphaFold2 (AF2) and other deep learning (DL) tools in accurately predicting the folded three-dimensional (3D) structure of proteins and enzymes has revolutionized the structural biology and protein design fields. The 3D structure indeed reveals key information on the arrangement of the catalytic machinery of enzymes and which structural elements gate the active site pocket. However, comprehending enzymatic activity requires a detailed knowledge of the chemical steps involved along the catalytic cycle and the exploration of the multiple thermally accessible conformations that enzymes adopt when in solution. In this Perspective, some of the recent studies showing the potential of AF2 in elucidating the conformational landscape of enzymes are provided. Selected examples of the key developments of AF2-based and DL methods for protein design are discussed, as well as a few enzyme design cases. These studies show the potential of AF2 and DL for allowing the routine computational design of efficient enzymes.
Affiliation(s)
- Guillem Casadevall
  - Institut de Química Computacional i Catàlisi (IQCC) and Departament de Química, Universitat de Girona, Maria Aurèlia Capmany 69, 17003 Girona, Spain
- Cristina Duran
  - Institut de Química Computacional i Catàlisi (IQCC) and Departament de Química, Universitat de Girona, Maria Aurèlia Capmany 69, 17003 Girona, Spain
- Sílvia Osuna
  - Institut de Química Computacional i Catàlisi (IQCC) and Departament de Química, Universitat de Girona, Maria Aurèlia Capmany 69, 17003 Girona, Spain
  - ICREA, Passeig Lluís Companys 23, 08010 Barcelona, Spain
|
35
|
Johnston KE, Fannjiang C, Wittmann BJ, Hie BL, Yang KK, Wu Z. Machine Learning for Protein Engineering. arXiv 2023:arXiv:2305.16634v1. [PMID: 37292483 PMCID: PMC10246115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Directed evolution of proteins has been the most effective method for protein engineering. However, a new paradigm is emerging, fusing the library generation and screening approaches of traditional directed evolution with computation through the training of machine learning models on protein sequence fitness data. This chapter highlights successful applications of machine learning to protein engineering and directed evolution, organized by the improvements that have been made with respect to each step of the directed evolution cycle. Additionally, we provide an outlook for the future based on the current direction of the field, namely in the development of calibrated models and in incorporating other modalities, such as protein structure.
Affiliation(s)
- Bruce J Wittmann
  - Work done while at California Institute of Technology, now at Microsoft
|
36
|
Ayub S, Malak N, Cossío-Bayúgar R, Nasreen N, Khan A, Niaz S, Khan A, Alanazi AD, Ben Said M. In Vitro and In Silico Protocols for the Assessment of Anti-Tick Compounds from Pinus roxburghii against Rhipicephalus (Boophilus) microplus Ticks. Animals (Basel) 2023; 13:1388. [PMID: 37106951 PMCID: PMC10135231 DOI: 10.3390/ani13081388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2022] [Revised: 04/10/2023] [Accepted: 04/12/2023] [Indexed: 04/29/2023] Open
Abstract
Pinus roxburghii, also known by the name "Himalayan chir pine," belongs to the Pinaceae family. Rhipicephalus (Boophilus) microplus tick is one of the most significant bovine ectoparasites, making it a major vector of economically important tick-borne diseases. The researchers conducted adult immersion tests (AIT) and larval packet tests (LPT) to investigate the acaricidal effect of P. roxburghii plant extract on R. (B.) microplus and its potential modulatory function when used with cypermethrin. Eggs were also assessed for their weight, egg-laying index (IE), hatchability rate, and control rate. After exposure to essential extract concentrations ranging from 2.5 to 40 mg/mL for 48 h, adult female ticks' oviposition inhibition and unfed R. (B.) microplus larvae's mortality rates were analyzed. Engorged females exposed to P. roxburghii at 40 mg/mL had reduced biological activity (oviposition, IE) compared to positive and negative controls. A concentration of 40 mg/mL of P. roxburghii caused 90% mortality in R. (B.) microplus larvae, whereas cypermethrin (the positive control) caused 98.3% mortality in LPT. In AIT, cypermethrin inhibited 81% of oviposition, compared to the 40 mg/mL concentration of P. roxburghii, which inhibited 40% of the ticks' oviposition. Moreover, this study assessed the binding capacity of selected phytocompounds with the targeted protein. Three servers (SWISS-MODEL, RoseTTAFold, and TrRosetta) recreated the target protein RmGABACl's 3D structure. The modeled 3D structure was validated using the online servers PROCHECK, ERRAT, and ProSA. Molecular docking using AutoDock Vina predicted the binding mechanisms of 20 drug-like compounds against the target protein. Catechin and myricetin showed significant interactions with active site residues of the target protein, with docking scores of -7.7 kcal/mol and -7.6 kcal/mol, respectively. In conclusion, this study demonstrated the acaricidal activity of P. roxburghii extract, suggesting its potential as an alternative natural acaricide for controlling R. (B.) microplus.
Affiliation(s)
- Sana Ayub
  - Department of Zoology, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Nosheen Malak
  - Department of Zoology, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Raquel Cossío-Bayúgar
  - Centro Nacional de Investigaciones Disciplinarias en Salud Animal e Inocuidad, Departamento de Artropodología, Instituto Nacional de Investigaciones Forestales Agrícolas y Pecuarias (INIFAP), Boulevard Cuauhnahuac No. 8534, Jiutepec 62574, Mexico
- Nasreen Nasreen
  - Department of Zoology, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Afshan Khan
  - Department of Zoology, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Sadaf Niaz
  - Department of Zoology, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Adil Khan
  - Department of Zoology, Bacha Khan University Charsadda, Charsadda 24420, Pakistan
- Abdallah D Alanazi
  - Department of Biological Sciences, Faculty of Science and Humanities, Shaqra University, Ad-Dawadimi 11911, Saudi Arabia
- Mourad Ben Said
  - Department of Basic Sciences, Higher Institute of Biotechnology of Sidi Thabet, University of Manouba, Manouba 2010, Tunisia
  - Laboratory of Microbiology, National School of Veterinary Medicine, Sidi Thabet, University of Manouba, Manouba 2010, Tunisia
| |
|
37
|
Chen Y, Wang Z, Wang L, Wang J, Li P, Cao D, Zeng X, Ye X, Sakurai T. Deep generative model for drug design from protein target sequence. J Cheminform 2023; 15:38. [PMID: 36978179 PMCID: PMC10052801 DOI: 10.1186/s13321-023-00702-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/31/2022] [Accepted: 02/18/2023] [Indexed: 03/30/2023] Open
Abstract
Drug discovery for a protein target is a laborious and costly process. Deep learning (DL) methods have been applied to drug discovery and successfully generated novel molecular structures, and they can substantially reduce development time and costs. However, most of them rely on prior knowledge, either by drawing on the structure and properties of known molecules to generate similar candidate molecules or extracting information on the binding sites of protein pockets to obtain molecules that can bind to them. In this paper, DeepTarget, an end-to-end DL model, was proposed to generate novel molecules solely relying on the amino acid sequence of the target protein to reduce the heavy reliance on prior knowledge. DeepTarget includes three modules: Amino Acid Sequence Embedding (AASE), Structural Feature Inference (SFI), and Molecule Generation (MG). AASE generates embeddings from the amino acid sequence of the target protein. SFI infers the potential structural features of the synthesized molecule, and MG seeks to construct the eventual molecule. The validity of the generated molecules was demonstrated by a benchmark platform of molecular generation models. The interaction between the generated molecules and the target proteins was also verified on the basis of two metrics, drug-target affinity and molecular docking. The results of the experiments indicated the efficacy of the model for direct molecule generation solely conditioned on amino acid sequence.
Affiliation(s)
- Yangyang Chen
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan.
| | - Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Lei Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China
| | - Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon, 21983, Republic of Korea
- Bioinformatics and Molecular Design Research Center (BMDRC), Incheon, 21983, Republic of Korea
| | - Pengyong Li
- School of Computer Science and Technology, Xidian University, Xian, 710071, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China.
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, People's Republic of China.
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan.
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| |
|
38
|
Bougueroua S, Bricage M, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic Graph Theory, Reinforcement Learning and Game Theory in MD Simulations: From 3D Structures to Topological 2D-Molecular Graphs (2D-MolGraphs) and Vice Versa. Molecules 2023; 28:2892. [PMID: 37049654 PMCID: PMC10096312 DOI: 10.3390/molecules28072892] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/24/2023] [Revised: 03/17/2023] [Accepted: 03/18/2023] [Indexed: 04/14/2023] Open
Abstract
This paper reviews graph-theory-based methods that were recently developed in our group for post-processing molecular dynamics trajectories. We show that the use of algorithmic graph theory not only provides a direct and fast methodology to identify conformers sampled over time but also allows one to follow the interconversions between the conformers through graphs of transitions in time. Examples of gas phase molecules and inhomogeneous aqueous solid interfaces are presented to demonstrate the power of topological 2D graphs and their versatility for post-processing molecular dynamics trajectories. An even more complex challenge is to predict 3D structures from topological 2D graphs. Our first attempts to tackle such a challenge are presented with the development of game theory and reinforcement learning methods for predicting the 3D structure of a gas-phase peptide.
Affiliation(s)
- Sana Bougueroua
- Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
| | - Marie Bricage
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
| | - Ylène Aboulfath
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
| | - Dominique Barth
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
| | - Marie-Pierre Gaigeot
- Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
| |
|
39
|
Zhang H, Li X, Li Z, Huang D, Zhang L. Estimation of Particle Location in Granular Materials Based on Graph Neural Networks. Micromachines 2023; 14:714. [PMID: 37420946 DOI: 10.3390/mi14040714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 03/20/2023] [Accepted: 03/21/2023] [Indexed: 07/09/2023]
Abstract
Particle locations determine the whole structure of a granular system, which is crucial to understanding various anomalous behaviors in glasses and amorphous solids. How to accurately determine the coordinates of each particle in such materials within a short time has always been a challenge. In this paper, we use an improved graph convolutional neural network to estimate the particle locations in two-dimensional photoelastic granular materials purely from the knowledge of the distances for each particle, which can be estimated in advance via a distance estimation algorithm. The robustness and effectiveness of our model are verified by testing other granular systems with different disorder degrees, as well as systems with different configurations. In this study, we attempt to provide a new route to the structural information of granular systems irrelevant to dimensionality, compositions, or other material properties.
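To make concrete what "locations purely from distances" means, the sketch below recovers 2D coordinates from an exact pairwise distance matrix by plain trilateration. This is a classical geometric construction swapped in for illustration, not the graph neural network described in the abstract; the function name `coords_from_distances` and the toy points are assumptions. It presumes noise-free distances and that the first three points are not collinear, and the result is defined only up to translation, rotation, and reflection.

```python
import math

def coords_from_distances(D):
    """Place n >= 2 points in the plane from an n x n matrix of exact distances.

    Point 0 is put at the origin, point 1 on the positive x-axis, and
    point 2 in the upper half-plane; these three fix the frame.
    """
    n = len(D)
    pts = [(0.0, 0.0), (D[0][1], 0.0)]
    for k in range(2, n):
        # Law of cosines gives x relative to points 0 and 1.
        x = (D[0][1] ** 2 + D[0][k] ** 2 - D[1][k] ** 2) / (2 * D[0][1])
        # Clamp at zero to absorb tiny negative values from rounding.
        y = math.sqrt(max(D[0][k] ** 2 - x ** 2, 0.0))
        if k > 2:
            # Pick the sign of y that better matches the distance to point 2.
            x2, y2 = pts[2]
            err_pos = abs(math.hypot(x - x2, y - y2) - D[2][k])
            err_neg = abs(math.hypot(x - x2, -y - y2) - D[2][k])
            if err_neg < err_pos:
                y = -y
        pts.append((x, y))
    return pts

# Toy check: build distances from known points, then reconstruct them.
true_pts = [(0.0, 0.0), (3.0, 0.0), (1.0, 2.0), (2.0, -1.5), (-1.0, 0.5)]
D = [[math.dist(p, q) for q in true_pts] for p in true_pts]
recovered = coords_from_distances(D)
```

With noisy or incomplete distances this construction degrades quickly, which is the regime where a learned model such as the one in the abstract becomes attractive.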
Affiliation(s)
- Hang Zhang
- School of Automation, Central South University, Changsha 410083, China
| | - Xingqiao Li
- School of Automation, Central South University, Changsha 410083, China
| | - Zirui Li
- School of Automation, Central South University, Changsha 410083, China
| | - Duan Huang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Ling Zhang
- School of Automation, Central South University, Changsha 410083, China
| |
|
40
|
Sicard J, Barbe S, Boutrou R, Bouvier L, Delaplace G, Lashermes G, Théron L, Vitrac O, Tonda A. A primer on predictive techniques for food and bioresources transformation processes. J Food Process Eng 2023. [DOI: 10.1111/jfpe.14325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]
Affiliation(s)
| | | | | | - Laurent Bouvier
- UMET Université de Lille, CNRS, Centrale Lille, INRAE Villeneuve‐D'Ascq France
| | - Guillaume Delaplace
- UMET Université de Lille, CNRS, Centrale Lille, INRAE Villeneuve‐D'Ascq France
| | | | | | - Olivier Vitrac
- SayFood, INRAE, AgroParisTech Université Paris Saclay Massy France
| | - Alberto Tonda
- MIA‐Paris, AgroParisTech, INRAE Université Paris Saclay Paris France
| |
|
41
|
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 168] [Impact Index Per Article: 84.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023] Open
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the science community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and research areas that need protein structure information, such as drug discovery, protein design, and prediction of protein function. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in the fields of biology and medicine, with many of them having preliminarily proved the potential of AF2. To better understand AF2 and promote its applications, we summarize in this article the principle and system architecture of AF2 as well as the recipe of its success, and particularly focus on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 prediction are also discussed.
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
| | - Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
| | - Yi Zhao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
| | - Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China.
| |
|
42
|
Wang F, Sangfuang N, McCoubrey LE, Yadav V, Elbadawi M, Orlu M, Gaisford S, Basit AW. Advancing oral delivery of biologics: Machine learning predicts peptide stability in the gastrointestinal tract. Int J Pharm 2023; 634:122643. [PMID: 36709014 DOI: 10.1016/j.ijpharm.2023.122643] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/05/2022] [Revised: 01/18/2023] [Accepted: 01/20/2023] [Indexed: 01/26/2023]
Abstract
The oral delivery of peptide therapeutics could facilitate precision treatment of numerous gastrointestinal (GI) and systemic diseases with simple administration for patients. However, the vast majority of licensed peptide drugs are currently administered parenterally due to prohibitive peptide instability in the GI tract. As such, the development of GI-stable peptides is receiving considerable investment. This study provides researchers with the first tool to predict the GI stability of peptide therapeutics based solely on the amino acid sequence. Both unsupervised and supervised machine learning techniques were trained on literature-extracted data describing peptide stability in simulated gastric and small intestinal fluid (SGF and SIF). Based on 109 peptide incubations, classification models for SGF and SIF were developed. The best models utilized k-Nearest Neighbor (for SGF) and XGBoost (for SIF) algorithms, with accuracies of 75.1% (SGF) and 69.3% (SIF), and f1 scores of 84.5% (SGF) and 73.4% (SIF) under 5-fold cross-validation. Feature importance analysis demonstrated that peptides' lipophilicity, rigidity, and size were key determinants of stability. These models are now available to those working on the development of oral peptide therapeutics.
Affiliation(s)
- Fanjin Wang
- Intract Pharma Ltd. London Bioscience Innovation Centre, 2 Royal College St, London NW1 0NH, UK
| | | | | | - Vipul Yadav
- Intract Pharma Ltd. London Bioscience Innovation Centre, 2 Royal College St, London NW1 0NH, UK
| | - Moe Elbadawi
- UCL School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, UK
| | - Mine Orlu
- UCL School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, UK
| | - Simon Gaisford
- UCL School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, UK
| | - Abdul W Basit
- UCL School of Pharmacy, 29-39 Brunswick Square, London WC1N 1AX, UK.
| |
|
43
|
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. bioRxiv 2023:2023.02.16.528728. [PMID: 36824945 PMCID: PMC9949123 DOI: 10.1101/2023.02.16.528728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key post-translational modification involved in physiology and disease. The ability to robustly and rapidly predict protease substrate specificity would also enable targeted proteolytic cleavage - editing - of a target protein by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally-derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the three-dimensional structure and energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically-grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases: the NS3/4 protease from the Hepatitis C virus (HCV) and the Tobacco Etch Virus (TEV) proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pre-trained PGCN model to guide the design of TEV protease libraries for cleaving two non-canonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. 
The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Joseph H. Lubin
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
| | | | - Guanyang Wang
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| |
|
44
|
Li AJ, Lu M, Desta I, Sundar V, Grigoryan G, Keating AE. Neural network-derived Potts models for structure-based protein design using backbone atomic coordinates and tertiary motifs. Protein Sci 2023; 32:e4554. [PMID: 36564857 PMCID: PMC9854172 DOI: 10.1002/pro.4554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/15/2022] [Accepted: 12/20/2022] [Indexed: 12/25/2022]
Abstract
Designing novel proteins to perform desired functions, such as binding or catalysis, is a major goal in synthetic biology. A variety of computational approaches can aid in this task. An energy-based framework rooted in the sequence-structure statistics of tertiary motifs (TERMs) can be used for sequence design on predefined backbones. Neural network models that use backbone coordinate-derived features provide another way to design new proteins. In this work, we combine the two methods to make neural structure-based models more suitable for protein design. Specifically, we supplement backbone-coordinate features with TERM-derived data, as inputs, and we generate energy functions as outputs. We present two architectures that generate Potts models over the sequence space: TERMinator, which uses both TERM-based and coordinate-based information, and COORDinator, which uses only coordinate-based information. Using these two models, we demonstrate that TERMs can be utilized to improve native sequence recovery performance of neural models. Furthermore, we demonstrate that sequences designed by TERMinator are predicted to fold to their target structures by AlphaFold. Finally, we show that both TERMinator and COORDinator learn notions of energetics, and these methods can be fine-tuned on experimental data to improve predictions. Our results suggest that using TERM-based and coordinate-based features together may be beneficial for protein design and that structure-based neural models that produce Potts energy tables have utility for flexible applications in protein science.
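For readers unfamiliar with the term, a Potts model over sequence space scores a sequence as a sum of per-position terms plus pairwise coupling terms. The sketch below evaluates such an energy from toy tables. It is an illustrative assumption, not code from TERMinator or COORDinator: the function name `potts_energy`, the table layout (`h` as a list of per-position dictionaries, `J` as a dictionary keyed by position pairs), and all numeric values are hypothetical.

```python
def potts_energy(seq, h, J):
    """Potts energy: sum_i h[i][s_i] + sum_{i<j} J[(i, j)][(s_i, s_j)].

    Residue pairs missing from a coupling table contribute zero.
    """
    # Single-site (self-energy) terms, one per position.
    energy = sum(h[i][aa] for i, aa in enumerate(seq))
    # Pairwise coupling terms for each listed position pair.
    for (i, j), table in J.items():
        energy += table.get((seq[i], seq[j]), 0.0)
    return energy

# Toy two-position example over the alphabet {A, G}.
h = [{"A": -1.0, "G": 0.5}, {"A": 0.2, "G": -0.3}]
J = {(0, 1): {("A", "G"): -0.5, ("G", "G"): 0.4}}
e_ag = potts_energy("AG", h, J)  # -1.0 + (-0.3) + (-0.5)
e_ga = potts_energy("GA", h, J)  # 0.5 + 0.2, no coupling entry
```

Under the usual convention that lower energy is more favorable, sequence design on a fixed backbone amounts to searching for a low-energy sequence under tables like these.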
Affiliation(s)
- Alex J. Li
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Mindren Lu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Israel Desta
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Vikram Sundar
- Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA
| | - Amy E. Keating
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
45
|
Soleymani F, Paquet E, Viktor HL, Michalowski W, Spinello D. ProtInteract: A deep learning framework for predicting protein-protein interactions. Comput Struct Biotechnol J 2023; 21:1324-1348. [PMID: 36817951 PMCID: PMC9929211 DOI: 10.1016/j.csbj.2023.01.028] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/05/2022] [Revised: 01/20/2023] [Accepted: 01/20/2023] [Indexed: 01/26/2023] Open
Abstract
Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions underpin various biological activities such as metabolic cycles, signal transduction, and immune response. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. We therefore developed the ProtInteract framework to predict protein-protein interaction. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure to a lower-dimensional vector while preserving its underlying sequence attributes. This leads to faster training of the second network, a deep convolutional neural network (CNN) that receives encoded proteins and predicts their interaction under three different scenarios. In each scenario, the deep CNN predicts the class of a given encoded protein pair. Each class indicates different ranges of confidence scores corresponding to the probability of whether a predicted interaction occurs or not. The proposed framework features significantly low computational complexity and relatively fast response. The contributions of this work are twofold. First, ProtInteract assimilates the protein's primary structure into a pseudo-time series. Therefore, we leverage the nature of the time series of proteins and their physicochemical properties to encode a protein's amino acid sequence into a lower-dimensional vector space. This approach enables extracting highly informative sequence attributes while reducing computational complexity. Second, the ProtInteract framework utilises this information to identify protein interactions with other proteins based on its amino acid configuration. Our results suggest that the proposed framework performs with high accuracy and efficiency in predicting protein-protein interactions.
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
| | - Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada (corresponding author)
| | - Herna Lydia Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON K1N 6N5, Canada
| | | | - Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
| |
|
46
|
Sharifi F, Sharifi I, Babaei Z, Alahdin S, Afgar A. Bioinformatics evaluation of anticancer properties of GP63 protein-derived peptides on MMP2 protein of melanoma cancer. J Pathol Inform 2023; 14:100190. [PMID: 36700237 PMCID: PMC9867975 DOI: 10.1016/j.jpi.2023.100190] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/07/2022] [Revised: 01/09/2023] [Accepted: 01/09/2023] [Indexed: 01/13/2023] Open
Abstract
Background: GP63, also known as leishmanolysin, is a multifunctional virulence factor abundant on the surface of Leishmania spp. Small peptides that are selectively toxic to cancer cells are known as anticancer peptides. We aimed to demonstrate the activity of GP63 and its anticancer properties on melanoma using a range of in silico tools and screening methods to identify predicted and designed anticancer peptides. Methods: Various in silico modeling methodologies were used to establish the three-dimensional (3D) structure of GP63. The modeled structures were refined and re-evaluated, and the quality of the built models was assessed. Docking was used to find the interacting amino acids between MMP2 and GP63 and its anticancer peptides. AntiCP 2.0 was used to screen anticancer peptides. 2D interaction plots of the protein-ligand complexes were evaluated with the Protein-Ligand Interaction Profiler server. This is the first time that anticancer peptides of GP63, together with predicted and designed peptides, have been used. Results: We used 3 peptides of GP63 based on the AntiCP 2.0 server, with scores of 0.63, 0.53, and 0.49, and common peptides of GP63/MMP2 (a continuous peptide, meaning the complete peptide selected after docking, with non-anticancer effect; a predicted peptide with a score of 0.58; and designed peptides with scores of 0.47 and 0.45 by the AntiCP 2.0 server). Conclusions: The antileishmanial and anticancer peptide research topics exemplify the multidisciplinary nature of peptide research. The advancement of therapeutics targeting cancer and/or Leishmania requires the interconnected research strategy shown in this work.
Key Words
- ACPs, anticancer peptides
- Anticancer
- CASTp, Computed Atlas of Surface Topography of proteins
- CL, cutaneous leishmaniasis
- GP63, Glycoprotein 63
- In silico
- Leishmania
- Leishmanolysin
- MD, molecular dynamics
- MMPs, matrix metalloproteases
- MSP, major surface protease
- Matrix metalloproteases
- PDB, Protein Data Bank
- PLIP, Protein–Ligand Interaction Profiler
- Peptide
- Protein–Ligand Interaction Profiler
- ROS, reactive oxygen species formation
- SVM, Support Vector Machine
- VL, visceral leishmaniasis
- kNN, k-Nearest Neighbors
Affiliation(s)
- Fatemeh Sharifi
- Research Center of Tropical and Infectious Diseases, Kerman University of Medical Sciences, Kerman, Iran
| | - Iraj Sharifi
- Leishmaniasis Research Center, Kerman University of Medical Sciences, Kerman, Iran
| | - Zahra Babaei
- Leishmaniasis Research Center, Kerman University of Medical Sciences, Kerman, Iran
| | - Sodabeh Alahdin
- Leishmaniasis Research Center, Kerman University of Medical Sciences, Kerman, Iran
- Student Research Committee, Kerman University of Medical Sciences, Kerman, Iran
| | - Ali Afgar
- Research Center for Hydatid Disease in Iran, Kerman University of Medical Sciences, Kerman, Iran
| |
|
47
|
Nallasamy V, Seshiah M. Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction. Neural Comput Appl 2023; 35:1983-2006. [PMID: 36245797 PMCID: PMC9542649 DOI: 10.1007/s00521-022-07868-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/21/2022] [Indexed: 01/12/2023]
Abstract
In living organisms, proteins are considered as the executants of biological functions. Owing to its pivotal role played in protein folding patterns, comprehension of protein structure is a challenging issue. Moreover, owing to numerous protein sequence exploration in protein data banks and complication of protein structures, experimental methods are found to be inadequate for protein structural class prediction. Hence, it is very much advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In the recent few years there has been an elevated interest in using deep learning to assist protein structure prediction as protein structure prediction models can be utilized to screen a large number of novel sequences. In this regard, we propose a model employing Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function called Energy Profile Legion-Class Bayes Protein Structure Identification model. Followed by this, we use a Thompson Optimized convolutional neural network to extract features between amino acids and then the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting secondary protein structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method tested distinct unique protein data and was compared to the state-of-the-art methods, the Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, the Deep Learning and a distance-based protein structure prediction using deep learning. The results obtained when applied with the Biopython tool with respect to protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision, respectively, are measured. 
The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
Affiliation(s)
- Varanavasi Nallasamy
- Cognizant Technology Solutions Pvt. Ltd, CHIL SEZ IT Park, Keeranatham, Saravanam Patti, Coimbatore, Tamil Nadu 641035 India
| | - Malarvizhi Seshiah
- Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Namakkal, Tamil Nadu India
| |
|
48
|
Syrlybaeva R, Strauch EM. Deep learning of protein sequence design of protein-protein interactions. Bioinformatics 2023; 39:btac733. [PMID: 36377772 PMCID: PMC9947925 DOI: 10.1093/bioinformatics/btac733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Revised: 09/16/2022] [Accepted: 11/14/2022] [Indexed: 11/16/2022] Open
Abstract
MOTIVATION: As more data of experimentally determined protein structures are becoming available, data-driven models to describe protein sequence-structure relationships become more feasible. Within this space, the amino acid sequence design of protein-protein interactions is still a rather challenging subproblem with very low success rates, yet it is central to most biological processes. RESULTS: We developed an attention-based deep learning model inspired by algorithms used for image-caption assignments to design peptides or protein fragment sequences. Our trained model can be applied for the redesign of natural protein interfaces or the designed protein interaction fragments. Here, we validate the potential by recapitulating naturally occurring protein-protein interactions including antibody-antigen complexes. The designed interfaces accurately capture essential native interactions and have comparable native-like binding affinities in silico. Furthermore, our model does not need a precise backbone location, making it an attractive tool for working with de novo design of protein-protein interactions. AVAILABILITY AND IMPLEMENTATION: The source code of the method is available at https://github.com/strauchlab/iNNterfaceDesign. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Raulia Syrlybaeva
- Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, GA 30602, USA
- Eva-Maria Strauch
- Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, GA 30602, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
|
49
|
Bansia H, Ramakumar S. Homology Modeling of Antibody Variable Regions: Methods and Applications. Methods Mol Biol 2023; 2627:301-319. [PMID: 36959454 DOI: 10.1007/978-1-0716-2974-1_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 03/25/2023]
Abstract
Adaptive immunity specifically protects us from antigenic challenges. Antibodies are key effector proteins of adaptive immunity, and they are remarkable in their ability to recognize a virtually limitless number of antigens. Fragment variable (FV), the antigen-binding region of antibodies, can be split into two main components, namely, the framework and the complementarity determining regions. The framework (FR) consists of the light-chain framework (FRL) and the heavy-chain framework (FRH). Similarly, the complementarity determining regions (CDRs) comprise light-chain CDRs 1-3 (CDRs L1-3) and heavy-chain CDRs 1-3 (CDRs H1-3). While FRs are relatively constant in sequence and structure across diverse antibodies, sequence variation in CDRs, which leads to differential conformations of the CDR loops, accounts for the distinct antigenic specificities of diverse antibodies. The conserved structural features in FRs and the conformity of CDRs to a limited set of standard conformations allow for the accurate prediction of FV models using homology modeling techniques. Antibody structure prediction from amino acid sequence has numerous important applications, including prediction of antibody-antigen interaction interfaces and redesign of therapeutically and biotechnologically useful antibodies with improved affinity. This chapter summarizes the current practices employed in the successful homology modeling of antibody variable regions and the potential applications of the generated homology models.
Affiliation(s)
- Harsh Bansia
- Department of Physics, Indian Institute of Science, Bengaluru, India.
- Advanced Science Research Center at The Graduate Center of the City University of New York, New York, NY, USA.
|
50
|
Chakraborty C, Bhattacharya M, Chatterjee S, Sharma AR, Saha RP, Dhama K, Agoramoorthy G. Integrative Bioinformatics Approaches Indicate a Particular Pattern of Some SARS-CoV-2 and Non-SARS-CoV-2 Proteins. Vaccines (Basel) 2022; 11:vaccines11010038. [PMID: 36679883 PMCID: PMC9864461 DOI: 10.3390/vaccines11010038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 12/12/2022] [Accepted: 12/20/2022] [Indexed: 12/28/2022] [Open Access]
Abstract
Pattern recognition plays a critical role in integrative bioinformatics for determining the structural patterns of proteins of viruses such as SARS-CoV-2. This study identifies the pattern of SARS-CoV-2 proteins to depict the structure-function relationships of the protein alphabets of SARS-CoV-2 and COVID-19. The assembly enumeration algorithm, Anisotropic Network Model, Gaussian Network Model, Markovian Stochastic Model, and image comparison of protein-like alphabets were used. The distance score was lowest (22) for "I" and highest (40) for "9". For post-processing and decision, two protein alphabets, "C" (PDB ID: 6XC3) and "S" (PDB ID: 7OYG), were evaluated to understand the structural, functional, and evolutionary relationships, and we found uniqueness in the functionality of the proteins. Here, models were constructed using "SARS-CoV-2 proteins" (12 in number) and "non-SARS-CoV-2 proteins" (14 in number) to create two words, "SARS-CoV-2" and "COVID-19". Similarly, we developed two slogans, "Vaccinate the world against COVID-19" and "Say no to SARS-CoV-2", which were made with the protein structures. This might generate vaccine-related interest among a broad readership. Finally, the evolutionary process appears to enhance the protein structure smoothly to provide suitable functionality shaped by natural selection.
Affiliation(s)
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata 700126, West Bengal, India
- Correspondence:
- Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore 756020, Odisha, India
- Srijan Chatterjee
- Institute for Skeletal Aging and Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-si 24252, Gangwon-do, Republic of Korea
- Ashish Ranjan Sharma
- Institute for Skeletal Aging and Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-si 24252, Gangwon-do, Republic of Korea
- Rudra P. Saha
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata 700126, West Bengal, India
- Kuldeep Dhama
- Division of Pathology, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, Uttar Pradesh, India
|