1
Basha S, Mukunda DC, Pai AR, Mahato KK. Assessing amyloid fibrils and amorphous aggregates: A review. Int J Biol Macromol 2025; 311:143725. [PMID: 40324497 DOI: 10.1016/j.ijbiomac.2025.143725] [Received: 01/23/2025] [Revised: 04/23/2025] [Accepted: 04/29/2025] [Indexed: 05/07/2025]
Abstract
Protein misfolding and aggregation play a central role in the progression of neurodegenerative diseases such as Alzheimer's and Parkinson's. These aggregates manifest either as structured amyloid fibrils enriched in β-sheet conformations or as irregular amorphous aggregates with diverse morphologies. Understanding their formation, structure, and behavior is critical for deciphering disease mechanisms and developing targeted diagnostics and therapeutics. This review presents an integrated overview of both conventional and advanced techniques used to detect, distinguish, and structurally characterize these protein aggregates. It covers a range of spectroscopic and spectrometric tools, such as fluorescence, Raman, and mass spectrometry that facilitate aggregate identification. Microscopy methods, including atomic force and electron microscopy, are highlighted for morphological analysis. The review also discusses in situ detection strategies using fluorescent dyes, conformation-specific antibodies, enzymatic reporters, and real-time imaging. Separation methods like centrifugation, electrophoresis, and chromatography are outlined alongside structural analysis tools such as X-ray diffraction. Furthermore, the growing utility of computational approaches and artificial intelligence in predicting aggregation propensities and integrating biological data is emphasized. By critically evaluating each method's capabilities and limitations, this review provides a practical and forward-looking resource for researchers studying the complex landscape of protein aggregation.
Affiliation(s)
- Shaik Basha
- Department of Biophysics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India
- Aparna Ramakrishna Pai
- Department of Neurology, Kasturba Medical College Manipal, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India
- Krishna Kishore Mahato
- Department of Biophysics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India.
2
Pretti E, Shell MS. Characterizing the Sequence Landscape of Peptide Fibrillization with a Bottom-Up Coarse-Grained Model. J Phys Chem B 2025; 129:3559-3570. [PMID: 40146906 DOI: 10.1021/acs.jpcb.4c07248] [Indexed: 03/29/2025]
Abstract
Molecular insight into amyloid aggregation is crucial for understanding the details of protein fibril nucleation and growth, which play a significant role in a wide range of proteinopathies. The length and time scales for fibrillization make its computational study an intrinsically multiscale problem, necessitating the use of coarse-grained modeling. A wide variety of coarse-grained models for peptides have been proposed, often parametrized with a combination of top-down and bottom-up approaches. Here, we present a predictive, sequence-transferable bottom-up coarse-grained model, systematically developed using only information from atomistic simulations by applying an extended-ensemble relative entropy minimization technique. The resulting model is capable of accurately recovering conformational properties of peptides constructed from a reduced alphabet of amino acids, of predicting secondary structures of isolated and interacting peptides from their sequences alone, and of simulating aggregation of peptides that have been experimentally characterized as amyloidogenic. Finally, we couple such coarse-grained simulations with a genetic algorithm to characterize the sequence space of the reduced alphabet and identify features of sequences for which ordered fibrillar states are both thermodynamically favorable and kinetically accessible.
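The final step of this abstract, coupling a sequence generator with a genetic algorithm, can be illustrated with a minimal GA sketch. This is not the authors' code. The reduced alphabet `ALPHABET` is hypothetical because the abstract does not list the one used. In the study, fitness would come from coarse-grained aggregation simulations, but here `toy_fitness` is a placeholder that simply rewards alternating hydrophobic/polar patterning. Selection keeps the top half of each generation unchanged, and the rest are generated by single-point crossover plus point mutation.

```python
import random
from typing import Callable, List

# Hypothetical reduced alphabet; the abstract does not list the one used in the paper.
ALPHABET = "AGLVFKE"
HYDROPHOBIC = set("LVF")

def toy_fitness(seq: str) -> int:
    """Placeholder score: counts hydrophobic/polar alternations along the chain,
    a crude proxy for beta-strand patterning. In the study, fitness would instead
    come from coarse-grained aggregation simulations."""
    flags = [aa in HYDROPHOBIC for aa in seq]
    return sum(a != b for a, b in zip(flags, flags[1:]))

def mutate(seq: str, rate: float, rng: random.Random) -> str:
    # Replace each residue with a random alphabet letter with probability `rate`.
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq)

def crossover(a: str, b: str, rng: random.Random) -> str:
    # Single-point crossover between two parents of equal length.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(length: int = 8, pop_size: int = 20, generations: int = 30,
           rate: float = 0.1, fitness: Callable[[str], int] = toy_fitness,
           seed: int = 0) -> List[str]:
    """Truncation-selection GA; returns the final population, best first."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the top half survives unchanged
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return sorted(pop, key=fitness, reverse=True)

best = evolve()[0]
print(best, toy_fitness(best))
```

Because the top half always survives, the best score can never decrease between generations. The expensive part in practice is the fitness call, which is why the population and generation counts here are kept small.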
Affiliation(s)
- Evan Pretti
- Department of Chemical Engineering, Engineering II Building, University of California, Santa Barbara, Santa Barbara, California 93106-5080, United States
- M Scott Shell
- Department of Chemical Engineering, Engineering II Building, University of California, Santa Barbara, Santa Barbara, California 93106-5080, United States
3
Hassan M, Shahzadi S, Li MS, Kloczkowski A. Prediction and Evaluation of Protein Aggregation with Computational Methods. Methods Mol Biol 2025; 2867:299-314. [PMID: 39576588 PMCID: PMC12126135 DOI: 10.1007/978-1-0716-4196-5_17] [Indexed: 11/24/2024]
Abstract
Protein and peptide aggregation has recently become one of the most studied biomedical problems due to its central role in several neurodegenerative disorders and its biotechnological importance. Multiple in silico methods, databases, tools, and algorithms have been developed to predict aggregation of proteins and peptides to better understand fundamental mechanisms of various aggregation diseases. Here, we attempt to provide a brief overview of bioinformatic methods and tools to better understand molecular mechanisms of aggregation disorders. Furthermore, through a better understanding of protein aggregation mechanisms, it might be possible to design novel therapeutic agents to treat and hopefully prevent protein aggregation diseases.
Affiliation(s)
- Mubashir Hassan
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA.
- Saba Shahzadi
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
- Mai Suan Li
- Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
- Andrzej Kloczkowski
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA.
- Department of Pediatrics, The Ohio State University, Columbus, OH, USA.
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA.
4
Heredia-Torrejón M, Montañez R, González-Meneses A, Carcavilla A, Medina MA, Lechuga-Sancho AM. VUS next in rare diseases? Deciphering genetic determinants of biomolecular condensation. Orphanet J Rare Dis 2024; 19:327. [PMID: 39243101 PMCID: PMC11380411 DOI: 10.1186/s13023-024-03307-6] [Received: 08/21/2023] [Accepted: 08/06/2024] [Indexed: 09/09/2024]
Abstract
The diagnostic odysseys for rare disease patients are getting shorter as next-generation sequencing becomes more widespread. However, the complex genetic diversity and factors influencing expressivity continue to challenge accurate diagnosis, leaving more than 50% of genetic variants categorized as variants of uncertain significance. Genomic expression intricately hinges on localized interactions among its products. Conventional variant prioritization, biased towards known disease genes and the structure-function paradigm, overlooks the potential impact of variants shaping the composition, location, size, and properties of biomolecular condensates, genuine membraneless organelles swiftly sensing and responding to environmental changes, and modulating expressivity. To address this complexity, we propose to focus on the nexus of genetic variants within biomolecular condensates determinants. Scrutinizing variant effects in these membraneless organelles could refine prioritization, enhance diagnostics, and unveil the molecular underpinnings of rare diseases. Integrating comprehensive genome sequencing, transcriptomics, and computational models can unravel variant pathogenicity and disease mechanisms, enabling precision medicine. This paper presents the rationale driving our proposal and describes a protocol to implement this approach. By fusing state-of-the-art knowledge and methodologies into the clinical practice, we aim to redefine rare diseases diagnosis, leveraging the power of scientific advancement for more informed medical decisions.
Affiliation(s)
- María Heredia-Torrejón
- Inflammation, Nutrition, Metabolism and Oxidative Stress Research Laboratory, Biomedical Research and Innovation Institute of Cadiz (INiBICA), Cadiz, Spain
- Mother and Child Health and Radiology Department. Area of Clinical Genetics, University of Cadiz. Faculty of Medicine, Cadiz, Spain
- Raúl Montañez
- Inflammation, Nutrition, Metabolism and Oxidative Stress Research Laboratory, Biomedical Research and Innovation Institute of Cadiz (INiBICA), Cadiz, Spain.
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucía Tech, E-29071, Málaga, Spain.
- Antonio González-Meneses
- Division of Dysmorphology, Department of Paediatrics, Virgen del Rocio University Hospital, Sevilla, Spain
- Department of Paediatrics, Medical School, University of Sevilla, Sevilla, Spain
- Atilano Carcavilla
- Pediatric Endocrinology Department, Hospital Universitario La Paz, 28046, Madrid, Spain
- Multidisciplinary Unit for RASopathies, Hospital Universitario La Paz, 28046, Madrid, Spain
- Miguel A Medina
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucía Tech, E-29071, Málaga, Spain.
- Biomedical Research Institute and nanomedicine platform of Málaga IBIMA-BIONAND, E-29071, Málaga, Spain.
- CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, E-28029, Madrid, Spain.
- Alfonso M Lechuga-Sancho
- Inflammation, Nutrition, Metabolism and Oxidative Stress Research Laboratory, Biomedical Research and Innovation Institute of Cadiz (INiBICA), Cadiz, Spain
- Division of Endocrinology, Department of Paediatrics, Puerta del Mar University Hospital, Cádiz, Spain
- Area of Paediatrics, Department of Child and Mother Health and Radiology, Medical School, University of Cadiz, Cadiz, Spain
5
Ghosh D, Biswas A, Radhakrishna M. Advanced computational approaches to understand protein aggregation. BIOPHYSICS REVIEWS 2024; 5:021302. [PMID: 38681860 PMCID: PMC11045254 DOI: 10.1063/5.0180691] [Received: 10/11/2023] [Accepted: 03/18/2024] [Indexed: 05/01/2024]
Abstract
Protein aggregation is a widespread phenomenon implicated in debilitating diseases like Alzheimer's, Parkinson's, and cataracts, presenting complex hurdles for the field of molecular biology. In this review, we explore the evolving realm of computational methods and bioinformatics tools that have revolutionized our comprehension of protein aggregation. Beginning with a discussion of the multifaceted challenges associated with understanding this process and emphasizing the critical need for precise predictive tools, we highlight how computational techniques have become indispensable for understanding protein aggregation. We focus on molecular simulations, notably molecular dynamics (MD) simulations, spanning from atomistic to coarse-grained levels, which have emerged as pivotal tools in unraveling the complex dynamics governing protein aggregation in diseases such as cataracts, Alzheimer's, and Parkinson's. MD simulations provide microscopic insights into protein interactions and the subtleties of aggregation pathways, with advanced techniques like replica exchange molecular dynamics, Metadynamics (MetaD), and umbrella sampling enhancing our understanding by probing intricate energy landscapes and transition states. We delve into specific applications of MD simulations, elucidating the chaperone mechanism underlying cataract formation using Markov state modeling and the intricate pathways and interactions driving the toxic aggregate formation in Alzheimer's and Parkinson's disease. We then highlight how computational techniques, including bioinformatics, sequence analysis, structural data, machine learning algorithms, and artificial intelligence, have become indispensable for predicting protein aggregation propensity and locating aggregation-prone regions within protein sequences.
Throughout our exploration, we underscore the symbiotic relationship between computational approaches and empirical data, which has paved the way for potential therapeutic strategies against protein aggregation-related diseases. In conclusion, this review offers a comprehensive overview of advanced computational methodologies and bioinformatics tools that have catalyzed breakthroughs in unraveling the molecular basis of protein aggregation, with significant implications for clinical interventions, standing at the intersection of computational biology and experimental research.
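Among the enhanced-sampling techniques listed in this abstract, replica exchange MD relies on a simple acceptance rule. Replicas run in parallel at different temperatures, and a proposed exchange of configurations between replicas i and j is accepted with the Metropolis probability min{1, exp[(β_i − β_j)(E_i − E_j)]}, where β = 1/(k_B T). The sketch below shows only that rule. The geometric temperature ladder is a common heuristic rather than something stated in the review, and the function names are hypothetical. Energies are assumed to be in kcal/mol to match the chosen k_B.

```python
import math
from typing import List

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); energies below are in kcal/mol

def geometric_ladder(t_min: float, t_max: float, n: int) -> List[float]:
    """n temperatures spaced geometrically from t_min to t_max (a common heuristic)."""
    return [t_min * (t_max / t_min) ** (k / (n - 1)) for k in range(n)]

def swap_probability(e_i: float, e_j: float, t_i: float, t_j: float,
                     k_b: float = K_B) -> float:
    """Metropolis probability of exchanging the configurations held by replicas
    i and j (potential energies e_i, e_j at temperatures t_i, t_j)."""
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

temps = geometric_ladder(300.0, 400.0, 8)
print([round(t, 1) for t in temps])
# The colder replica already holds the lower-energy configuration,
# so this exchange is accepted only with some probability below 1.
print(swap_probability(-100.0, -90.0, temps[0], temps[1]))
```

When the colder replica holds the higher-energy configuration, the exponent is positive and the swap is always accepted. This lets high-temperature excursions over barriers propagate down to the temperature of interest.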
Affiliation(s)
- Deepshikha Ghosh
- Department of Biological Sciences and Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
- Anushka Biswas
- Department of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
6
Khalili K, Farzam F, Dabirmanesh B, Khajeh K. Prediction of protein aggregation. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2024; 206:229-263. [PMID: 38811082 DOI: 10.1016/bs.pmbts.2024.03.005] [Indexed: 05/31/2024]
Abstract
The scientific community is very interested in protein aggregation because of its involvement in several neurodegenerative diseases and its significance in industry. Remarkably, fibrillar aggregates are utilized naturally for constructing structural scaffolds or creating biological switches and may be intentionally designed to construct versatile nanomaterials. Consequently, there is a significant need to rationalize and predict protein aggregation. Researchers have developed various computational methodologies and algorithms to predict protein aggregation and understand its underlying mechanics. This chapter aims to summarize the significant advancements in computational methods, accessible resources, and prospective developments in the field of in silico research. We assess the existing computational tools for predicting protein aggregation propensities, detecting areas that are prone to sequential and structural aggregation, analyzing the effects of mutations on protein aggregation, or identifying prion-like domains.
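Many sequence-based predictors of aggregation-prone regions use the same basic pattern. They scan a sliding window along the chain and flag stretches whose score exceeds a threshold. The sketch below illustrates only that pattern and is not a reimplementation of any published predictor. It uses the Kyte-Doolittle hydropathy scale as a stand-in per-residue score, and the window length (7) and threshold (1.6) are arbitrary illustrative choices. `mutation_delta` is a hypothetical measure of a point mutation's effect, defined here as the change in the peak window score.

```python
from typing import Dict, List, Tuple

# Kyte-Doolittle hydropathy scale, used here as a stand-in per-residue score.
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_profile(seq: str, window: int = 7) -> List[float]:
    """Mean score of every full-length window along the sequence."""
    vals = [KD[aa] for aa in seq]
    return [sum(vals[i:i + window]) / window for i in range(len(seq) - window + 1)]

def aggregation_prone_regions(seq: str, window: int = 7,
                              threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge windows scoring >= threshold into 0-based, end-exclusive (start, end) spans."""
    regions: List[Tuple[int, int]] = []
    for i, score in enumerate(window_profile(seq, window)):
        if score < threshold:
            continue
        if regions and i <= regions[-1][1]:
            regions[-1] = (regions[-1][0], i + window)  # overlaps/abuts: extend
        else:
            regions.append((i, i + window))
    return regions

def mutation_delta(seq: str, pos: int, new_aa: str, window: int = 7) -> float:
    """Change in the peak window score caused by a point mutation at 0-based pos."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return max(window_profile(mutant, window)) - max(window_profile(seq, window))

ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"
print(aggregation_prone_regions(ABETA42))
print(round(mutation_delta(ABETA42, 18, "P"), 2))  # hypothetical F19P (0-based index 18)
```

Published tools replace the single hydropathy scale with physically or statistically derived scores, and many also account for structural context. The window-and-threshold skeleton, however, is a common starting point.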
Affiliation(s)
- Kavyan Khalili
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Farnoosh Farzam
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Bahareh Dabirmanesh
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Khosro Khajeh
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran.
7
Liao S, Zhang Y, Han X, Wang T, Wang X, Yan Q, Li Q, Qi Y, Zhang Z. A sequence-based model for identifying proteins undergoing liquid-liquid phase separation/forming fibril aggregates via machine learning. Protein Sci 2024; 33:e4927. [PMID: 38380794 PMCID: PMC10880426 DOI: 10.1002/pro.4927] [Received: 08/15/2023] [Revised: 01/27/2024] [Accepted: 01/30/2024] [Indexed: 02/22/2024]
Abstract
Liquid-liquid phase separation (LLPS) and solid aggregate formation (also referred to as amyloid aggregation) by proteins have gained significant attention in recent years due to their associations with various physiological and pathological processes in living organisms. The differences and connections, at the sequence level, between proteins undergoing LLPS and those forming amyloid fibrils have not yet been systematically investigated. In this research, we aim to address this gap by comparing the two types of proteins across 36 features using currently available data. The statistical comparison indicates that 24 of the 36 selected features differ significantly between the two protein groups. An LLPS-Fibrils binary classification model built on these 24 features using random forest identifies the fraction of intrinsically disordered residues (F_IDR) as the most important feature. In a further three-class LLPS-Fibrils-Background classification model built on the same screened features, however, the compositions of cysteine and leucine contribute more than the other features. Through feature ablation analysis, we finally constructed a model, FLFB (Feature-based LLPS-Fibrils-Background protein predictor), using six refined features, with an average area under the receiver operating characteristic curve of 0.83. This work indicates that proteins undergoing LLPS or forming amyloid fibrils can be identified using sequence features and a machine learning model.
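This abstract names three of the most influential features (F_IDR and the cysteine and leucine compositions) and reports performance as ROC AUC. The sketch below computes those three features and the AUC using its rank (Mann-Whitney) formulation, under one assumption: a per-residue disorder mask is supplied by some external disorder predictor, which the abstract does not specify. The other three FLFB features and the random-forest classifier are not reproduced. `feature_vector` and `roc_auc` are hypothetical helper names.

```python
from typing import Dict, Sequence

def composition(seq: str, residue: str) -> float:
    """Fraction of positions occupied by one amino-acid type."""
    return seq.count(residue) / len(seq)

def fraction_disordered(disorder_mask: Sequence[bool]) -> float:
    """F_IDR from a per-residue disorder mask (assumed to come from an external predictor)."""
    return sum(disorder_mask) / len(disorder_mask)

def feature_vector(seq: str, disorder_mask: Sequence[bool]) -> Dict[str, float]:
    if len(seq) != len(disorder_mask):
        raise ValueError("disorder mask must have one entry per residue")
    return {
        "F_IDR": fraction_disordered(disorder_mask),
        "comp_C": composition(seq, "C"),
        "comp_L": composition(seq, "L"),
    }

def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the probability that a randomly chosen positive
    outscores a randomly chosen negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(feature_vector("MCLLC", [True, True, False, False, False]))
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

The pairwise formulation is quadratic in the number of samples. For large evaluation sets, a sort-based rank computation gives the same value more efficiently.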
Affiliation(s)
- Shaofeng Liao
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yujun Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Xinchen Han
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Tinglan Wang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Xi Wang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qinglin Yan
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qian Li
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yifei Qi
- School of Pharmacy, Fudan University, Shanghai, China
- Zhuqing Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
8
Chen Z, Wang X, Chen X, Huang J, Wang C, Wang J, Wang Z. Accelerating therapeutic protein design with computational approaches toward the clinical stage. Comput Struct Biotechnol J 2023; 21:2909-2926. [PMID: 38213894 PMCID: PMC10781723 DOI: 10.1016/j.csbj.2023.04.027] [Received: 02/15/2023] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 01/13/2024]
Abstract
Therapeutic protein, represented by antibodies, is of increasing interest in human medicine. However, clinical translation of therapeutic protein is still largely hindered by different aspects of developability, including affinity and selectivity, stability and aggregation prevention, solubility and viscosity reduction, and deimmunization. Conventional optimization of the developability with widely used methods, like display technologies and library screening approaches, is a time and cost-intensive endeavor, and the efficiency in finding suitable solutions is still not enough to meet clinical needs. In recent years, the accelerated advancement of computational methodologies has ushered in a transformative era in the field of therapeutic protein design. Owing to their remarkable capabilities in feature extraction and modeling, the integration of cutting-edge computational strategies with conventional techniques presents a promising avenue to accelerate the progression of therapeutic protein design and optimization toward clinical implementation. Here, we compared the differences between therapeutic protein and small molecules in developability and provided an overview of the computational approaches applicable to the design or optimization of therapeutic protein in several developability issues.
Affiliation(s)
- Zhidong Chen
- Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xinpei Wang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xu Chen
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Juyang Huang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Chenglin Wang
- Shenzhen Qiyu Biotechnology Co., Ltd, Shenzhen 518107, China
- Junqing Wang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Zhe Wang
- Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
9
Qing R, Hao S, Smorodina E, Jin D, Zalevsky A, Zhang S. Protein Design: From the Aspect of Water Solubility and Stability. Chem Rev 2022; 122:14085-14179. [PMID: 35921495 PMCID: PMC9523718 DOI: 10.1021/acs.chemrev.1c00757] [Received: 08/31/2021] [Indexed: 12/13/2022]
Abstract
Water solubility and structural stability are key merits for proteins defined by the primary sequence and 3D-conformation. Their manipulation represents important aspects of the protein design field that relies on the accurate placement of amino acids and molecular interactions, guided by underlying physiochemical principles. Emulated designer proteins with well-defined properties both fuel the knowledge-base for more precise computational design models and are used in various biomedical and nanotechnological applications. The continuous developments in protein science, increasing computing power, new algorithms, and characterization techniques provide sophisticated toolkits for solubility design beyond guess work. In this review, we summarize recent advances in the protein design field with respect to water solubility and structural stability. After introducing fundamental design rules, we discuss the transmembrane protein solubilization and de novo transmembrane protein design. Traditional strategies to enhance protein solubility and structural stability are introduced. The designs of stable protein complexes and high-order assemblies are covered. Computational methodologies behind these endeavors, including structure prediction programs, machine learning algorithms, and specialty software dedicated to the evaluation of protein solubility and aggregation, are discussed. The findings and opportunities for Cryo-EM are presented. This review provides an overview of significant progress and prospects in accurate protein design for solubility and stability.
Affiliation(s)
- Rui Qing
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- The David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Shilei Hao
- Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400030, China
- Eva Smorodina
- Department of Immunology, University of Oslo and Oslo University Hospital, Oslo 0424, Norway
- David Jin
- Avalon GloboCare Corp., Freehold, New Jersey 07728, United States
- Arthur Zalevsky
- Laboratory of Bioinformatics Approaches in Combinatorial Chemistry and Biology, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow 117997, Russia
- Shuguang Zhang
- Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
10
Roca-Martinez J, Lazar T, Gavalda-Garcia J, Bickel D, Pancsa R, Dixit B, Tzavella K, Ramasamy P, Sanchez-Fornaris M, Grau I, Vranken WF. Challenges in describing the conformation and dynamics of proteins with ambiguous behavior. Front Mol Biosci 2022; 9:959956. [PMID: 35992270 PMCID: PMC9382080 DOI: 10.3389/fmolb.2022.959956] [Received: 06/02/2022] [Accepted: 06/27/2022] [Indexed: 11/13/2022]
Abstract
Traditionally, our understanding of how proteins operate and how evolution shapes them is based on two main data sources: the overall protein fold and the protein amino acid sequence. However, a significant part of the proteome shows highly dynamic and/or structurally ambiguous behavior, which cannot be correctly represented by the traditional fixed set of static coordinates. Representing such protein behaviors remains challenging and necessarily involves a complex interpretation of conformational states, including probabilistic descriptions. Relating protein dynamics and multiple conformations to their function as well as their physiological context (e.g., post-translational modifications and subcellular localization), therefore, remains elusive for much of the proteome, with studies to investigate the effect of protein dynamics relying heavily on computational models. We here investigate the possibility of delineating three classes of protein conformational behavior: order, disorder, and ambiguity. These definitions are explored based on three different datasets, using interpretable machine learning from a set of features, from AlphaFold2 to sequence-based predictions, to understand the overlap and differences between these datasets. This forms the basis for a discussion on the current limitations in describing the behavior of dynamic and ambiguous proteins.
Affiliation(s)
- Joel Roca-Martinez
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Tamas Lazar
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- VIB-VUB Center for Structural Biology, Brussels, Belgium
- Jose Gavalda-Garcia
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- David Bickel
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Rita Pancsa
- Research Centre for Natural Sciences, Institute of Enzymology, Budapest, Hungary
- Bhawna Dixit
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- IBiTech-Biommeda, Universiteit Gent, Gent, Belgium
- Konstantina Tzavella
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Pathmanaban Ramasamy
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- VIB-UGent Center for Medical Biotechnology, Universiteit Gent, Gent, Belgium
- Maite Sanchez-Fornaris
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Department of Computer Sciences, University of Camagüey, Camagüey, Cuba
- Isel Grau
- Information Systems, Eindhoven University of Technology, Eindhoven, Netherlands
- Wim F. Vranken
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
11
Charoenkwan P, Ahmed S, Nantasenamat C, Quinn JMW, Moni MA, Lio' P, Shoombuatong W. AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning. Sci Rep 2022; 12:7697. [PMID: 35546347 PMCID: PMC9095707 DOI: 10.1038/s41598-022-11897-z] [Received: 09/11/2021] [Accepted: 05/03/2022] [Indexed: 12/13/2022]
Abstract
Amyloid proteins have the ability to form insoluble fibril aggregates that have important pathogenic effects in many tissues. Such amyloidoses are prominently associated with common diseases such as type 2 diabetes, Alzheimer's disease, and Parkinson's disease. There are many types of amyloid proteins, including proteins that form amyloid aggregates when in a misfolded state. Identifying such amyloid proteins and their pathogenic properties is difficult, but developing effective bioinformatics tools offers a new and effective approach. While several machine learning (ML)-based models for in silico identification of amyloid proteins have been proposed, their predictive performance is limited. In this study, we present AMYPred-FRL, a novel meta-predictor that uses a feature representation learning approach to achieve more accurate amyloid protein identification. AMYPred-FRL combined six well-known ML algorithms (extremely randomized tree, extreme gradient boosting, k-nearest neighbor, logistic regression, random forest, and support vector machine) with ten different sequence-based feature descriptors to generate 60 probabilistic features (PFs), as opposed to state-of-the-art methods developed using a single feature-based approach. A logistic regression recursive feature elimination (LR-RFE) method was used to find the optimal number m of the 60 PFs in order to improve the predictive performance. Finally, using the meta-predictor approach, the 20 selected PFs were fed into a logistic regression method to create the final hybrid model (AMYPred-FRL). Both cross-validation and independent tests showed that AMYPred-FRL achieved better predictive performance than its constituent baseline models. In an extensive independent test, AMYPred-FRL outperformed the existing methods by 5.5% in accuracy and 16.1% in MCC, reaching an accuracy of 0.873 and an MCC of 0.710.
To expedite high-throughput prediction, a user-friendly web server of AMYPred-FRL is freely available at http://pmlabstack.pythonanywhere.com/AMYPred-FRL. It is anticipated that AMYPred-FRL will be a useful tool in helping researchers to identify new amyloid proteins.
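The stacking scheme described in this abstract can be sketched in plain Python. It has three stages: base-model probabilities become features (PFs), LR-RFE selects a subset, and a logistic-regression meta-model makes the final call. This is not the AMYPred-FRL implementation. The three base scorers in `BASE_MODELS` are hypothetical hand-written functions standing in for the paper's 60 trained base models, and logistic regression is fitted by plain batch gradient descent rather than a library solver. The tiny training set is illustrative only. Its positives are short segments commonly cited as amyloidogenic (from Aβ, yeast Sup35 and tau), and its negatives are made-up charged/proline-rich peptides.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical base scorers standing in for AMYPred-FRL's trained base models
# (6 algorithms x 10 descriptors = 60 in the paper); each maps a sequence to [0, 1].
BASE_MODELS: List[Callable[[str], float]] = [
    lambda s: sum(c in "VIFLY" for c in s) / len(s),       # hydrophobic/aromatic fraction
    lambda s: sum(c in "QN" for c in s) / len(s),          # Q/N richness
    lambda s: 1.0 - sum(c in "PKDE" for c in s) / len(s),  # depletion of P and charged residues
]

def probabilistic_features(seq: str) -> List[float]:
    """Stack each base model's output into one feature vector (the 'PFs')."""
    return [model(seq) for model in BASE_MODELS]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 1.0, epochs: int = 5000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w: List[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def recursive_feature_elimination(X, y, keep: int) -> List[int]:
    """LR-RFE-style loop: refit, then drop the feature with the smallest |weight|."""
    idx = list(range(len(X[0])))
    while len(idx) > keep:
        w, _ = fit_logistic([[x[j] for j in idx] for x in X], y)
        idx.pop(min(range(len(idx)), key=lambda k: abs(w[k])))
    return idx

POSITIVES = ["KLVFFA", "GAIIGL", "NNQQNY", "VQIVYK"]
NEGATIVES = ["PEDKPE", "KPDEKS", "DEPKSD", "SPKEDP"]
X = [probabilistic_features(s) for s in POSITIVES + NEGATIVES]
y = [1] * len(POSITIVES) + [0] * len(NEGATIVES)

w, b = fit_logistic(X, y)
print(round(predict(w, b, probabilistic_features("GGVVIA")), 3))
```

Comparing raw weight magnitudes in the RFE step is only reasonable here because every PF already lies in [0, 1]. With heterogeneous features, they would normally be standardized first.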
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Saeed Ahmed
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Julian M W Quinn
- Bone Biology Division, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia
| | - Mohammad Ali Moni
- Artificial Intelligence and Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
Collapse
|
12
|
Computational methods to predict protein aggregation. Curr Opin Struct Biol 2022; 73:102343. [PMID: 35240456 DOI: 10.1016/j.sbi.2022.102343] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/16/2021] [Revised: 12/20/2021] [Accepted: 01/17/2022] [Indexed: 01/13/2023]
Abstract
In most cases, protein aggregation stems from the establishment of non-native intermolecular contacts. The formation of insoluble protein aggregates is associated with many human diseases and is a major bottleneck for the industrial production of protein-based therapeutics. Strikingly, fibrillar aggregates are naturally exploited for structural scaffolding or to generate molecular switches, and can be artificially engineered to build multi-functional nanomaterials. Thus, there is high interest in rationalizing and forecasting protein aggregation. Here, we review the available computational toolbox to predict protein aggregation propensities, identify sequential or structural aggregation-prone regions, evaluate the impact of mutations on aggregation, or recognize prion-like domains. We discuss the strengths and limitations of these algorithms and how they may evolve in the near future.
13
Bioinformatics Methods in Predicting Amyloid Propensity of Peptides and Proteins. Methods Mol Biol 2022; 2340:1-15. [PMID: 35167067 DOI: 10.1007/978-1-0716-1546-1_1] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/26/2022]
Abstract
Several computational methods have been developed to predict the amyloid propensity of a protein or peptide. These bioinformatics tools are time- and cost-saving alternatives to the expensive and laborious experimental methods used to confirm self-aggregation of a protein. Computational approaches not only allow preselection of reliable amyloid candidates but, most importantly, are capable of a thorough and informative analysis of a protein, indicating the sequence determinants of protein aggregation and identifying potential causal mutations and likely mechanisms. Bioinformatics modeling applies several different approaches, most typically physicochemical or structure-based modeling, machine learning, or statistics-based modeling. Bioinformatics methods typically use the amino acid sequence of a protein as input; some also include additional information, for example, an available structure. This chapter describes the methods currently used to computationally predict the amyloid propensity of a protein or peptide. Since the accuracy of bioinformatics methods may depend strongly on the reference data used to develop and evaluate the predictors, we also briefly present the main amyloid databases used by the authors of bioinformatics tools.
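Many sequence-based predictors start from a per-residue propensity scale averaged over a sliding window. The sketch below illustrates only that general idea: it uses the Kyte–Doolittle hydrophobicity scale as a stand-in propensity (a crude proxy, not the scale of any particular aggregation predictor), and the window length of 5 and threshold of 2.0 are arbitrary, hypothetical choices.

```python
from typing import List

# Kyte-Doolittle hydrophobicity values, used here only as a stand-in propensity scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq: str, window: int = 5) -> List[float]:
    """Mean scale value for every run of `window` consecutive residues.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def flag_windows(seq: str, window: int = 5, threshold: float = 2.0) -> List[int]:
    """0-based start indices of windows whose mean exceeds the (hypothetical) threshold."""
    return [i for i, score in enumerate(window_scores(seq, window)) if score > threshold]
```

For example, in `"KKKKKIIIIIKKKKK"` only the windows starting at positions 4, 5, and 6 contain at least four isoleucines and exceed the threshold, so overlapping flagged windows mark the hydrophobic stretch as a candidate region.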
14
Orlando G, Raimondi D, Duran-Romaña R, Moreau Y, Schymkowitz J, Rousseau F. PyUUL provides an interface between biological structures and deep learning algorithms. Nat Commun 2022; 13:961. [PMID: 35181656 PMCID: PMC8857184 DOI: 10.1038/s41467-022-28327-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/23/2021] [Accepted: 01/18/2022] [Indexed: 11/09/2022]
Abstract
Structural bioinformatics suffers from a lack of interfaces connecting biological structures and machine learning methods, making the application of modern neural network architectures impractical. This hampers the development of structure-based bioinformatics methods, causing a bottleneck in biological research. Here we present PyUUL ( https://pyuul.readthedocs.io/ ), a library that translates biological structures into 3D tensors, allowing out-of-the-box application of state-of-the-art deep learning algorithms. The library converts biological macromolecules into data structures typical of computer vision, such as voxels and point clouds, for which extensive machine learning research has been performed. Moreover, PyUUL supports out-of-the-box GPU and sparse computation. Finally, we demonstrate how PyUUL can be used by researchers to address some typical bioinformatics problems, such as structure recognition and docking.
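The core idea of turning a structure into a computer-vision tensor can be illustrated with a minimal occupancy grid. This is not PyUUL's API or its actual encoding (the library's own documentation describes those); it is a plain-Python sketch that assumes atoms are given as (x, y, z) coordinates and simply counts atoms per cubic voxel of a hypothetical edge length `spacing`. The raw coordinate list itself is already the point-cloud form.

```python
import math


def voxelize(coords, spacing=1.0):
    """Count atoms per cubic voxel; returns (grid, origin) with grid[i][j][k] as ints."""
    # Put the grid origin at the minimum coordinate along each axis.
    origin = [min(c[d] for c in coords) for d in range(3)]
    # Integer voxel index of each atom, measured from the grid origin.
    indices = [
        tuple(math.floor((c[d] - origin[d]) / spacing) for d in range(3)) for c in coords
    ]
    shape = [max(ix[d] for ix in indices) + 1 for d in range(3)]
    grid = [[[0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    for i, j, k in indices:
        grid[i][j][k] += 1
    return grid, origin
```

A dense nested list like this is easy to read but wasteful for large structures, which is one reason sparse representations matter for real proteins.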
Affiliation(s)
- Gabriele Orlando
- Switch Laboratory, VIB-KU Leuven Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium
- Ramon Duran-Romaña
- Switch Laboratory, VIB-KU Leuven Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium
- Joost Schymkowitz
- Switch Laboratory, VIB-KU Leuven Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium
- Frederic Rousseau
- Switch Laboratory, VIB-KU Leuven Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium
15
Bunc M, Hadži S, Graf C, Bončina M, Lah J. Aggregation Time Machine: A Platform for the Prediction and Optimization of Long-Term Antibody Stability Using Short-Term Kinetic Analysis. J Med Chem 2022; 65:2623-2632. [PMID: 35090111 PMCID: PMC8842250 DOI: 10.1021/acs.jmedchem.1c02010] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/30/2022]
Abstract
Monoclonal antibodies are the fastest growing class of therapeutics. However, aggregation limits their shelf life and can lead to adverse immune responses. Assessment and optimization of long-term antibody stability are therefore key challenges in biologic drug development. Here, we present a platform based on the analysis of temperature-dependent aggregation data that can dramatically shorten the assessment of long-term aggregation stability and thus accelerate the optimization of antibody formulations. For a set of antibodies used in therapeutic areas ranging from oncology to rheumatology and osteoporosis, we obtain accurate predictions of aggregate fractions for up to three years using data obtained on a much shorter time scale. Significantly, the strategy combining kinetic and thermodynamic analysis not only contributes to a better understanding of the molecular mechanisms of antibody aggregation but has already proven very effective in the development and production of biological therapeutics.
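A generic way to extrapolate short-term, high-temperature aggregation data to storage conditions is to fit an Arrhenius law to the observed rate constants and propagate it through a kinetic model. The sketch below assumes simple first-order aggregation and a single, temperature-independent activation energy; it is not the kinetic and thermodynamic model of the Aggregation Time Machine platform, and any temperatures or rates you feed it are the caller's (here, hypothetical) data.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def fit_arrhenius(temps_k, rates):
    """Least-squares fit of ln k = ln A - Ea / (R T); returns (Ea in J/mol, A)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return -slope * R, math.exp(intercept)


def rate_at(temp_k, ea, a):
    """Arrhenius rate constant at temp_k, in the same time unit as the fitted rates."""
    return a * math.exp(-ea / (R * temp_k))


def aggregate_fraction(rate, t):
    """Aggregate fraction after time t under an assumed first-order model."""
    return 1.0 - math.exp(-rate * t)
```

For instance, rates measured at 313.15, 318.15, and 323.15 K can be fitted, and `aggregate_fraction(rate_at(278.15, ea, a), 3 * 365)` then gives a three-year estimate at 5 °C if the rates are per day. Real antibody aggregation often deviates from a single Arrhenius line (for example, when partial unfolding sets in at higher temperatures), which is why a model that also accounts for thermodynamics, as the abstract describes, is needed for reliable extrapolation.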
Affiliation(s)
- Marko Bunc
- Technical Research and Development, Global Drug Development, Novartis, Lek d.d., 1234 Mengeš, Slovenia
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, 1000 Ljubljana, Slovenia
- San Hadži
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, 1000 Ljubljana, Slovenia
- Christian Graf
- Technical Research and Development, Global Drug Development, Novartis, Hexal AG, 82041 Oberhaching, Germany
- Matjaž Bončina
- Technical Research and Development, Global Drug Development, Novartis, Lek d.d., 1234 Mengeš, Slovenia
- Jurij Lah
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, 1000 Ljubljana, Slovenia
16
Statistical potentials from the Gaussian scaling behaviour of chain fragments buried within protein globules. PLoS One 2022; 17:e0254969. [PMID: 35085247 PMCID: PMC8794220 DOI: 10.1371/journal.pone.0254969] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/02/2021] [Accepted: 10/28/2021] [Indexed: 11/19/2022]
Abstract
Knowledge-based approaches use statistics collected from Protein Data Bank structures to estimate effective interaction potentials between amino acid pairs. Empirical relations are typically employed that rely on the crucial choice of a reference state associated with the null interaction case. Despite their significant effectiveness, the physical interpretation of knowledge-based potentials has been repeatedly questioned, with no consensus on the choice of the reference state. Here we use the fact that the Flory theorem, originally derived for chains in a dense polymer melt, also holds for chain fragments within the core of globular proteins, provided that the average is taken over buried fragments collected from different non-redundant native structures. After verifying that the ensuing Gaussian statistics, a hallmark of effectively non-interacting polymer chains, hold for a wide range of fragment lengths, although with significant deviations at short spatial scales, we use them to define a ‘bona fide’ reference state. Notably, although the latter does depend on fragment length, deviations from it do not. This allows us to estimate an effective interaction potential that is not biased by correlations due to the connectivity of the protein chain. We show how different sequence-independent effective statistical potentials can be derived with this approach by coarse-graining the protein representation at varying levels. The possibility of defining sequence-dependent potentials is also explored.
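The Gaussian statistics invoked above have a standard closed form: for an ideal chain fragment with mean squared end-to-end distance ⟨R²⟩, the end-to-end vector has density (3/(2π⟨R²⟩))^{3/2} exp(−3R²/(2⟨R²⟩)). The sketch below is an illustration rather than the authors' procedure: it estimates ⟨R²⟩ for fragments of n bonds from a single chain's (hypothetical) coordinates, instead of averaging over buried fragments from many structures, and evaluates the corresponding radial density, which integrates to 1.

```python
import math


def mean_sq_separation(coords, n):
    """Average squared distance between residues i and i + n along one chain."""
    pairs = [(coords[i], coords[i + n]) for i in range(len(coords) - n)]
    return sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in pairs) / len(pairs)


def gaussian_radial_density(r, mean_sq):
    """Radial density 4*pi*r^2 * P(r) of an ideal (Gaussian) chain with <R^2> = mean_sq."""
    norm = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5
    return 4.0 * math.pi * r * r * norm * math.exp(-3.0 * r * r / (2.0 * mean_sq))
```

For a perfectly straight chain with unit spacing, `mean_sq_separation` returns n², the rod-like limit; in the Gaussian regime the abstract describes, ⟨R²⟩ instead grows roughly linearly with n, and the fitted ⟨R²⟩(n) fixes the length-dependent reference density.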
17
Kagami LP, Orlando G, Raimondi D, Ancien F, Dixit B, Gavaldá-García J, Ramasamy P, Roca-Martínez J, Tzavella K, Vranken W. b2bTools: online predictions for protein biophysical features and their conservation. Nucleic Acids Res 2021; 49:W52-W59. [PMID: 34057475 PMCID: PMC8262692 DOI: 10.1093/nar/gkab425] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/02/2021] [Revised: 04/21/2021] [Accepted: 05/05/2021] [Indexed: 11/13/2022]
Abstract
We provide integrated protein sequence-based predictions via https://bio2byte.be/b2btools/. The aim of our predictions is to identify the biophysical behaviour or features of proteins that are not readily captured by structural biology and/or molecular dynamics approaches. Upload of a FASTA file or text input of a sequence provides integrated predictions from DynaMine backbone and side-chain dynamics, conformational propensities, and derived EFoldMine early folding, DisoMine disorder, and Agmata β-sheet aggregation. These predictions, several of which were previously not available online, capture 'emergent' properties of proteins, i.e. the inherent biophysical propensities encoded in their sequence, rather than context-dependent behaviour (e.g. final folded state). In addition, upload of a multiple sequence alignment (MSA) in a variety of formats enables exploration of the biophysical variation observed in homologous proteins. The associated plots indicate the biophysical limits of functionally relevant protein behaviour, with unusual residues flagged by a Gaussian mixture model analysis. The prediction results are available as JSON or CSV files and directly accessible via an API. Online visualisation is available as interactive plots, with brief explanations and tutorial pages included. The server and API employ an email-free token-based system that can be used to anonymously access previously generated results.
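The MSA-based output described above summarises how a predicted property varies across homologues at each alignment position. A simplified stand-in is sketched below: it assumes per-residue predictions have already been mapped onto alignment columns (gaps as `None`), summarises each column by its minimum, median, and maximum, and flags query positions outside the homologue range. It does not reproduce b2bTools' Gaussian mixture model analysis, its API, or its JSON/CSV schema.

```python
from statistics import median
from typing import List, Optional, Tuple

Column = Optional[Tuple[float, float, float]]


def column_summaries(aligned: List[List[Optional[float]]]) -> List[Column]:
    """(min, median, max) of predicted values per alignment column; None if all gaps."""
    summaries: List[Column] = []
    for column in zip(*aligned):
        values = sorted(v for v in column if v is not None)
        summaries.append((values[0], median(values), values[-1]) if values else None)
    return summaries


def flag_outside_range(query: List[Optional[float]], summaries: List[Column]) -> List[int]:
    """Alignment columns where the query value falls outside the homologue range."""
    flagged = []
    for i, (value, summary) in enumerate(zip(query, summaries)):
        if value is None or summary is None:
            continue  # gap in the query or no homologue data for this column
        low, _, high = summary
        if value < low or value > high:
            flagged.append(i)
    return flagged
```

A min/max range is deliberately crude; a mixture-model analysis like the one the abstract mentions can capture multimodal distributions that a single range would blur.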
Affiliation(s)
- Luciano Porto Kagami
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels 1050, Belgium
- Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels 1050, Belgium
- Daniele Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels 1050, Belgium
- Francois Ancien
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels 1050, Belgium
- 3Bio, Université Libre de Bruxelles, Brussels 1050, Belgium
- Bhawna Dixit
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels 1050, Belgium
- VIB Structural Biology Research Centre, Brussels, 1050, Belgium
- Jose Gavaldá-García
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels 1050, Belgium
- VIB Structural Biology Research Centre, Brussels, 1050, Belgium
- Pathmanaban Ramasamy
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels 1050, Belgium
- VIB Structural Biology Research Centre, Brussels, 1050, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, 9000, Belgium
- Joel Roca-Martínez
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels 1050, Belgium
- VIB Structural Biology Research Centre, Brussels, 1050, Belgium
- Konstantina Tzavella
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels 1050, Belgium
- VIB Structural Biology Research Centre, Brussels, 1050, Belgium
- Wim Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels 1050, Belgium
- VIB Structural Biology Research Centre, Brussels, 1050, Belgium
18
Prabakaran R, Rawat P, Kumar S, Gromiha MM. Evaluation of in silico tools for the prediction of protein and peptide aggregation on diverse datasets. Brief Bioinform 2021; 22:6309925. [PMID: 34181000 DOI: 10.1093/bib/bbab240] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/08/2021] [Revised: 05/18/2021] [Accepted: 06/02/2021] [Indexed: 01/09/2023]
Abstract
Several prediction algorithms and tools have been developed in the last two decades to predict protein and peptide aggregation. These in silico tools help predict aggregation propensity and amyloidogenicity and identify aggregation-prone regions. Given the immense interest in the field, it is of prime importance to compare the performance of these algorithms systematically. In this review, we provide a rigorous performance analysis of nine prediction tools using a variety of assessments. The assessments were carried out on several non-redundant datasets ranging from hexapeptides to protein sequences, and from amyloidogenic antibody light chains to soluble protein sequences. Our analysis reveals the robustness of the current prediction tools and the scope for improving their predictive performance. Insights gained from this work provide critical guidance to the scientific community on the advantages and limitations of different aggregation prediction methods, helping researchers make informed decisions about their research needs.
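Benchmarks of binary aggregation predictors commonly summarise performance with threshold metrics such as accuracy and the Matthews correlation coefficient (MCC), the two metrics reported for AMYPred-FRL earlier in this list. The abstract does not list the exact assessments used in this study, so the sketch below only shows how these two standard metrics follow from true and predicted labels, assuming 1 marks an aggregating sequence.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 marks an aggregating sequence."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC is often preferred over accuracy on unbalanced aggregation datasets because it only approaches 1 when both classes are predicted well.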
Affiliation(s)
- Sandeep Kumar
- Department of Biotherapeutics Discovery in Boehringer-Ingelheim Pharmaceutical Inc., Ridgefield, CT, USA
19
Postic G, Janel N, Moroy G. Representations of protein structure for exploring the conformational space: A speed-accuracy trade-off. Comput Struct Biotechnol J 2021; 19:2618-2625. [PMID: 34025948 PMCID: PMC8120936 DOI: 10.1016/j.csbj.2021.04.049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/03/2021] [Revised: 04/19/2021] [Accepted: 04/20/2021] [Indexed: 11/25/2022]
Abstract
We compare ten structural representations, either atomistic or coarse-grained. Thus, ten distance-dependent statistical potentials of mean force (PMF) were built. The Cβ-only and Cα + Cβ representations provide the best speed–accuracy trade-off. Including glycines through Cα, in a Cβ-only representation, yields a higher accuracy. We generalize the conclusions to the total information gain (TIG) scoring function.
The recent breakthrough in the field of protein structure prediction shows the relevance of using knowledge-based scoring functions in combination with a low-resolution 3D representation of protein macromolecules. The choice of not using all atoms is barely supported by any data in the literature and is mostly motivated by empirical and practical reasons, such as the computational cost of assessing the numerous folds of the protein conformational space. Here, we present a comprehensive study, carried out on a large and balanced benchmark of predicted protein structures, to see how different types of structural representations rank in accuracy or calculation speed, and which ones offer the best compromise between these two criteria. We tested ten representations, including low-resolution, high-resolution, and coarse-grained approaches. We also investigated whether the findings generalize to formalisms other than the widely used “potential of mean force” (PMF) method. We observed that representing protein structures by their β carbons, combined or not with Cα, provides the best speed–accuracy trade-off when using a “total information gain” scoring function. For statistical PMFs, using MARTINI backbone and side-chain beads is the best option. Finally, we also demonstrated the necessity of training the reference state on all atom types, and of including the Cα atoms of glycine residues, in a Cβ-based representation.
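A distance-dependent statistical potential of the kind compared above is commonly obtained by inverse Boltzmann inversion, E(r) = −kT ln(P_obs(r)/P_ref(r)). The sketch below pairs that formula with the Cβ-only representation that keeps glycine through its Cα. Assumptions: residues are hypothetical dicts mapping atom names to coordinates, energies are in units of kT, bins without counts are left undefined (`None`), and the reference distribution is simply supplied by the caller rather than trained on all atom types as the study recommends.

```python
import math
from itertools import combinations


def representative_atom(residue):
    """Cβ for most residues; glycine (which has no Cβ) is represented by its Cα."""
    atoms = residue["atoms"]
    return atoms["CA"] if residue["name"] == "GLY" else atoms["CB"]


def distance_histogram(residues, bin_width=1.0, n_bins=15):
    """Count representative-atom pair distances in bins [k*w, (k+1)*w)."""
    counts = [0] * n_bins
    points = [representative_atom(r) for r in residues]
    for p, q in combinations(points, 2):
        k = int(math.dist(p, q) // bin_width)
        if k < n_bins:  # ignore pairs beyond the distance cutoff
            counts[k] += 1
    return counts


def pmf(obs_counts, ref_counts):
    """E(r) = -ln(P_obs / P_ref) in kT units; None where either count is zero."""
    n_obs, n_ref = sum(obs_counts), sum(ref_counts)
    energies = []
    for o, r in zip(obs_counts, ref_counts):
        if o == 0 or r == 0:
            energies.append(None)
        else:
            energies.append(-math.log((o / n_obs) / (r / n_ref)))
    return energies
```

Bins observed more often than the reference predicts get negative (favourable) energies; the choice of reference distribution therefore shapes the whole potential, which is the point the study makes about training it carefully.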
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Corresponding author.
- Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
- Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
20
Kagami L, Roca-Martínez J, Gavaldá-García J, Ramasamy P, Feenstra KA, Vranken WF. Online biophysical predictions for SARS-CoV-2 proteins. BMC Mol Cell Biol 2021; 22:23. [PMID: 33892639 PMCID: PMC8062939 DOI: 10.1186/s12860-021-00362-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2021] [Accepted: 04/01/2021] [Indexed: 12/26/2022]
Abstract
BACKGROUND: The SARS-CoV-2 virus, the causative agent of COVID-19, consists of an assembly of proteins that determine its infectious and immunological behavior, as well as its response to therapeutics. Major structural biology efforts on these proteins have already provided essential insights into the mode of action of the virus, as well as avenues for structure-based drug design. However, not all of the SARS-CoV-2 proteins, or regions thereof, have a well-defined three-dimensional structure, and as such they might exhibit ambiguous, dynamic behaviour that is evident neither from static structure representations nor from molecular dynamics simulations using these structures. MAIN: We present a website ( https://bio2byte.be/sars2/ ) that provides protein sequence-based predictions of the backbone and side-chain dynamics and conformational propensities of these proteins, as well as derived early folding, disorder, β-sheet aggregation, protein-protein interaction and epitope propensities. These predictions attempt to capture the inherent biophysical propensities encoded in the sequence, rather than context-dependent behaviour such as the final folded state. In addition, we provide the biophysical variation observed in homologous proteins, which indicates the limits of their functionally relevant biophysical behaviour. CONCLUSION: The https://bio2byte.be/sars2/ website provides a range of protein sequence-based predictions for 27 SARS-CoV-2 proteins, enabling researchers to form hypotheses about their possible functional modes of action.
Affiliation(s)
- Luciano Kagami
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Joel Roca-Martínez
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
- VIB Structural Biology Research Centre, Pleinlaan 2, 1050, Brussels, Belgium
- Jose Gavaldá-García
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
- VIB Structural Biology Research Centre, Pleinlaan 2, 1050, Brussels, Belgium
- Pathmanaban Ramasamy
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
- VIB Structural Biology Research Centre, Pleinlaan 2, 1050, Brussels, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, 9000, Ghent, Belgium
- K Anton Feenstra
- IBIVU - Center for Integrative Bioinformatics, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, 1081HV, The Netherlands
- AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam, Amsterdam, 1081HV, The Netherlands
- Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
- VIB Structural Biology Research Centre, Pleinlaan 2, 1050, Brussels, Belgium
21
Prabakaran R, Rawat P, Thangakani AM, Kumar S, Gromiha MM. Protein aggregation: in silico algorithms and applications. Biophys Rev 2021; 13:71-89. [PMID: 33747245 PMCID: PMC7930180 DOI: 10.1007/s12551-021-00778-w] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 11/06/2020] [Accepted: 01/01/2021] [Indexed: 01/08/2023]
Abstract
Protein aggregation is a topic of immense interest to the scientific community because of its role in several neurodegenerative diseases and disorders and its industrial importance. Several in silico techniques, tools, and algorithms have been developed to predict aggregation in proteins and to understand aggregation mechanisms. This review attempts to provide an overview of the vast developments in in silico approaches, the resources available, and future perspectives. It reviews aggregation-related databases, mechanistic models (aggregation-prone region and aggregation propensity prediction), kinetic models (aggregation rate prediction), and molecular dynamics studies related to aggregation. With a multitude of aggregation-related prediction models already available to the scientific community, the field of protein aggregation is rapidly maturing to tackle new applications.
Affiliation(s)
- R. Prabakaran
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- Puneet Rawat
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- A. Mary Thangakani
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- Sandeep Kumar
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceutical Inc., Ridgefield, CT, USA
- M. Michael Gromiha
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- School of Computing, Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
22
Orlando G, Raimondi D, Kagami LP, Vranken WF. ShiftCrypt: a web server to understand and biophysically align proteins through their NMR chemical shift values. Nucleic Acids Res 2020; 48:W36-W40. [PMID: 32459331 PMCID: PMC7319548 DOI: 10.1093/nar/gkaa391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/29/2020] [Revised: 04/21/2020] [Accepted: 05/04/2020] [Indexed: 02/06/2023]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy data provide valuable information on the behaviour of proteins in solution. The primary data determined when studying proteins by NMR are the per-atom chemical shifts, which reflect the local environment of atoms and provide insights into amino acid residue dynamics and conformation. Within an amino acid residue, chemical shifts present multi-dimensional and complexly cross-correlated information, making them difficult to analyse. The ShiftCrypt method, based on a neural network auto-encoder architecture, compresses the per-amino acid chemical shift information into a single, interpretable, amino acid-type independent value that reflects the biophysical state of a residue. We here present the ShiftCrypt web server, which makes the method readily available. The server accepts chemical shift input files in the NMR Exchange Format (NEF) or NMR-STAR format, executes ShiftCrypt and visualises the results, which are also accessible via an API. It also enables the "biophysically based" pairwise alignment of two proteins based on their ShiftCrypt values. This approach uses Dynamic Time Warping, can optionally include amino acid code information, and has applications in, for example, the alignment of disordered regions. The server uses a token-based system to ensure the anonymity of users and results. The web server is available at www.bio2byte.be/shiftcrypt.
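Dynamic Time Warping, which the server uses for its biophysically based alignment, finds the monotonic pairing of two value series with the lowest cumulative cost. A textbook version is sketched below for two per-residue value profiles, assuming the absolute difference as the local cost; this is an illustration rather than ShiftCrypt's implementation, and the optional use of amino acid identity is not reproduced.

```python
from typing import List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW: returns (total cost, list of aligned (i, j) index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = lowest cumulative cost aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): D[i - 1][j - 1], (i - 1, j): D[i - 1][j], (i, j - 1): D[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return D[n][m], path
```

Because one value may be matched to several consecutive values of the other profile, `dtw([1, 2, 3], [1, 2, 2, 3])` aligns at zero cost by pairing the middle value twice. This tolerance to local stretching is what makes DTW suitable for comparing regions, such as disordered segments, whose lengths differ between proteins.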
Affiliation(s)
- Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, Brussels 1050, Belgium
- Switch Laboratory, VIB, Leuven, Belgium
- Daniele Raimondi
- ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
- Luciano Porto Kagami
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, Brussels 1050, Belgium
- Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, Brussels 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium
- VIB Structural Biology Research Centre, Pleinlaan 2, Brussels 1050, Belgium