1
Rackovsky S. Techniques for Bioinformatic Applications in Protein Dynamics. Methods Mol Biol 2025; 2870:221-226. [PMID: 39543037 DOI: 10.1007/978-1-0716-4213-9_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
A method is described by which bioinformatic concepts and tools can be applied to the study of protein dynamic properties. Sequences are transformed into numerical strings by representing each amino acid by a residue specific average value of the crystallographic alpha carbon B factor. These dynamic sequences are then Fourier transformed. The Fourier coefficients, each of which contains information about the entire sequence, viewed on a specific length scale, can then be used to study a wide variety of dynamic characteristics in a manner which is completely inaccessible using conventional tools.
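The encoding and transform described in this abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the per-residue values in `PLACEHOLDER_B` are invented placeholders rather than published average B factors, and the function names and the unnormalized DFT convention are assumptions.

```python
import cmath

# Invented placeholder values, one per amino acid. These are NOT the
# published residue-specific average alpha-carbon B factors.
PLACEHOLDER_B = {aa: 10.0 + i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def dynamic_sequence(seq):
    """Rewrite a one-letter amino acid sequence as a list of numbers."""
    return [PLACEHOLDER_B[aa] for aa in seq]


def fourier_coefficients(values):
    """Unnormalized discrete Fourier transform of a numerical sequence.

    Coefficient k sums over every position, so it reflects the whole
    sequence; for k >= 1 it corresponds to a period of len(values) / k
    residues.
    """
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * m / n) for m, v in enumerate(values))
        for k in range(n)
    ]


values = dynamic_sequence("MKTAYIAKQR")  # arbitrary example sequence
coeffs = fourier_coefficients(values)
```

The k = 0 coefficient equals the sum of the per-residue values, so dividing it by the sequence length gives the sequence average; higher k values probe progressively shorter length scales.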
Affiliation(s)
- Shalom Rackovsky
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY, USA.
2
Kombo DC, LaMarche MJ, Konkankit CC, Rackovsky S. Application of artificial intelligence and machine learning techniques to the analysis of dynamic protein sequences. Proteins 2024; 92:1234-1241. [PMID: 38808365 PMCID: PMC11511649 DOI: 10.1002/prot.26704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 05/07/2024] [Accepted: 05/13/2024] [Indexed: 05/30/2024]
Abstract
We apply methods of Artificial Intelligence and Machine Learning to protein dynamic bioinformatics. We rewrite the sequences of a large protein data set, containing both folded and intrinsically disordered molecules, using a representation developed previously, which encodes the intrinsic dynamic properties of the naturally occurring amino acids. We Fourier analyze the resulting sequences. It is demonstrated that classification models built using several different supervised learning methods are able to successfully distinguish folded from intrinsically disordered proteins from sequence alone. It is further shown that the most important sequence property for this discrimination is the sequence mobility, which is the sequence averaged value of the residue-specific average alpha carbon B factor. This is in agreement with previous work, in which we have demonstrated the central role played by the sequence mobility in protein dynamic bioinformatics and biophysics. This finding opens a path to the application of dynamic bioinformatics, in combination with machine learning algorithms, to a range of significant biomedical problems.
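The sequence mobility defined in this abstract, the sequence average of the residue-specific values, can be written out in a few lines. The sketch below is illustrative only: `PLACEHOLDER_B` holds invented numbers rather than the published residue-specific average B factors, and `sequence_mobility` is a hypothetical helper name.

```python
# Invented placeholder values, one per amino acid. These are NOT the
# published residue-specific average alpha-carbon B factors.
PLACEHOLDER_B = {aa: 10.0 + i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def sequence_mobility(seq):
    """Average of the per-residue values over the whole sequence."""
    values = [PLACEHOLDER_B[aa] for aa in seq]
    return sum(values) / len(values)


# With the placeholder table: (10 + 11 + 12 + 13) / 4
mobility = sequence_mobility("ACDE")
```

A scalar computed this way could serve as one input feature for a supervised classifier; the specific models and feature sets used in the study are not reproduced here.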
Affiliation(s)
- David C. Kombo
- Dept. of Medicinal Chemistry, Integrated Drug Discovery, Sanofi, 350 Water St., Cambridge, MA 02141
- Matthew J. LaMarche
- Dept. of Medicinal Chemistry, Integrated Drug Discovery, Sanofi, 350 Water St., Cambridge, MA 02141
- Chilaluck C. Konkankit
- Dept. of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, NY 14853
- S. Rackovsky
- Dept. of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, NY 14853
3
Pei M, Yang P, Li J, Wang Y, Li J, Xu H, Li J. Comprehensive analysis of pepper (Capsicum annuum) RAV genes family and functional identification of CaRAV1 under chilling stress. BMC Genomics 2024; 25:731. [PMID: 39075389 PMCID: PMC11285464 DOI: 10.1186/s12864-024-10639-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Accepted: 07/19/2024] [Indexed: 07/31/2024] Open
Abstract
BACKGROUND: Despite its known significance in plant abiotic stress responses, the role of the RAV gene family in the response of Capsicum annuum to chilling stress remains largely unexplored.
RESULTS: In this study, we identified and characterized six members of the CaRAV gene subfamily in pepper plants through genome-wide analysis. The CaRAV subfamily was then classified into four branches based on homology with Arabidopsis thaliana, with relatively conserved domains within each branch. We discovered that light response elements accounted for the majority of CaRAVs, whereas low-temperature response elements were specific to the NGA gene subfamily. After pepper plants were subjected to chilling stress, qRT-PCR analysis revealed that CaRAV1, CaRAV2 and CaNGA1 were significantly induced, indicating that CaRAVs play a role in the response to chilling stress. Using virus-induced gene silencing (VIGS) vectors, we targeted key members of the CaRAV gene family. Under normal growth conditions, the MDA content and SOD enzyme activity of the silenced plants were slightly greater than those of the control plants, and the REC activity was significantly greater than that of the control plants. After exposure to chilling stress, the levels of MDA and electrolyte leakage were greater in the silenced plants, and the POD and CAT enzyme activities were significantly lower than those in the control, which was particularly evident under repeated chilling stress. In addition, the relative expression of CaPOD and CaCAT was greater in V2 plants upon repeated chilling stress; in particular, CaCAT expression was significantly greater in V2 plants than in the other two silenced plants, with increases of 3.29 and 1.10 within 12 and 24 h, respectively. These findings suggest that CaRAV1 and CaNGA1 positively regulate the response to chilling stress.
CONCLUSIONS: Silencing of key members of the CaRAV gene family results in increased susceptibility to chilling damage and reduced antioxidant enzyme activity in plants, particularly under repeated chilling stress. This study provides valuable information for understanding the classification and putative functions of RAV transcription factors in pepper plants.
Affiliation(s)
- Minkun Pei
- College of Horticulture, Xinjiang Agriculture University, Urumqi, 830052, China
- College of Biological and Agricultural Sciences, Honghe University, Mengzi, Yunnan, 661100, China
- Ping Yang
- College of Biological and Agricultural Sciences, Honghe University, Mengzi, Yunnan, 661100, China
- Jian Li
- College of Biological and Agricultural Sciences, Honghe University, Mengzi, Yunnan, 661100, China
- College of Horticulture, Gansu Agriculture University, Lanzhou, 730070, China
- Yanzhuang Wang
- College of Biological and Agricultural Sciences, Honghe University, Mengzi, Yunnan, 661100, China
- College of Horticulture and Forestry, Tarim University, Alar, 843300, China
- Juan Li
- College of Biological and Agricultural Sciences, Honghe University, Mengzi, Yunnan, 661100, China
- College of Horticulture and Forestry, Tarim University, Alar, 843300, China
- Hongjun Xu
- College of Horticulture, Xinjiang Agriculture University, Urumqi, 830052, China
- Jie Li
- College of Biological and Agricultural Sciences, Honghe University, Mengzi, Yunnan, 661100, China
4
Nikte SV, Joshi M, Sengupta D. State-dependent dynamics of extramembrane domains in the β2-adrenergic receptor. Proteins 2024; 92:317-328. [PMID: 37864328 DOI: 10.1002/prot.26613] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/03/2023] [Revised: 08/22/2023] [Accepted: 09/25/2023] [Indexed: 10/22/2023]
Abstract
G protein-coupled receptors (GPCRs) are membrane-bound signaling proteins that play an essential role in cellular signaling processes. Due to their intrinsic function of transmitting internal signals in response to external cues, these receptors are adapted to be highly dynamic in nature. The β2-adrenergic receptor (β2AR) is a representative member of the family that has been extensively analyzed in terms of its structure and activation. Although the structure of the transmembrane domain has been characterized in the different functional states of the receptor, the conformational dynamics of the extramembrane domains, especially the intrinsically disordered regions, are still emerging. In this study, we analyze the state-dependent dynamics of the extramembrane domains of β2AR using atomistic molecular dynamics simulations. We introduce a parameter, the residue excess dynamics, that allows us to better quantify receptor dynamics. Using this measure, we show that the dynamics of the extramembrane domains are sensitive to the receptor state. Interestingly, the ligand-bound intermediate R′ state shows the maximal dynamics compared to either the active R*G or inactive R states. Ligand binding appears to be correlated with high residue excess dynamics that are dampened upon G protein coupling. The intracellular loop-3 (ICL3) domain has a tendency to flip towards the membrane upon ligand binding, which could contribute to receptor "priming." We highlight an important ICL1-helix-8 interplay that is broken in the ligand-bound state but is retained in the active state. Overall, our study highlights the importance of characterizing the functional dynamics of the GPCR loop domains.
Affiliation(s)
- Siddhanta V Nikte
- Physical and Materials Chemistry Division, National Chemical Laboratory, Pune, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Manali Joshi
- Bioinformatics Center, Savitribai Phule Pune University, Pune, India
- Durba Sengupta
- Physical and Materials Chemistry Division, National Chemical Laboratory, Pune, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
5
Štambuk N, Konjevoda P, Štambuk A. How ambiguity codes specify molecular descriptors and information flow in Code Biology. Biosystems 2023; 233:105034. [PMID: 37739308 DOI: 10.1016/j.biosystems.2023.105034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/12/2023] [Accepted: 09/12/2023] [Indexed: 09/24/2023]
Abstract
The article presents IUPAC ambiguity codes for incomplete nucleic acid specification, and their use in Code Biology. It is shown how to use this nomenclature in order to extract accurate information on different properties of biological systems. We investigated the use of ambiguity codes, as mathematical and logical operators and truth table elements, for the encoding of amino acids by means of the Standard Genetic Code. It is explained how to use ambiguity codes and truth functions in order to obtain accurate information on different properties of biological systems. Nucleotide ambiguity codes could be applied to: 1. encoding descriptive information of nucleotides, amino acids and proteins (e.g., polarity, relative solvent accessibility, atom depth, etc.), and 2. system modelling ranging from standard bioinformatics tools to classic evolutionary models (i.e., from the Miyazawa-Jernigan statistical potential to the Kimura three-substitution-type model, respectively). It is shown that algorithms based on IUPAC ambiguity codes, Boolean functions and truth tables, the Probabilistic Square of Opposition/Semiotic Square and Klein 4-groups could be used for bioinformatics analyses and relational data modelling in natural science. Underlying mathematical, logical and semiotic concepts of interest are presented and addressed.
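As a small illustration of how IUPAC ambiguity codes relate to the Standard Genetic Code, the sketch below expands an ambiguous codon into its concrete codons and collects the amino acids they encode. The IUPAC symbols and the codon table are standard; the helper names `expand` and `encoded_amino_acids` are hypothetical, the DNA alphabet (T rather than U) is an assumption, and the truth-table and Boolean-function machinery of the article is not reproduced.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard Genetic Code in the conventional TCAG ordering; "*" marks stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO[i] for i, codon in enumerate(product(BASES, repeat=3))
}


def expand(ambiguous_codon):
    """List every concrete codon matched by an ambiguous codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in ambiguous_codon))]


def encoded_amino_acids(ambiguous_codon):
    """Set of amino acids (or stop) encoded by the matching codons."""
    return {CODON_TABLE[c] for c in expand(ambiguous_codon)}
```

For example, `encoded_amino_acids("GGN")` contains only glycine, while `encoded_amino_acids("YTN")` spans both phenylalanine and leucine, which shows how a single ambiguity symbol can describe either one amino acid or a group of them.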
Affiliation(s)
- Nikola Štambuk
- Centre for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia
- Paško Konjevoda
- Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia
- Albert Štambuk
- Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, HR-10000 Zagreb, Croatia
6
Pandey A, Liu E, Graham J, Chen W, Keten S. B-factor prediction in proteins using a sequence-based deep learning model. Patterns (New York, N.Y.) 2023; 4:100805. [PMID: 37720331 PMCID: PMC10499862 DOI: 10.1016/j.patter.2023.100805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/23/2023] [Accepted: 07/07/2023] [Indexed: 09/19/2023]
Abstract
B factors provide critical insight into protein dynamics. Predicting the B factor of an atom in a new protein remains challenging because it is affected by the atom's neighbors in Euclidean space. Previous learning methods have yielded low Pearson correlation coefficients beyond the training set because of their limited ability to capture the effect of neighboring atoms. Drawing on advances in deep learning, we develop a sequence-based model that is tested on 2,442 proteins and outperforms state-of-the-art models by 30%. We find that the model learns that the B factor of a site is prominently affected by atoms within a 12-15 Å radius, which is in excellent agreement with cutoffs from protein network models. An ablation study revealed that the B factor can largely be predicted from the primary sequence alone. Our model thus lays a foundation for predicting other properties that are correlated with the B factor.
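The Pearson correlation coefficient used here as the evaluation metric can be computed as sketched below. This is a generic illustration rather than the authors' evaluation code; the `observed` and `predicted` lists are invented numbers, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Invented example values standing in for per-residue B factors.
observed = [12.0, 15.5, 20.1, 18.3, 25.0]
predicted = [11.0, 16.0, 19.0, 20.0, 23.5]
r = pearson(observed, predicted)
```

A value of `r` close to 1 indicates that predicted and observed values rise and fall together; comparing it on held-out proteins is what distinguishes generalization from fit to the training set.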
Affiliation(s)
- Akash Pandey
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Elaine Liu
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Jacob Graham
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Wei Chen
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Sinan Keten
- Department of Mechanical Engineering, Northwestern University, Evanston, IL, USA
- Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL, USA
7
Gorostiola González M, van den Broek RL, Braun TGM, Chatzopoulou M, Jespers W, IJzerman AP, Heitman LH, van Westen GJP. 3DDPDs: describing protein dynamics for proteochemometric bioactivity prediction. A case for (mutant) G protein-coupled receptors. J Cheminform 2023; 15:74. [PMID: 37641107 PMCID: PMC10463931 DOI: 10.1186/s13321-023-00745-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 08/10/2023] [Indexed: 08/31/2023] Open
Abstract
Proteochemometric (PCM) modelling is a powerful computational drug discovery tool used in bioactivity prediction of potential drug candidates, relying on both chemical and protein information. In PCM, features are computed to describe small molecules and proteins, and these features directly impact the quality of the predictive models. State-of-the-art protein descriptors, however, are calculated from the protein sequence and neglect the dynamic nature of proteins. This dynamic nature can be computationally simulated with molecular dynamics (MD). Here, novel 3D dynamic protein descriptors (3DDPDs) were designed to be applied in bioactivity prediction tasks with PCM models. As a test case, publicly available G protein-coupled receptor (GPCR) MD data from GPCRmd was used. GPCRs are membrane-bound proteins, which are activated by hormones and neurotransmitters, and constitute an important target family for drug discovery. GPCRs exist in different conformational states that allow the transmission of diverse signals and that can be modified by ligand interactions, among other factors. To translate the MD-encoded protein dynamics, two types of 3DDPDs were considered: one-hot encoded residue-specific (rs) and embedding-like protein-specific (ps) 3DDPDs. The descriptors were developed by calculating distributions of trajectory coordinates and partial charges, applying dimensionality reduction, and subsequently condensing them into vectors per residue or protein, respectively. 3DDPDs were benchmarked on several PCM tasks against state-of-the-art non-dynamic protein descriptors. Our rs- and ps3DDPDs outperformed non-dynamic descriptors in regression tasks using a temporal split and showed comparable performance with a random split and in all classification tasks. Combinations of non-dynamic descriptors with 3DDPDs did not result in increased performance. Finally, the power of 3DDPDs to capture dynamic fluctuations in mutant GPCRs was explored.
The results presented here show the potential of including protein dynamic information on machine learning tasks, specifically bioactivity prediction, and open opportunities for applications in drug discovery, including oncology.
Affiliation(s)
- Marina Gorostiola González
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- ONCODE Institute, Leiden, The Netherlands
- Remco L van den Broek
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Thomas G M Braun
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Magdalini Chatzopoulou
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Willem Jespers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Adriaan P IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Laura H Heitman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- ONCODE Institute, Leiden, The Netherlands
- Gerard J P van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
8
Konkankit CC, Rackovsky S. Global Survey of Protein Dynamic Properties. J Phys Chem B 2023. [PMID: 37368985 DOI: 10.1021/acs.jpcb.3c02609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2023]
Abstract
Using tools developed to study the dynamic bioinformatics of proteins, we are able to study the dynamic characteristics of very large numbers of protein sequences simultaneously. We study herein the distribution of protein sequences in a space determined by sequence mobility. It is shown that there are statistically significant differences in mobility distribution between folded sequences of different structural classes and between those and sequences of intrinsically disordered proteins. It is also shown that the several regions of mobility space differ significantly with respect to structural makeup. Helical proteins are shown to have distinctive dynamic characteristics at both extremes of the mobility spectrum.
Affiliation(s)
- Chilaluck C Konkankit
- Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, New York 14853, United States
- S Rackovsky
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, United States
- Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, New York 14853, United States
9
Rackovsky S. Structure Class Encoding in Protein Dynamic Bioinformatics. J Phys Chem B 2022; 126:5730-5734. [PMID: 35900129 DOI: 10.1021/acs.jpcb.2c02502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Using recently developed methods for studying the bioinformatics of protein dynamics, we investigate differences in dynamic characteristics between the sequences of proteins that fall into different structural classes. It is shown that there is a clear differentiation of dynamic properties of sequences as a function of structural class. Taken together with previous results we have developed, the present work demonstrates that dynamic properties are associated with structural behavior in two ways. The determination as to whether a given sequence folds is governed by the long-length-scale organization of the sequence. If the sequence folds, the choice of architectural class is governed by short- and intermediate-length-scale organization.
Affiliation(s)
- S Rackovsky
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, United States
- Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, New York 14853, United States
10
Konkankit C, Rackovsky S. The dynamic basis of structural order in proteins. Proteins 2022; 90:1115-1118. [PMID: 34981860 PMCID: PMC9007817 DOI: 10.1002/prot.26296] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2021] [Revised: 12/26/2021] [Accepted: 12/29/2021] [Indexed: 01/21/2023]
Abstract
We compare the sequences of folded and intrinsically disordered proteins (IDPs), using bioinformatic methods recently developed to study protein dynamic properties. We demonstrate that the two classes of sequences are organized in diametrically opposite ways with respect to long-length-scale dynamic properties. We further demonstrate a statistically significant difference between the amino acid compositions of folded and disordered proteins, which is expressed in dynamic properties. Our results indicate that the long-length-scale properties of sequences are critical in determining whether proteins are able to fold, and, more generally, that they are central to an understanding of protein physics. They further provide a physical basis for the empirically observed differences in amino acid composition between folded and IDPs.
Affiliation(s)
- Chilaluck Konkankit
- Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, New York, USA
- S Rackovsky
- Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, New York, USA
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
12
Carugo O. Uses and Abuses of the Atomic Displacement Parameters in Structural Biology. Methods Mol Biol 2022; 2449:281-298. [PMID: 35507268 DOI: 10.1007/978-1-0716-2095-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
B-factors determined with X-ray crystallographic analyses are commonly used to estimate the flexibility degree of atoms, residues, and molecular moieties in biological macromolecules. In this chapter, the most recent studies and applications of B-factors in protein engineering and structural biology are briefly summarized. Particular emphasis is given to the limitations in using B-factors, in order to prevent inappropriate applications. It is eventually predicted that future applications will involve anisotropically refined B-factors, deep learning, and data produced by cryo-EM.
13
Scheraga H, Rackovsky S. Dynamic and conformational switching in proteins. Biopolymers 2021; 112:e23411. [PMID: 33270217 PMCID: PMC8172660 DOI: 10.1002/bip.23411] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/24/2020] [Revised: 11/13/2020] [Accepted: 11/18/2020] [Indexed: 11/06/2022]
Abstract
Using bioinformatic methods for treating protein dynamics, developed in earlier work, we study the relationship between sequence mobility and dynamics in proteins. It is shown that sequence mobility drives a transition between two dynamic regimes in proteins, and that the specific details of this transition differ qualitatively between α-helical proteins and those in other structural classes. We examine the possibility that conformational switching is related to dynamic switching, by considering a specific system of sequences which exhibit the switching phenomenon. It is shown that a relationship between dynamic and conformational switching is entirely plausible.
Affiliation(s)
- H.A. Scheraga
- Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, NY 14853
- S. Rackovsky
- Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, NY 14853
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642