1
|
Pan Z, Zhuo L, Wan TY, Chen RY, Li YZ. DnaK duplication and specialization in bacteria correlates with increased proteome complexity. mSystems 2024; 9:e0115423. [PMID: 38530057 PMCID: PMC11019930 DOI: 10.1128/msystems.01154-23] [Received: 10/28/2023] [Accepted: 03/10/2024] [Indexed: 03/27/2024] Open
Abstract
The chaperone 70 kDa heat shock protein (Hsp70) is important for cells from bacteria to humans to maintain proteostasis, and all eukaryotes and several prokaryotes encode Hsp70 paralogs. Although the mechanisms of Hsp70 function have been clearly illuminated, the function and evolution of Hsp70 paralogs are not well studied. DnaK is a highly conserved bacterial Hsp70 family. Here, we show that dnaK is present in 98.9% of bacterial genomes, and 6.4% of them possess two or more DnaK paralogs. We found that the duplication of dnaK is positively correlated with an increase in proteomic complexity (proteome size, number of domains). We identified the interactomes of the two DnaK paralogs of Myxococcus xanthus DK1622 (MxDnaKs), which revealed that they are mostly nonoverlapping, although both prefer α and β domain proteins. Consistent with the entire M. xanthus proteome, MxDnaK substrates include significantly more multi-domain proteins and have a higher isoelectric point than those of Escherichia coli, which encodes a single DnaK homolog. MxDnaK1 is transcriptionally upregulated in response to heat shock and prefers to bind cytosolic proteins, while MxDnaK2 is downregulated by heat shock and is more associated with membrane proteins. Using domain swapping, we show that the nucleotide-binding domain and the substrate-binding β domain are responsible for the significant differences in DnaK interactomes, and the nucleotide-binding domain also determines the dimerization of MxDnaK2, but not MxDnaK1. Our work suggests that bacterial DnaK has been duplicated in order to deal with a more complex proteome, and that this allows evolution of distinct domains to deal with different subsets of target proteins.
IMPORTANCE
All eukaryotic and ~40% of prokaryotic species encode multiple 70 kDa heat shock protein (Hsp70) homologs with similar but diversified functions. Here, we show that duplication of canonical Hsp70 (DnaK in prokaryotes) correlates with increasing proteomic complexity and evolution of particular regions of the protein. Using the Myxococcus xanthus DnaK duplicates as a case, we found that their substrate spectra are mostly nonoverlapping and are both consistent with that of Escherichia coli DnaK in structural and molecular characteristics, but show differential enrichment of membrane proteins. Domain/region swapping demonstrated that the nucleotide-binding domain and the β substrate-binding domain (SBDβ), but not the SBDα or disordered C-terminal tail region, are responsible for this functional divergence. This work provides the first direct evidence for regional evolution of DnaK paralogs.
Affiliation(s)
- Zhuo Pan
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, China
- Li Zhuo
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, China
  - Suzhou Research Institute, Shandong University, Suzhou, China
- Tian-yu Wan
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, China
- Rui-yun Chen
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, China
- Yue-zhong Li
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, China
|
2
|
Glidden-Handgis G, Wheeler TJ. WAS IT A MATch I SAW? Approximate palindromes lead to overstated false match rates in benchmarks using reversed sequences. Bioinformatics Advances 2024; 4:vbae052. [PMID: 38764475 PMCID: PMC11099658 DOI: 10.1093/bioadv/vbae052] [Received: 10/21/2023] [Revised: 03/31/2024] [Accepted: 04/04/2024] [Indexed: 05/21/2024]
Abstract
Background: Software for labeling biological sequences typically produces a theory-based statistic for each match (the E-value) that indicates the likelihood of seeing that match's score by chance. E-values accurately predict false match rate for comparisons of random (shuffled) sequences, and thus provide a reasoned mechanism for setting score thresholds that enable high sensitivity with low expected false match rate. This threshold-setting strategy is challenged by real biological sequences, which contain regions of local repetition and low sequence complexity that cause excess matches between non-homologous sequences. Knowing this, tool developers often develop benchmarks that use realistic-seeming decoy sequences to explore empirical tradeoffs between sensitivity and false match rate. A recent trend has been to employ reversed biological sequences as realistic decoys, because these preserve the distribution of letters and the existence of local repeats, while disrupting the original sequence's functional properties. However, we and others have observed that sequences appear to produce high-scoring alignments to their reversals with surprising frequency, leading to overstatement of false match risk that may negatively affect downstream analysis.
Results: We demonstrate that an alignment between a sequence S and its (possibly mutated) reversal tends to produce higher scores than alignment between truly unrelated sequences, even when S is a shuffled string with no notable repetitive or low-complexity regions. This phenomenon is due to the unintuitive fact that (even randomly shuffled) sequences contain palindromes that are on average longer than the longest common substrings (LCS) shared between permuted variants of the same sequence. Though the expected palindrome length is only slightly larger than the expected LCS, the distribution of alignment scores involving reversed sequences is strongly right-shifted, leading to greatly increased frequency of high-scoring alignments to reversed sequences.
Impact: Overestimates of false match risk can motivate unnecessarily high score thresholds, leading to potentially reduced true match sensitivity. Also, when tool sensitivity is only reported up to the score of the first matched decoy sequence, a large decoy set consisting of reversed sequences can obscure sensitivity differences between tools. As a result of these observations, we advise that reversed biological sequences be used as decoys only when care is taken to remove positive matches in the original (un-reversed) sequences, or when overstatement of false labeling is not a concern. Though the primary focus of the analysis is on sequence annotation, we also demonstrate that the prevalence of internal palindromes may lead to an overstatement of the rate of false labels in protein identification with mass spectrometry.
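The palindrome-versus-LCS comparison described in this abstract can be explored with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' benchmark: the function names, the 20-letter amino acid alphabet, the sequence length, and the trial count are all illustrative choices. It measures the longest exact palindromic substring of a random sequence (which is an exact shared substring between the sequence and its reversal) and the longest common substring between the sequence and a shuffled copy of itself.

```python
import random


def longest_palindrome_length(s: str) -> int:
    """Length of the longest palindromic substring, by expanding around each center."""
    best = 0
    n = len(s)
    for center in range(2 * n - 1):
        # Even `center` values sit on a letter, odd values sit between two letters.
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and s[left] == s[right]:
            left -= 1
            right += 1
        best = max(best, right - left - 1)
    return best


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b (row-wise DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def compare(length=200, trials=50, alphabet="ACDEFGHIKLMNPQRSTVWY", seed=0):
    """Mean longest-palindrome length and mean LCS length against a shuffled copy.

    All parameter defaults are assumptions chosen for a quick illustration.
    """
    rng = random.Random(seed)
    pal_total = 0
    lcs_total = 0
    for _ in range(trials):
        letters = [rng.choice(alphabet) for _ in range(length)]
        shuffled = letters[:]
        rng.shuffle(shuffled)
        seq = "".join(letters)
        pal_total += longest_palindrome_length(seq)
        lcs_total += longest_common_substring_length(seq, "".join(shuffled))
    return pal_total / trials, lcs_total / trials


if __name__ == "__main__":
    mean_pal, mean_lcs = compare()
    print(f"mean longest palindrome: {mean_pal:.2f}")
    print(f"mean LCS vs shuffled copy: {mean_lcs:.2f}")
```

The sketch only covers exact matches; the abstract's claim about right-shifted alignment score distributions concerns scored (approximate) alignments, which this code does not attempt to reproduce.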
Affiliation(s)
- Travis J Wheeler
  - R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, United States
|
3
|
Singleton MD, Eisen MB. Evolutionary analyses of intrinsically disordered regions reveal widespread signals of conservation. PLoS Comput Biol 2024; 20:e1012028. [PMID: 38662765 DOI: 10.1371/journal.pcbi.1012028] [Received: 01/08/2024] [Revised: 05/07/2024] [Accepted: 03/28/2024] [Indexed: 05/08/2024] Open
Abstract
Intrinsically disordered regions (IDRs) are segments of proteins without stable three-dimensional structures. As this flexibility allows them to interact with diverse binding partners, IDRs play key roles in cell signaling and gene expression. Despite the prevalence and importance of IDRs in eukaryotic proteomes and various biological processes, associating them with specific molecular functions remains a significant challenge due to their high rates of sequence evolution. However, by comparing the observed values of various IDR-associated properties against those generated under a simulated model of evolution, a recent study found that most IDRs across the entire yeast proteome contain conserved features. Furthermore, it showed that clusters of IDRs with common "evolutionary signatures," i.e. patterns of conserved features, were associated with specific biological functions. To determine if similar patterns of conservation are found in the IDRs of other systems, in this work we applied a series of phylogenetic models to over 7,500 orthologous IDRs identified in the Drosophila genome to dissect the forces driving their evolution. By comparing models of unconstrained and constrained continuous trait evolution using the Brownian motion and Ornstein-Uhlenbeck models, respectively, we identified signals of widespread constraint, indicating that conservation of distributed features is a mechanism of IDR evolution common to multiple biological systems. In contrast to the previous study in yeast, however, we observed limited evidence of IDR clusters with specific biological functions, which suggests a more complex relationship between evolutionary constraints and function in the IDRs of multicellular organisms.
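The contrast between the two trait-evolution models named in this abstract can be illustrated with a short simulation. This is a minimal sketch, not the authors' phylogenetic analysis: it follows a single lineage rather than a tree, and the function names and parameter values (`sigma`, `alpha`, `theta`, `t`) are assumptions for illustration. It samples the trait value after time `t` from the known Gaussian transition distribution of each process, showing that trait variance keeps growing under Brownian motion but stays bounded under the Ornstein-Uhlenbeck process, whose pull toward `theta` models constraint.

```python
import math
import random
import statistics


def sample_bm(x0, sigma, t, rng):
    """Trait after time t under Brownian motion: Normal(x0, sigma^2 * t)."""
    return rng.gauss(x0, sigma * math.sqrt(t))


def sample_ou(x0, theta, alpha, sigma, t, rng):
    """Trait after time t under an Ornstein-Uhlenbeck process.

    The mean decays from x0 toward the optimum theta at rate alpha, and the
    variance saturates at sigma^2 / (2 * alpha).
    """
    mean = theta + (x0 - theta) * math.exp(-alpha * t)
    var = sigma ** 2 * (1.0 - math.exp(-2.0 * alpha * t)) / (2.0 * alpha)
    return rng.gauss(mean, math.sqrt(var))


def endpoint_variances(t=10.0, sigma=1.0, alpha=1.0, theta=0.0, n=2000, seed=0):
    """Sample variance of n independent lineage endpoints under each model."""
    rng = random.Random(seed)
    bm = [sample_bm(0.0, sigma, t, rng) for _ in range(n)]
    ou = [sample_ou(0.0, theta, alpha, sigma, t, rng) for _ in range(n)]
    return statistics.variance(bm), statistics.variance(ou)


if __name__ == "__main__":
    bm_var, ou_var = endpoint_variances()
    print(f"BM endpoint variance: {bm_var:.3f}")  # theory: sigma^2 * t
    print(f"OU endpoint variance: {ou_var:.3f}")  # theory: about sigma^2 / (2 * alpha)
```

A trait whose observed variance across species is much smaller than Brownian motion would predict for the elapsed time is the kind of signal that favors the constrained model.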
Affiliation(s)
- Marc D Singleton
  - Howard Hughes Medical Institute, UC Berkeley, Berkeley, California, United States of America
- Michael B Eisen
  - Howard Hughes Medical Institute, UC Berkeley, Berkeley, California, United States of America
  - Department of Molecular and Cell Biology, UC Berkeley, Berkeley, California, United States of America
|
4
|
Morehead A, Liu J, Cheng J. Protein structure accuracy estimation using geometry-complete perceptron networks. Protein Sci 2024; 33:e4932. [PMID: 38380738 PMCID: PMC10880424 DOI: 10.1002/pro.4932] [Received: 07/22/2023] [Revised: 01/05/2024] [Accepted: 02/01/2024] [Indexed: 02/22/2024]
Abstract
Estimating the accuracy of protein structural models is a critical task in protein bioinformatics. The need for robust methods in the estimation of protein model accuracy (EMA) is prevalent in the field of protein structure prediction, where computationally-predicted structures need to be screened rapidly for the reliability of the positions predicted for each of their amino acid residues and their overall quality. Current methods proposed for EMA are either coupled tightly to existing protein structure prediction methods or evaluate protein structures without sufficiently leveraging the rich, geometric information available in such structures to guide accuracy estimation. In this work, we propose a geometric message passing neural network referred to as the geometry-complete perceptron network for protein structure EMA (GCPNet-EMA), where we demonstrate through rigorous computational benchmarks that GCPNet-EMA's accuracy estimations are 47% faster and more than 10% (6%) more correlated with ground-truth measures of per-residue (per-target) structural accuracy compared to baseline state-of-the-art methods for tertiary (multimer) structure EMA including AlphaFold 2. The source code and data for GCPNet-EMA are available on GitHub, and a public web server implementation is freely available.
Affiliation(s)
- Alex Morehead
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
|
5
|
Soh WT, Roetschke HP, Cormican JA, Teo BF, Chiam NC, Raabe M, Pflanz R, Henneberg F, Becker S, Chari A, Liu H, Urlaub H, Liepe J, Mishto M. Protein degradation by human 20S proteasomes elucidates the interplay between peptide hydrolysis and splicing. Nat Commun 2024; 15:1147. [PMID: 38326304 PMCID: PMC10850103 DOI: 10.1038/s41467-024-45339-3] [Received: 01/17/2023] [Accepted: 01/17/2024] [Indexed: 02/09/2024] Open
Abstract
If and how proteasomes catalyze not only peptide hydrolysis but also peptide splicing is an open question that has divided the scientific community. The debate has so far been based on immunopeptidomics, in vitro digestions of synthetic polypeptides, and ex vivo and in vivo experiments, which could only indirectly describe proteasome-catalyzed peptide splicing of full-length proteins. Here we develop a workflow, and cognate software, to analyze proteasome-generated non-spliced and spliced peptides produced from entire proteins and apply it to in vitro digestions of 15 proteins, including well-known intrinsically disordered proteins such as human tau and α-Synuclein. The results confirm that 20S proteasomes produce a sizeable variety of cis-spliced peptides, whereas trans-spliced peptides are a minority. Both peptide hydrolysis and splicing produce peptides with well-defined characteristics, which hint toward an intricate regulation of both catalytic activities. At the protein level, neither non-spliced nor spliced peptides are randomly localized within protein sequences; rather, they are concentrated in hotspots of peptide products, in part driven by protein sequence motifs and proteasomal preferences. At the sequence level, the different peptide sequence preferences of peptide hydrolysis and peptide splicing suggest a competition between the two catalytic activities of 20S proteasomes during protein degradation.
Affiliation(s)
- Wai Tuck Soh
  - Research Group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
- Hanna P Roetschke
  - Research Group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
  - Centre for Inflammation Biology and Cancer Immunology & Peter Gorer Department of Immunobiology, King's College London, SE1 1UL, London, UK
  - Research Group of Molecular Immunology, Francis Crick Institute, NW1 1AT, London, UK
- John A Cormican
  - Research Group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
- Bei Fang Teo
  - Centre for Inflammation Biology and Cancer Immunology & Peter Gorer Department of Immunobiology, King's College London, SE1 1UL, London, UK
  - Research Group of Molecular Immunology, Francis Crick Institute, NW1 1AT, London, UK
  - Immunology Programme, Life Sciences Institute; Immunology Translational Research Program and Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117456, Singapore
- Nyet Cheng Chiam
  - Research Group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
- Monika Raabe
  - Research Group of Bioanalytical Mass Spectrometry, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
- Ralf Pflanz
  - Research Group of Bioanalytical Mass Spectrometry, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
- Fabian Henneberg
  - Department of Structural Dynamics, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
- Stefan Becker
  - Department of NMR-based Structural Biology, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
- Ashwin Chari
  - Research Group of Structural Biochemistry and Mechanisms, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
- Haiyan Liu
  - Immunology Programme, Life Sciences Institute; Immunology Translational Research Program and Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117456, Singapore
- Henning Urlaub
  - Research Group of Bioanalytical Mass Spectrometry, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
  - Institute of Clinical Chemistry, University Medical Center Göttingen, 37075, Göttingen, Germany
- Juliane Liepe
  - Research Group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, 37077, Göttingen, Germany
- Michele Mishto
  - Centre for Inflammation Biology and Cancer Immunology & Peter Gorer Department of Immunobiology, King's College London, SE1 1UL, London, UK
  - Research Group of Molecular Immunology, Francis Crick Institute, NW1 1AT, London, UK
|
6
|
Satalkar V, Degaga GD, Li W, Pang YT, McShan AC, Gumbart JC, Mitchell JC, Torres MP. Generative β-hairpin design using a residue-based physicochemical property landscape. Biophys J 2024:S0006-3495(24)00070-5. [PMID: 38297834 DOI: 10.1016/j.bpj.2024.01.029] [Received: 10/23/2023] [Revised: 12/20/2023] [Accepted: 01/25/2024] [Indexed: 02/02/2024] Open
Abstract
De novo peptide design is a new frontier that has broad application potential in the biological and biomedical fields. Most existing models for de novo peptide design are largely based on sequence homology that can be restricted based on evolutionarily derived protein sequences and lack the physicochemical context essential in protein folding. Generative machine learning for de novo peptide design is a promising way to synthesize theoretical data that are based on, but unique from, the observable universe. In this study, we created and tested a custom peptide generative adversarial network intended to design peptide sequences that can fold into the β-hairpin secondary structure. This deep neural network model is designed to establish a preliminary foundation of the generative approach based on physicochemical and conformational properties of 20 canonical amino acids, for example, hydrophobicity and residue volume, using extant structure-specific sequence data from the PDB. The beta generative adversarial network model robustly distinguishes secondary structures of β hairpin from α helix and intrinsically disordered peptides with an accuracy of up to 96% and generates artificial β-hairpin peptide sequences with minimum sequence identities around 31% and 50% when compared against the current NCBI PDB and nonredundant databases, respectively. These results highlight the potential of generative models specifically anchored by physicochemical and conformational property features of amino acids to expand the sequence-to-structure landscape of proteins beyond evolutionary limits.
Affiliation(s)
- Vardhan Satalkar
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia
- Gemechis D Degaga
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
- Wei Li
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia
- Yui Tik Pang
  - School of Physics, Georgia Institute of Technology, Atlanta, Georgia
- Andrew C McShan
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia
- James C Gumbart
  - School of Physics, Georgia Institute of Technology, Atlanta, Georgia; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia
- Julie C Mitchell
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
- Matthew P Torres
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia
|
7
|
Anders KR, Abeyta A, Andrade CC, Bonilla CY, Braley AB, Bratt AG, Duncan KA, Hayes SG, Robinson CJ, Smith-Flores H, Ettinger ASH, Ettinger WF, Fay MM, Haydock J, McKenzie SK, Garlena RA, Russell DA, Poxleitner MK. Genome sequences of 31 mycobacteriophages isolated on Mycobacterium smegmatis mc²155 at room temperature. Microbiol Resour Announc 2024; 13:e0108623. [PMID: 38099681 DOI: 10.1128/mra.01086-23] [Received: 11/10/2023] [Accepted: 11/29/2023] [Indexed: 01/18/2024] Open
Abstract
We report the genome sequences of 31 mycobacteriophages isolated on Mycobacterium smegmatis mc²155 at room temperature. The genomes add to the diversity of Clusters A, B, C, G, and K. Collectively, the genomes include 70 novel protein-coding genes that have no close relatives among the actinobacteriophages.
Affiliation(s)
- Kirk R Anders
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Antonio Abeyta
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Christy C Andrade
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Carla Y Bonilla
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Amanda B Braley
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Alexandra G Bratt
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Kaya A Duncan
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Stephen G Hayes
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Ciara J Robinson
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Marta M Fay
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Joseph Haydock
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Sean K McKenzie
  - Department of Biology, Gonzaga University, Spokane, Washington, USA
- Rebecca A Garlena
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Daniel A Russell
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
|
8
|
Sayin AZ, Abali Z, Senyuz S, Cankara F, Gursoy A, Keskin O. Conformational diversity and protein-protein interfaces in drug repurposing in Ras signaling pathway. Sci Rep 2024; 14:1239. [PMID: 38216592 PMCID: PMC10786864 DOI: 10.1038/s41598-023-50913-8] [Received: 08/14/2023] [Accepted: 12/27/2023] [Indexed: 01/14/2024] Open
Abstract
We focus on drug repurposing in the Ras signaling pathway, considering structural similarities of protein-protein interfaces. The interfaces formed by physically interacting proteins are taken from the PDB if available and predicted via PRISM (PRotein Interaction by Structural Matching) otherwise. The structural coverage of these interactions has been increased from 21% to 92% using PRISM. Multiple conformations of each protein are used to include protein dynamics and diversity. Next, we find FDA-approved drugs bound to structurally similar protein-protein interfaces. The results suggest that the HIV protease inhibitors tipranavir, indinavir, and saquinavir may bind to the interface between EGFR and ERBB3/HER3. Tipranavir and indinavir may also bind to the interface between EGFR and ERBB2/HER2. Additionally, a drug used in Alzheimer's disease can bind to the interface between RAF1 and BRAF. Hence, we propose a systematic methodology to find drugs with potential use in cancer using a dataset of structurally similar protein-protein interface clusters rather than pockets.
Affiliation(s)
- Ahenk Zeynep Sayin
  - Department of Chemical and Biological Engineering, College of Engineering, Koc University, Rumeli Feneri Yolu Sariyer, 34450, Istanbul, Turkey
- Zeynep Abali
  - Graduate School of Science and Engineering, Computational Sciences and Engineering, Koc University, 34450, Istanbul, Turkey
- Simge Senyuz
  - Graduate School of Science and Engineering, Computational Sciences and Engineering, Koc University, 34450, Istanbul, Turkey
- Fatma Cankara
  - Graduate School of Science and Engineering, Computational Sciences and Engineering, Koc University, 34450, Istanbul, Turkey
- Attila Gursoy
  - Department of Computer Engineering, Koc University, 34450, Istanbul, Turkey
- Ozlem Keskin
  - Department of Chemical and Biological Engineering, College of Engineering, Koc University, Rumeli Feneri Yolu Sariyer, 34450, Istanbul, Turkey
|
9
|
Kumar YB, Kumar N, Vaikundamani S, Nagamani S, Mahanta HJ, Sastry GM, Sastry GN. Analyzing the aromatic-aromatic interactions in proteins: A2ID 2.0. Int J Biol Macromol 2023; 253:127207. [PMID: 37797858 DOI: 10.1016/j.ijbiomac.2023.127207] [Received: 06/30/2023] [Revised: 09/09/2023] [Accepted: 09/30/2023] [Indexed: 10/07/2023]
Abstract
The Aromatic-Aromatic Interactions Database (A2ID) is a comprehensive repository dedicated to documenting aromatic-aromatic (π-π) networks observed in experimentally determined protein structures. The first version of A2ID was reported in 2011 [Int J Biol Macromol, 2011, 48, 540]. It has undergone a series of significant updates, leading to its current version, which focuses on the identification and analysis of 3,444,619 π-π networks from proteins. The geometrical parameters such as centroid-centroid distances (r) and interplanar angles (ϕ) were used to identify and characterize π-π networks. It was observed that among the 84,500 proteins with at least one aromatic π-π network, about 92.50 % of the instances are found to be either 2π (77.34 %) or 3π (15.23 %) networks. The analysis of interacting amino acid pairs in 2π networks indicated a dominance of PHE residues followed by TYR. The updated version of A2ID incorporates analysis of π-π networks based on SCOP2 and ECOD classifiers, in addition to the existing SCOP, CATH, and EC classifications. This expanded scope allows researchers to explore the characteristics and functional implications of π-π networks in protein structures from multiple perspectives. The current version of A2ID along with its extensive dataset and detailed geometric information is publicly accessible using https://acds.neist.res.in/a2idv2.
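The two geometrical parameters named in this abstract, the centroid-centroid distance r and the interplanar angle ϕ, can be computed from ring atom coordinates as sketched below. This is an illustrative sketch only, not the A2ID implementation: the function names, the idealized hexagon helper, and the example coordinates are assumptions, and no distance or angle cutoffs are applied because the abstract does not state them.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ring_normal(points):
    """Unit normal of a ring: sum of cross products of consecutive centroid-to-atom vectors.

    Assumes the atoms are listed in order around the ring.
    """
    c = centroid(points)
    total = (0.0, 0.0, 0.0)
    for i in range(len(points)):
        v1 = _sub(points[i], c)
        v2 = _sub(points[(i + 1) % len(points)], c)
        cr = _cross(v1, v2)
        total = (total[0] + cr[0], total[1] + cr[1], total[2] + cr[2])
    norm = math.sqrt(_dot(total, total))
    return tuple(x / norm for x in total)


def pi_pi_geometry(ring_a, ring_b):
    """Return (r, phi): centroid-centroid distance and interplanar angle in degrees (0 to 90)."""
    r = math.dist(centroid(ring_a), centroid(ring_b))
    cos_phi = abs(_dot(ring_normal(ring_a), ring_normal(ring_b)))
    phi = math.degrees(math.acos(min(1.0, cos_phi)))
    return r, phi


def hexagon(center, u, v, radius=1.39):
    """Hypothetical helper: ideal six-membered ring in the plane spanned by orthonormal u, v."""
    return [tuple(center[k] + radius * (math.cos(a) * u[k] + math.sin(a) * v[k])
                  for k in range(3))
            for a in (i * math.pi / 3 for i in range(6))]


if __name__ == "__main__":
    base = hexagon((0.0, 0.0, 0.0), (1, 0, 0), (0, 1, 0))
    stacked = hexagon((0.0, 0.0, 3.5), (1, 0, 0), (0, 1, 0))
    t_shaped = hexagon((5.0, 0.0, 0.0), (1, 0, 0), (0, 0, 1))
    print(pi_pi_geometry(base, stacked))   # parallel rings, phi near 0
    print(pi_pi_geometry(base, t_shaped))  # perpendicular rings, phi near 90
```

In practice the ring atoms would come from the side chains of PHE, TYR, TRP, or HIS residues in a structure file; parsing is omitted here to keep the sketch self-contained.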
Affiliation(s)
- Y Bhargav Kumar
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, U. P., India
- Nandan Kumar
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India
- S Vaikundamani
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India
- Selvaraman Nagamani
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, U. P., India
- Hridoy Jyoti Mahanta
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, U. P., India
- G Madhavi Sastry
  - Schrödinger Inc., HITEC City, Hyderabad, Telangana 500081, India
- G Narahari Sastry
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat 785006, Assam, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, U. P., India
|
10
|
Wang T, Wang L, Zhang X, Shen C, Zhang O, Wang J, Wu J, Jin R, Zhou D, Chen S, Liu L, Wang X, Hsieh CY, Chen G, Pan P, Kang Y, Hou T. Comprehensive assessment of protein loop modeling programs on large-scale datasets: prediction accuracy and efficiency. Brief Bioinform 2023; 25:bbad486. [PMID: 38171930 PMCID: PMC10764206 DOI: 10.1093/bib/bbad486] [Received: 09/20/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024] Open
Abstract
Protein loops play a critical role in the dynamics of proteins and are essential for numerous biological functions, and various computational approaches to loop modeling have been proposed over the past decades. However, a comprehensive understanding of the strengths and weaknesses of each method is lacking. In this work, we constructed two high-quality datasets (i.e. the General dataset and the CASP dataset) and systematically evaluated the accuracy and efficiency of 13 commonly used loop modeling approaches from the perspective of loop lengths, protein classes and residue types. The results indicate that the knowledge-based method FREAD generally outperforms the other tested programs in most cases, but encountered challenges when predicting loops longer than 15 and 30 residues on the CASP and General datasets, respectively. The ab initio method Rosetta NGK demonstrated exceptional modeling accuracy for short loops with four to eight residues and achieved the highest success rate on the CASP dataset. The well-known AlphaFold2 and RoseTTAFold require more resources for better performance, but they exhibit promise for predicting loops longer than 16 and 30 residues in the CASP and General datasets. These observations can provide valuable insights for selecting suitable methods for specific loop modeling tasks and contribute to future advancements in the field.
Affiliation(s)
- Tianyue Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Langcheng Wang
  - Department of Pathology, New York University Medical Center, 550 First Avenue, New York, NY 10016, USA
- Xujun Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Odin Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jialu Wu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Ruofan Jin
  - College of Life Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Donghao Zhou
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China
- Shicheng Chen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Liwei Liu
  - Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Shenzhen 518129, Guangdong, China
- Xiaorui Wang
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macao, China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Guangyong Chen
  - Zhejiang Lab, Zhejiang University, Hangzhou 311121, Zhejiang, China
- Peichen Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
11
|
Casier R, Duhamel J. Appraisal of blob-Based Approaches in the Prediction of Protein Folding Times. J Phys Chem B 2023; 127:8852-8859. [PMID: 37793094 DOI: 10.1021/acs.jpcb.3c04958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/06/2023]
Abstract
A series of reports published in the last 3 years has illustrated that a blob-based model (BBM) can predict the folding time of proteins from their primary amino acid (aa) sequence based on three simple rules established to characterize the long-range backbone dynamics (LRBD) of racemic polypeptides. The sole use of LRBD to predict protein folding times with the BBM represents a radical departure from all other prediction methods currently applied to determine protein folding times, which rely instead on parameters such as the structure content, folding kinetics, chain length, amino acid properties, or contact topography of proteins. Furthermore, the built-in modularity of the BBM enables the parametrization and inclusion of new phenomena affecting the LRBD of polypeptides, while its conceptual simplicity makes it an interesting new mathematical tool for studying protein folding. However, its novelty implies that its relationship with many other methods used to predict protein folding times has not been well researched. Consequently, the purpose of this report is to uncover the physical phenomena encountered during protein folding that are best described by the BBM through the identification of parameters that have been recognized over the years as being strong predictors for protein folding, such as protein size, topology, structural class, and folding kinetics. This was accomplished by determining the parameters most strongly correlated with the folding times predicted by the BBM. While the BBM in its present form appears to be a good indicator of the folding times of the vast majority of the 195 proteins considered so far, this report finds that it excels for moderately large proteins that are primarily composed of locally formed structural motifs such as α-helices or for proteins that fold in multiple steps. 
Altogether, these observations based on the use of the BBM support the notion that proteins fold the way they do because the LRBD of polypeptides is mostly driven by the local interactions experienced between aa's within reach of one another.
Affiliation(s)
- Remi Casier
  - Institute for Polymer Research, Waterloo Institute for Nanotechnology, Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L3G1, Canada
- Jean Duhamel
  - Institute for Polymer Research, Waterloo Institute for Nanotechnology, Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L3G1, Canada
|
12
|
Xie L, Xie L. Elucidation of genome-wide understudied proteins targeted by PROTAC-induced degradation using interpretable machine learning. PLoS Comput Biol 2023; 19:e1010974. [PMID: 37590332 PMCID: PMC10464998 DOI: 10.1371/journal.pcbi.1010974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 08/29/2023] [Accepted: 07/27/2023] [Indexed: 08/19/2023] Open
Abstract
Proteolysis-targeting chimeras (PROTACs) are hetero-bifunctional molecules that induce the degradation of target proteins by recruiting an E3 ligase. PROTACs have the potential to inactivate disease-related genes that are considered undruggable by small molecules, making them a promising therapy for the treatment of incurable diseases. However, only a few hundred proteins have been experimentally tested for their amenability to PROTACs, and it remains unclear which other proteins in the entire human genome can be targeted by PROTACs. In this study, we have developed PrePROTAC, an interpretable machine learning model based on a transformer-based protein sequence descriptor and random forest classification. PrePROTAC predicts genome-wide targets that can be degraded by CRBN, one of the E3 ligases. In the benchmark studies, PrePROTAC achieved a ROC-AUC of 0.81, an average precision of 0.84, and over 40% sensitivity at a false positive rate of 0.05. When evaluated by an external test set which comprised proteins from different structural folds than those in the training set, the performance of PrePROTAC did not drop significantly, indicating its generalizability. Furthermore, we developed an embedding SHapley Additive exPlanations (eSHAP) method, which extends conventional SHAP analysis for original features to an embedding space through in silico mutagenesis. This method allowed us to identify key residues in the protein structure that play critical roles in PROTAC activity. The identified key residues were consistent with existing knowledge. Using PrePROTAC, we identified over 600 novel understudied proteins that are potentially degradable by CRBN and proposed PROTAC compounds for three novel drug targets associated with Alzheimer's disease.
Affiliation(s)
- Li Xie
  - Department of Computer Science, Hunter College, The City University of New York, New York City, New York, United States of America
- Lei Xie
  - Department of Computer Science, Hunter College, The City University of New York, New York City, New York, United States of America
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York City, New York, United States of America
  - Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York City, New York, United States of America
|
13
|
Gracy J, Vallejos-Sanchez K, Cohen-Gonsaud M. SecretoMyc, a web-based database on mycobacteria secreted proteins and structure-based homology identification using bio-informatics tools. Tuberculosis (Edinb) 2023; 141:102375. [PMID: 37429152 DOI: 10.1016/j.tube.2023.102375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 07/04/2023] [Accepted: 07/05/2023] [Indexed: 07/12/2023]
Abstract
To better understand the interaction between the host and the Mycobacterium tuberculosis pathogen, it is critical to identify its potential secreted proteins. While various experimental methods have been successful in identifying proteins under specific culture conditions, they have not provided a comprehensive characterisation of the secreted proteome. We utilized a combination of bioinformatics servers and in-house software to identify all potentially secreted proteins from six mycobacterial genomes through the three secretion systems: SEC, TAT, and T7SS. The results are presented in a database that can be cross-referenced to selected proteomics and transcriptomics studies (https://secretomyc.cbs.cnrs.fr). In addition, thanks to the recent availability of AlphaFold models, we developed a tool to identify the structural homologues among the mycobacterial genomes.
Affiliation(s)
- Jérôme Gracy
  - Centre de Biologie Structurale, CNRS, INSERM, Université de Montpellier, France
- Katherine Vallejos-Sanchez
  - Centre de Biologie Structurale, CNRS, INSERM, Université de Montpellier, France; Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
- Martin Cohen-Gonsaud
  - Centre de Biologie Structurale, CNRS, INSERM, Université de Montpellier, France
|
14
|
Trebesch N, Tajkhorshid E. Structure Reveals Homology in Elevator Transporters. bioRxiv 2023:2023.06.14.544989. [PMID: 37398459 PMCID: PMC10312693 DOI: 10.1101/2023.06.14.544989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
The elevator transport mechanism is one of the handful of canonical mechanisms by which transporters shuttle their substrates across the semi-permeable membranes that surround cells and organelles. Studies of molecular function are naturally guided by evolutionary context, but until now this context has been limited for elevator transporters because established evolutionary classification methods have organized them into several apparently unrelated families. Through comprehensive examination of the pertinent structures available in the Protein Data Bank, we show that 62 elevator transporters from 18 families share a conserved architecture in their transport domains consisting of 10 helices connected in 8 topologies. Through quantitative analysis of the structural similarity, structural complexity, and topologically-corrected sequence similarity among the transport domains, we provide compelling evidence that these elevator transporters are all homologous. Using our analysis, we have constructed a phylogenetic tree to enable quantification and visualization of the evolutionary relationships among elevator transporters and their families. We also report several examples of functional features that are shared by elevator transporters from different families. Our findings shed new light on the elevator transport mechanism and allow us to understand it in a far deeper and more nuanced manner.
Affiliation(s)
- Noah Trebesch
  - Theoretical and Computational Biophysics Group, NIH Resource for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign
- Emad Tajkhorshid
  - Theoretical and Computational Biophysics Group, NIH Resource for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign
|
15
|
Elena-Real CA, Urbanek A, Lund XL, Morató A, Sagar A, Fournet A, Estaña A, Bellande T, Allemand F, Cortés J, Sibille N, Melki R, Bernadó P. Multi-site-specific isotopic labeling accelerates high-resolution structural investigations of pathogenic huntingtin exon-1. Structure 2023:S0969-2126(23)00126-0. [PMID: 37119819 DOI: 10.1016/j.str.2023.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 03/10/2023] [Accepted: 04/04/2023] [Indexed: 05/01/2023]
Abstract
Huntington's disease neurodegeneration occurs when the number of consecutive glutamines in the huntingtin exon-1 (HTTExon1) exceeds a pathological threshold of 35. The sequence homogeneity of HTTExon1 reduces the signal dispersion in NMR spectra, hampering its structural characterization. By simultaneously introducing three isotopically labeled glutamines in a site-specific manner in multiple concatenated samples, 18 glutamines of a pathogenic HTTExon1 with 36 glutamines were unambiguously assigned. Chemical shift analyses indicate the α-helical persistence in the homorepeat and the absence of an emerging toxic conformation around the pathological threshold. Using the same type of samples, the recognition mechanism of Hsc70 molecular chaperone has been investigated, indicating that it binds to the N17 region of HTTExon1, inducing the partial unfolding of the poly-Q. The proposed strategy facilitates high-resolution structural and functional studies in low-complexity regions.
Affiliation(s)
- Carlos A Elena-Real
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 29, rue de Navacelles, 34090 Montpellier, France
- Annika Urbanek
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 29, rue de Navacelles, 34090 Montpellier, France
- Xamuel L Lund
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 29, rue de Navacelles, 34090 Montpellier, France; Institut Laue Langevin, 38000 Grenoble, France
- Anna Morató
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 29, rue de Navacelles, 34090 Montpellier, France
- Amin Sagar
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 29, rue de Navacelles, 34090 Montpellier, France
- Aurélie Fournet
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 29, rue de Navacelles, 34090 Montpellier, France
- Alejandro Estaña
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 29, rue de Navacelles, 34090 Montpellier, France; LAAS-CNRS, Université de Toulouse, CNRS, 31400, Toulouse, France
- Tracy Bellande
  - Institut François Jacob, Molecular Imaging Center (MIRCen), Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA) and Laboratory of Neurodegenerative Diseases, Centre National de la Recherche Scientifique (CNRS), Université Paris-Saclay, CEA-Fontenay-aux-Roses Bâtiment 61, 18, route du Panorama, 92265 Fontenay-aux-Roses cedex, France
- Frédéric Allemand
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 29, rue de Navacelles, 34090 Montpellier, France
- Juan Cortés
  - LAAS-CNRS, Université de Toulouse, CNRS, 31400, Toulouse, France
- Nathalie Sibille
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 29, rue de Navacelles, 34090 Montpellier, France
- Ronald Melki
  - Institut François Jacob, Molecular Imaging Center (MIRCen), Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA) and Laboratory of Neurodegenerative Diseases, Centre National de la Recherche Scientifique (CNRS), Université Paris-Saclay, CEA-Fontenay-aux-Roses Bâtiment 61, 18, route du Panorama, 92265 Fontenay-aux-Roses cedex, France
- Pau Bernadó
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 29, rue de Navacelles, 34090 Montpellier, France
|
16
|
Rozano L, Mukuka YM, Hane JK, Mancera RL. Ab Initio Modelling of the Structure of ToxA-like and MAX Fungal Effector Proteins. Int J Mol Sci 2023; 24:ijms24076262. [PMID: 37047233 PMCID: PMC10094246 DOI: 10.3390/ijms24076262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 03/09/2023] [Accepted: 03/21/2023] [Indexed: 03/29/2023] Open
Abstract
Pathogenic fungal diseases in crops are mediated by the release of effector proteins that facilitate infection. Characterising the structure of these fungal effectors is vital to understanding their virulence mechanisms and interactions with their hosts, which is crucial in the breeding of plant cultivars for disease resistance. Several effectors have been identified and validated experimentally; however, their lack of sequence conservation often impedes the identification and prediction of their structure using sequence similarity approaches. Structural similarity has, nonetheless, been observed within fungal effector protein families, creating interest in validating the use of computational methods to predict their tertiary structure from their sequence. We used Rosetta ab initio modelling to predict the structures of members of the ToxA-like and MAX effector families for which experimental structures are known to validate this method. An optimised approach was then used to predict the structures of phenotypically validated effectors lacking known structures. Rosetta was found to successfully predict the structure of fungal effectors in the ToxA-like and MAX families, as well as phenotypically validated but structurally unconfirmed effector sequences. Interestingly, potential new effector structural families were identified on the basis of comparisons with structural homologues and the identification of associated protein domains.
|
17
|
Elena-Real CA, Sagar A, Urbanek A, Popovic M, Morató A, Estaña A, Fournet A, Doucet C, Lund XL, Shi ZD, Costa L, Thureau A, Allemand F, Swenson RE, Milhiet PE, Crehuet R, Barducci A, Cortés J, Sinnaeve D, Sibille N, Bernadó P. The structure of pathogenic huntingtin exon 1 defines the bases of its aggregation propensity. Nat Struct Mol Biol 2023; 30:309-320. [PMID: 36864173 DOI: 10.1038/s41594-023-00920-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/25/2021] [Accepted: 01/05/2023] [Indexed: 03/04/2023]
Abstract
Huntington's disease is a neurodegenerative disorder caused by a CAG expansion in the first exon of the HTT gene, resulting in an extended polyglutamine (poly-Q) tract in huntingtin (httex1). The structural changes occurring to the poly-Q when increasing its length remain poorly understood due to its intrinsic flexibility and the strong compositional bias. The systematic application of site-specific isotopic labeling has enabled residue-specific NMR investigations of the poly-Q tract of pathogenic httex1 variants with 46 and 66 consecutive glutamines. Integrative data analysis reveals that the poly-Q tract adopts long α-helical conformations propagated and stabilized by glutamine side chain to backbone hydrogen bonds. We show that α-helical stability is a stronger signature in defining aggregation kinetics and the structure of the resulting fibrils than the number of glutamines. Our observations provide a structural perspective of the pathogenicity of expanded httex1 and pave the way to a deeper understanding of poly-Q-related diseases.
Affiliation(s)
- Carlos A Elena-Real
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Amin Sagar
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Annika Urbanek
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Matija Popovic
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Anna Morató
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Alejandro Estaña
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
  - LAAS-CNRS, University of Toulouse, CNRS, Toulouse, France
- Aurélie Fournet
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Christine Doucet
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Xamuel L Lund
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
  - Institute of Laue Langevin, Grenoble, France
- Zhen-Dan Shi
  - The Chemistry and Synthesis Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Rockville, MD, USA
- Luca Costa
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Frédéric Allemand
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Rolf E Swenson
  - The Chemistry and Synthesis Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Rockville, MD, USA
- Ramon Crehuet
  - Institute for Advanced Chemistry of Catalonia (IQAC), CSIC, Barcelona, Spain
- Alessandro Barducci
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Juan Cortés
  - LAAS-CNRS, University of Toulouse, CNRS, Toulouse, France
- Davy Sinnaeve
  - Univ. Lille, INSERM, CHU Lille, Institut Pasteur de Lille, U1167 - RID-AGE - Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
  - CNRS, EMR9002, Integrative Structural Biology, Lille, France
- Nathalie Sibille
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
- Pau Bernadó
  - Centre for Structural Biology, University of Montpellier, INSERM, CNRS, Montpellier, France
|
18
|
Xie L, Xie L. Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning. bioRxiv 2023:2023.02.23.529828. [PMID: 36865212 PMCID: PMC9980153 DOI: 10.1101/2023.02.23.529828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2023]
Abstract
Proteolysis-targeting chimeras (PROTACs) are hetero-bifunctional molecules. They induce the degradation of a target protein by recruiting an E3 ligase to the target. PROTACs can inactivate disease-related genes that are considered understudied, and thus have great potential as a new type of therapy for the treatment of incurable diseases. However, only hundreds of proteins have been experimentally tested for their amenability to PROTACs. It remains unclear which other proteins in the entire human genome can be targeted by PROTACs. For the first time, we have developed an interpretable machine learning model, PrePROTAC, which is based on a transformer-based protein sequence descriptor and random forest classification, to predict genome-wide PROTAC-induced targets degradable by CRBN, one of the E3 ligases. In the benchmark studies, PrePROTAC achieved a ROC-AUC of 0.81, a PR-AUC of 0.84, and over 40% sensitivity at a false positive rate of 0.05. Furthermore, we developed an embedding SHapley Additive exPlanations (eSHAP) method to identify positions in the protein structure that play key roles in PROTAC activity. The key residues identified were consistent with existing knowledge. We applied PrePROTAC to identify more than 600 novel understudied proteins that are potentially degradable by CRBN, and proposed PROTAC compounds for three novel drug targets associated with Alzheimer's disease.
Author Summary
Many human diseases remain incurable because disease-causing genes cannot be selectively and effectively targeted by small molecules. The proteolysis-targeting chimera (PROTAC), an organic compound that binds to both a target and a degradation-mediating E3 ligase, has emerged as a promising approach to selectively target disease-driving genes that are not druggable by small molecules. Nevertheless, not all proteins can be accommodated by E3 ligases and effectively degraded. Knowledge of the degradability of a protein will be crucial for the design of PROTACs. However, only hundreds of proteins have been experimentally tested for their amenability to PROTACs. It remains unclear which other proteins in the entire human genome can be targeted by PROTACs. In this paper, we propose an interpretable machine learning model, PrePROTAC, that takes advantage of powerful protein language modeling. PrePROTAC achieves high accuracy when evaluated on an external dataset drawn from gene families different from those of the proteins in the training data, suggesting the generalizability of PrePROTAC. We apply PrePROTAC to the human genome and identify more than 600 understudied proteins that are potentially responsive to PROTACs. Furthermore, we design three PROTAC compounds for novel drug targets associated with Alzheimer's disease.
Affiliation(s)
- Li Xie
  - Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
- Lei Xie
  - Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA
  - Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, 10021, USA
|
19
|
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023] Open
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
|
20
|
Busch MR, Rajendran C, Sterner R. Structural and Functional Characterization of the Ureidoacrylate Amidohydrolase RutB from Escherichia coli. Biochemistry 2023; 62:863-872. [PMID: 36599150 DOI: 10.1021/acs.biochem.2c00640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
We present a detailed structure-function analysis of the ureidoacrylate amidohydrolase RutB from Escherichia coli, which is an essential enzyme of the Rut pathway for pyrimidine utilization. Crystals of selenomethionine-labeled RutB were produced, which allowed us to determine the first structure of the enzyme at a resolution of 1.9 Å and to identify it as a new member of the isochorismatase-like hydrolase family. RutB was co-crystallized with the substrate analogue ureidopropionate, revealing the mode of substrate binding. Mutation of residues constituting the catalytic triad (D24A, D24N, K133A, C166A, C166S, C166T, C166Y) resulted in complete inactivation of RutB, whereas mutation of other residues close to the active site (Y29F, Y35F, N72A, W74A, W74F, E80A, E80D, S92A, S92T, S92Y, Q105A, Y136A, Y136F) led to distinct changes in the turnover number (kcat) and/or the Michaelis constant (KM). The results of our structural and mutational studies allowed us to assign specific functions to individual residues and to formulate a plausible reaction mechanism for RutB.
Affiliation(s)
- Markus R Busch
  - Institute of Biophysics and Physical Biochemistry, Regensburg Center for Biochemistry, University of Regensburg, D-93040 Regensburg, Germany
- Chitra Rajendran
  - Institute of Biophysics and Physical Biochemistry, Regensburg Center for Biochemistry, University of Regensburg, D-93040 Regensburg, Germany
- Reinhard Sterner
  - Institute of Biophysics and Physical Biochemistry, Regensburg Center for Biochemistry, University of Regensburg, D-93040 Regensburg, Germany
|
21
|
Felline A, Gentile S, Fanelli F. psnGPCRdb: The Structure-network Database of G Protein Coupled Receptors. J Mol Biol 2023:167950. [PMID: 36646374 DOI: 10.1016/j.jmb.2023.167950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 01/02/2023] [Accepted: 01/03/2023] [Indexed: 01/15/2023]
Abstract
G protein coupled receptors (GPCRs) are critical eukaryotic signal transduction gatekeepers and represent the largest protein superfamily in the human proteome, with more than 800 members. They share seven transmembrane helices organized in an up-down bundle architecture. GPCR-mediated signaling pathways have been linked to numerous human diseases, and GPCRs are the targets of approximately 35% of all drugs currently on the market. Structure network analysis, a graph theory-based approach, represents a cutting-edge tool for understanding GPCR function, which strongly relies on communication between the extracellular and intracellular poles of the receptor structure. psnGPCRdb stores the structure networks (i.e., linked nodes, hubs, communities and communication pathways) computed on all updated GPCR structures in the Protein Data Bank, in their isolated states or in complex with extracellular and/or intracellular molecules. The structure communication signatures of a sub-family or family of GPCRs, as well as of their small-molecule activators or inhibitors, are stored as consensus networks. The database also stores all meaningful structure network-based comparisons (i.e., difference networks) of functionally different states (i.e., inactive or active) of a given receptor sub-type, or of consensus networks representative of a receptor sub-type, type, sub-family or family. Single or consensus GPCR networks also hold information on amino acid conservation. The database allows graphical analysis of 3D structure networks together with interactive data tables. Ligand-centric networks can be analyzed as well. psnGPCRdb is unique and represents a powerful resource to unravel GPCR function, with important implications in cell signaling and drug design. psnGPCRdb is freely available at: http://webpsn.hpc.unimo.it/psngpcr.php.
Affiliation(s)
- Angelo Felline
  - Department of Life Sciences, University of Modena and Reggio Emilia, via Campi 103, 41125 Modena, Italy
- Sara Gentile
  - Department of Life Sciences, University of Modena and Reggio Emilia, via Campi 103, 41125 Modena, Italy
- Francesca Fanelli
  - Department of Life Sciences, University of Modena and Reggio Emilia, via Campi 103, 41125 Modena, Italy; Center for Neuroscience and Neurotechnology, University of Modena and Reggio Emilia, via Campi 287, 41125 Modena, Italy
|
22
|
Burley SK, Bhikadiya C, Bi C, Bittrich S, Chao H, Chen L, Craig PA, Crichlow GV, Dalenberg K, Duarte JM, Dutta S, Fayazi M, Feng Z, Flatt JW, Ganesan S, Ghosh S, Goodsell DS, Green RK, Guranovic V, Henry J, Hudson BP, Khokhriakov I, Lawson CL, Liang Y, Lowe R, Peisach E, Persikova I, Piehl DW, Rose Y, Sali A, Segura J, Sekharan M, Shao C, Vallat B, Voigt M, Webb B, Westbrook JD, Whetstone S, Young JY, Zalevsky A, Zardecki C. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res 2023; 51:D488-D508. [PMID: 36420884 PMCID: PMC9825554 DOI: 10.1093/nar/gkac1077] [Citation(s) in RCA: 119] [Impact Index Per Article: 119.0] [Received: 09/30/2022] [Revised: 10/17/2022] [Accepted: 11/02/2022] [Indexed: 11/27/2022] Open
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), founding member of the Worldwide Protein Data Bank (wwPDB), is the US data center for the open-access PDB archive. As wwPDB-designated Archive Keeper, RCSB PDB is also responsible for PDB data security. Annually, RCSB PDB serves >10 000 depositors of three-dimensional (3D) biostructures working on all permanently inhabited continents. RCSB PDB delivers data from its research-focused RCSB.org web portal to many millions of PDB data consumers based in virtually every United Nations-recognized country, territory, etc. This Database Issue contribution describes upgrades to the research-focused RCSB.org web portal that created a one-stop-shop for open access to ∼200 000 experimentally-determined PDB structures of biological macromolecules alongside >1 000 000 incorporated Computed Structure Models (CSMs) predicted using artificial intelligence/machine learning methods. RCSB.org is a 'living data resource.' Every PDB structure and CSM is integrated weekly with related functional annotations from external biodata resources, providing up-to-date information for the entire corpus of 3D biostructure data freely available from RCSB.org with no usage limitations. Within RCSB.org, PDB structures and the CSMs are clearly identified as to their provenance and reliability. Both are fully searchable, and can be analyzed and visualized using the full complement of RCSB.org web portal capabilities.
Affiliation(s)
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Henry Chao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Li Chen
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Paul A Craig
- School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, NY 14623, USA
- Gregg V Crichlow
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Kenneth Dalenberg
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Maryam Fayazi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Zukang Feng
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Justin W Flatt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Sai Ganesan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
- Sutapa Ghosh
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- David S Goodsell
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Rachel Kramer Green
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Vladimir Guranovic
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jeremy Henry
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Brian P Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Igor Khokhriakov
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Catherine L Lawson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Yuhe Liang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Irina Persikova
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Dennis W Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Andrej Sali
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
- Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Monica Sekharan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Ben Webb
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
- John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Shamara Whetstone
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jasmine Y Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Arthur Zalevsky
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
- Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
|
23
|
Anbo H, Ota M, Fukuchi S. Computational Methods to Predict Intrinsically Disordered Regions and Functional Regions in Them. Methods Mol Biol 2023; 2627:231-245. [PMID: 36959451 DOI: 10.1007/978-1-0716-2974-1_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
Intrinsically disordered regions (IDRs) are protein regions that do not adopt fixed tertiary structures. Since these regions lack ordered three-dimensional structures, they should be excluded from the target portions of homology modeling. IDRs can be predicted from amino acid sequences, because their amino acid compositions differ from those of structured domains. This chapter provides a review of IDR prediction methods and a case study of IDR prediction.
Affiliation(s)
- Hiroto Anbo
- Faculty of Engineering, Maebashi Institute of Technology, Maebashi, Japan
- Motonori Ota
- Graduate School of Information Sciences, Nagoya University, Nagoya, Japan
- Satoshi Fukuchi
- Faculty of Engineering, Maebashi Institute of Technology, Maebashi, Japan
|
24
|
Chen C, Chen X, Morehead A, Wu T, Cheng J. 3D-equivariant graph neural networks for protein model quality assessment. Bioinformatics 2023; 39:6986970. [PMID: 36637199 PMCID: PMC10089647 DOI: 10.1093/bioinformatics/btad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 11/28/2022] [Accepted: 01/12/2023] [Indexed: 01/14/2023]
Abstract
MOTIVATION: Quality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. With the recent development of end-to-end deep learning protein structure prediction techniques that generate highly confident tertiary structures for most proteins, it is important to explore corresponding QA strategies to evaluate and select the structural models they predict, since these models have better quality and different properties than models predicted by traditional tertiary structure prediction methods. RESULTS: We develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects, to estimate the accuracy of protein structural models by leveraging the structural features acquired from the state-of-the-art tertiary structure prediction method AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method, AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/BioinfoMachineLearning/EnQA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chen Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Alex Morehead
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
|
25
|
Tolkatchev D, Smith GE, Kostyukova AS. Nuclear Magnetic Resonance-Guided Structural Analysis of Moderate-Affinity Protein Complexes with Intrinsically Disordered Polypeptides. Methods Mol Biol 2023; 2652:405-437. [PMID: 37093489 DOI: 10.1007/978-1-0716-3147-8_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
Binding affinity of an individual binding site of an intrinsically disordered protein for its folded partner may be moderate. In such cases, a straightforward determination of the structure of the binding interface is difficult. We offer a hybrid protocol combining NMR chemical shift information, NMR spectral data on amino acid residue sequence substitution effects, residual dipolar coupling, and molecular dynamics simulation that allowed us to determine the structure of a complex between the intrinsically disordered tropomyosin-binding site of leiomodin and a coiled-coil peptide modeling the N-terminal fragment of tropomyosin. The protocol can be used for other moderate-affinity complexes composed of an intrinsically disordered peptide bound to a structured protein partner.
Affiliation(s)
- Dmitri Tolkatchev
- Voiland School of Chemical Engineering and Bioengineering, Washington State University, Pullman, WA, USA
- Garry E Smith
- Voiland School of Chemical Engineering and Bioengineering, Washington State University, Pullman, WA, USA
- Alla S Kostyukova
- Voiland School of Chemical Engineering and Bioengineering, Washington State University, Pullman, WA, USA
|
26
|
Burley SK, Bhikadiya C, Bi C, Bittrich S, Chao H, Chen L, Craig PA, Crichlow GV, Dalenberg K, Duarte JM, Dutta S, Fayazi M, Feng Z, Flatt JW, Ganesan SJ, Ghosh S, Goodsell DS, Green RK, Guranovic V, Henry J, Hudson BP, Khokhriakov I, Lawson CL, Liang Y, Lowe R, Peisach E, Persikova I, Piehl DW, Rose Y, Sali A, Segura J, Sekharan M, Shao C, Vallat B, Voigt M, Webb B, Westbrook JD, Whetstone S, Young JY, Zalevsky A, Zardecki C. RCSB Protein Data Bank: Tools for visualizing and understanding biological macromolecules in 3D. Protein Sci 2022; 31:e4482. [PMID: 36281733 PMCID: PMC9667899 DOI: 10.1002/pro.4482] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 07/26/2022] [Revised: 10/17/2022] [Accepted: 10/19/2022] [Indexed: 12/14/2022]
Abstract
Now in its 52nd year of continuous operations, the Protein Data Bank (PDB) is the premier open-access global archive housing three-dimensional (3D) biomolecular structure data. It is jointly managed by the Worldwide Protein Data Bank (wwPDB) partnership. The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) is funded by the National Science Foundation, National Institutes of Health, and US Department of Energy and serves as the US data center for the wwPDB. RCSB PDB is also responsible for the security of PDB data in its role as wwPDB-designated Archive Keeper. Every year, RCSB PDB serves tens of thousands of depositors of 3D macromolecular structure data (coming from macromolecular crystallography, nuclear magnetic resonance spectroscopy, electron microscopy, and micro-electron diffraction). The RCSB PDB research-focused web portal (RCSB.org) makes PDB data available at no charge and without usage restrictions to many millions of PDB data consumers around the world. The RCSB PDB training, outreach, and education web portal (PDB101.RCSB.org) serves nearly 700 K educators, students, and members of the public worldwide. This invited Tools Issue contribution describes how RCSB PDB (i) is organized; (ii) works with wwPDB partners to process new depositions; (iii) serves as the wwPDB-designated Archive Keeper; (iv) enables exploration and 3D visualization of PDB data via RCSB.org; and (v) supports training, outreach, and education via PDB101.RCSB.org. New tools and features at RCSB.org are presented using examples drawn from high-resolution structural studies of proteins relevant to treatment of human cancers by targeting immune checkpoints.
Affiliation(s)
- Stephen K. Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
- Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
- Henry Chao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Li Chen
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Paul A. Craig
- School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, New York, USA
- Gregg V. Crichlow
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Kenneth Dalenberg
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Jose M. Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
- Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
- Maryam Fayazi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Zukang Feng
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Justin W. Flatt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Sai J. Ganesan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Quantitative Biosciences Institute, University of California, San Francisco, California, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, California, USA
- Sutapa Ghosh
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- David S. Goodsell
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA
- Rachel Kramer Green
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Vladimir Guranovic
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Jeremy Henry
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
- Brian P. Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Igor Khokhriakov
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
- Catherine L. Lawson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Yuhe Liang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Irina Persikova
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Dennis W. Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
- Andrej Sali
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Quantitative Biosciences Institute, University of California, San Francisco, California, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, California, USA
- Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
- Monica Sekharan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Benjamin Webb
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Quantitative Biosciences Institute, University of California, San Francisco, California, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, California, USA
- John D. Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Shamara Whetstone
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Jasmine Y. Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Arthur Zalevsky
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Quantitative Biosciences Institute, University of California, San Francisco, California, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, California, USA
- Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
|
27
|
Given FM, Moran F, Johns AS, Titterington JA, Allison TM, Crittenden DL, Johnston JM. The structure of His-tagged Geobacillus stearothermophilus purine nucleoside phosphorylase reveals a 'spanner in the works'. Acta Crystallogr F Struct Biol Commun 2022; 78:416-422. [PMID: 36458621 PMCID: PMC9716568 DOI: 10.1107/s2053230x22011025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Accepted: 11/16/2022] [Indexed: 11/29/2022] Open
Abstract
The 1.72 Å resolution structure of purine nucleoside phosphorylase from Geobacillus stearothermophilus, a thermostable protein of potential interest for the biocatalytic synthesis of antiviral nucleoside compounds, is reported. The structure of the N-terminally His-tagged enzyme is a hexamer, as is typical of bacterial homologues, with a trimer-of-dimers arrangement. Unexpectedly, several residues of the recombinant tobacco etch virus protease (rTEV) cleavage site from the N-terminal tag are located in the active site of the neighbouring subunit in the dimer. Key to this interaction is a tyrosine residue, which sits where the nucleoside ring of the substrate would normally be located. Tag binding appears to be driven by a combination of enthalpic, entropic and proximity effects, which convey a particularly high affinity in the crystallized form. Attempts to cleave the tag in solution yielded only a small fraction of untagged protein, suggesting that the enzyme predominantly exists in the tag-bound form in solution, preventing rTEV from accessing the cleavage site. However, the tagged protein retained some activity in solution, suggesting that the tag does not completely block the active site, but may act as a competitive inhibitor. This serves as a warning that it is prudent to establish how affinity tags may affect protein structure and function, especially for industrial biocatalytic applications that rely on the efficiency and convenience of one-pot purifications and in cases where tag removal is difficult.
Affiliation(s)
- Fiona M. Given
- School of Physical and Chemical Sciences, Biomolecular Interaction Centre, University of Canterbury, New Zealand
| | - Fuchsia Moran
- School of Physical and Chemical Sciences, Biomolecular Interaction Centre, University of Canterbury, New Zealand
| | - Ashleigh S. Johns
- School of Physical and Chemical Sciences, Biomolecular Interaction Centre, University of Canterbury, New Zealand
| | - James A. Titterington
- School of Physical and Chemical Sciences, Biomolecular Interaction Centre, University of Canterbury, New Zealand
| | - Timothy M. Allison
- School of Physical and Chemical Sciences, Biomolecular Interaction Centre, University of Canterbury, New Zealand
| | - Deborah L. Crittenden
- School of Physical and Chemical Sciences, Biomolecular Interaction Centre, University of Canterbury, New Zealand
| | - Jodie M. Johnston
- School of Physical and Chemical Sciences, Biomolecular Interaction Centre, University of Canterbury, New Zealand,Correspondence e-mail:
| |
|
28
|
Yan B, Ran X, Gollu A, Cheng Z, Zhou X, Chen Y, Yang ZJ. IntEnzyDB: an Integrated Structure-Kinetics Enzymology Database. J Chem Inf Model 2022; 62:5841-5848. [PMID: 36286319 DOI: 10.1021/acs.jcim.2c01139] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
Abstract
Data-driven modeling has emerged as a new paradigm for biocatalyst design and discovery. Biocatalytic databases that integrate enzyme structure and function data are urgently needed. Here we describe IntEnzyDB as an integrated structure-kinetics database for facile statistical modeling and machine learning. IntEnzyDB employs a relational database architecture with a flattened data structure, which allows rapid data operation. This architecture also makes it easy for IntEnzyDB to incorporate more types of enzyme function data. IntEnzyDB contains enzyme kinetics and structure data from six enzyme commission classes. Using 1050 enzyme structure-kinetics pairs, we investigated the efficiency-perturbing propensities of mutations that are close or distal to the active site. The statistical results show that efficiency-enhancing mutations are globally encoded and that deleterious mutations are much more likely to occur among mutations close to the active site than among distal ones. Finally, we describe a web interface that allows public users to access enzymology data stored in IntEnzyDB. IntEnzyDB will provide a computational facility for data-driven modeling in biocatalysis and molecular evolution.
Affiliation(s)
- Bailu Yan
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States; Department of Biostatistics, Vanderbilt University, Nashville, Tennessee 37205, United States
| | - Xinchun Ran
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
| | - Anvita Gollu
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
| | - Zihao Cheng
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
| | - Xiang Zhou
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
| | - Yiwen Chen
- Data Science Institute, Vanderbilt University, Nashville, Tennessee 37235, United States
| | - Zhongyue J Yang
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States; Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37235, United States; Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, Tennessee 37235, United States; Data Science Institute, Vanderbilt University, Nashville, Tennessee 37235, United States; Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37205, United States
| |
|
29
|
Neri U, Wolf YI, Roux S, Camargo AP, Lee B, Kazlauskas D, Chen IM, Ivanova N, Zeigler Allen L, Paez-Espino D, Bryant DA, Bhaya D, Krupovic M, Dolja VV, Kyrpides NC, Koonin EV, Gophna U. Expansion of the global RNA virome reveals diverse clades of bacteriophages. Cell 2022; 185:4023-4037.e18. [PMID: 36174579 DOI: 10.1016/j.cell.2022.08.023] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Received: 02/15/2022] [Revised: 05/16/2022] [Accepted: 08/24/2022] [Indexed: 01/26/2023]
Abstract
High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.
Affiliation(s)
- Uri Neri
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 6997801, Israel.
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Simon Roux
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Antonio Pedro Camargo
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Benjamin Lee
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
| | - Darius Kazlauskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius 10257, Lithuania
| | - I Min Chen
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Natalia Ivanova
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Lisa Zeigler Allen
- Microbial and Environmental Genomics, J. Craig Venter Institute, La Jolla, CA, USA; Marine Biology Research Division, Scripps Institution of Oceanography, La Jolla, CA, USA
| | - David Paez-Espino
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Donald A Bryant
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Devaki Bhaya
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
| | - Mart Krupovic
- Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Archaeal Virology Unit, 75015 Paris, France
| | - Valerian V Dolja
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA.
| | - Nikos C Kyrpides
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Uri Gophna
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 6997801, Israel.
| |
|
30
|
Pak MA, Ivankov DN. Best templates outperform homology models in predicting the impact of mutations on protein stability. Bioinformatics 2022; 38:4312-4320. [PMID: 35894930 DOI: 10.1093/bioinformatics/btac515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/14/2021] [Revised: 05/31/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Prediction of protein stability change upon mutation (ΔΔG) is crucial for facilitating protein engineering and understanding of protein folding principles. Robust prediction of protein folding free energy change requires knowledge of the protein's three-dimensional (3D) structure. If the protein 3D structure is not available, one can predict the structure from the protein sequence; however, the prospects for ΔΔG predictions based on predicted protein structures are unknown. The accuracy of using 3D structures of the best templates for ΔΔG prediction is also unclear. RESULTS: To investigate these questions, we used a representative set of seven diverse and accurate publicly available tools (FoldX, Eris, Rosetta, DDGun, ACDC-NN, ThermoNet and DynaMut) for stability change prediction combined with AlphaFold or I-Tasser for protein 3D structure prediction. We found that best templates perform consistently better than (or similar to) homology models for all ΔΔG predictors. Our findings suggest using the best template structure for the prediction of protein stability change upon mutation if the protein 3D structure is not available. AVAILABILITY AND IMPLEMENTATION: The data are available at https://github.com/ivankovlab/template-vs-model. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marina A Pak
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
| | - Dmitry N Ivankov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
| |
|
31
|
Chaudhary A, Chaurasia PK, Kushwaha S, Chauhan P, Chawade A, Mani A. Correlating multi-functional role of cold shock domain proteins with intrinsically disordered regions. Int J Biol Macromol 2022; 220:743-753. [PMID: 35987358 DOI: 10.1016/j.ijbiomac.2022.08.100] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/09/2022] [Revised: 07/26/2022] [Accepted: 08/14/2022] [Indexed: 11/05/2022]
Abstract
Cold shock proteins (CSPs) are an ancient and conserved family of proteins. They are renowned for their role in the response to low-temperature stress in bacteria and for their nucleic acid binding activities. In prokaryotes, cold and non-cold inducible CSPs are involved in various cellular and metabolic processes such as growth and development, osmotic oxidation, starvation, stress tolerance, and host cell invasion. In prokaryotes, cold shock conditions reduce the efficiency of transcription and translation. Eukaryotic cold shock domain (CSD) proteins are an evolved form of prokaryotic CSPs in which the CSD is flanked by N- and C-terminal domains. Eukaryotic CSPs are multi-functional proteins. CSPs also act as nucleic acid chaperones by preventing the formation of secondary structures in mRNA at low temperatures. In humans, CSD proteins play a crucial role in the progression of breast cancer, colon cancer, lung cancer, and Alzheimer's disease. A well-defined three-dimensional structure of the intrinsically disordered regions of CSP family members is still undetermined. In this article, the intrinsically disordered regions of CSPs are explored systematically to understand the pleiotropic role of the cold shock family of proteins.
Affiliation(s)
- Amit Chaudhary
- Department of Metallurgical Engineering & Materials Science, Indian Institute of Technology Bombay
| | - Pankaj Kumar Chaurasia
- PG Department of Chemistry, L.S. College, Babasaheb Bhimrao Ambedkar Bihar University, Muzaffarpur, Bihar 842001, India
| | - Sandeep Kushwaha
- National Institute of Animal Biotechnology, Hyderabad 500032, India.
| | | | - Aakash Chawade
- Department of Plant Breeding, Swedish University of Agricultural Sciences, 230 53 Alnarp, Sweden.
| | - Ashutosh Mani
- Department of Biotechnology, Motilal Nehru National Institute of Technology Allahabad, Prayagraj 211004, India.
| |
|
32
|
Sneha S, Pandey DM. In silico structural and functional characterization of Antheraea mylitta cocoonase. J Genet Eng Biotechnol 2022; 20:102. [PMID: 35816268 PMCID: PMC9273796 DOI: 10.1186/s43141-022-00367-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Accepted: 05/20/2022] [Indexed: 11/26/2022]
Abstract
Background: Cocoonase is a serine protease present in sericigenous insects and is chiefly involved in dissolving sericin protein, allowing the moth to escape. The cocoon structure is made up of sericin protein, which holds the fibroin filaments together. The cocoonase enzyme hydrolyzes sericin protein without harming the fibroin. However, to date, no detailed characterization of the cocoonase enzyme and its presence in the wild silk moth Antheraea mylitta has been carried out. Therefore, the current study aimed at detailed characterization of the amplified cocoonase enzyme, secondary and tertiary structure prediction, sequence and structural alignment, phylogenetic analysis, and computational validation. Several computational tools such as ProtParam, Iterative Threading Assembly Refinement (I-TASSER), PROCHECK, SAVES v6.0, TM-align, Molecular Evolutionary Genetics Analysis (MEGA) X, and Figtree were employed for characterization of the cocoonase protein. Results: The present study describes the isolation of RNA, cDNA preparation, PCR amplification, and in silico characterization of cocoonase from Antheraea mylitta. Here, total RNA was isolated from the head region of A. mylitta, and gene-specific primers were designed using Primer3, followed by PCR-based amplification and sequencing. The newly constructed 377-bp sequence of cocoonase was subjected to in silico characterization. The in silico study showed that A. mylitta cocoonase has 26% similarity to A. pernyi strain Qing-6 cocoonase using Blastp and belongs to the chymotrypsin-like serine protease superfamily. The phylogenetic study found that the A. mylitta cocoonase sequence is closely related to the A. pernyi cocoonase sequence. Conclusions: The present study provides a detailed in silico characterization of the cocoonase gene and encoded protein obtained from the A. mylitta head region. The results imply the presence of the cocoonase enzyme in the wild silkworm A. mylitta, which can be used for cocoon degumming, a valuable and cost-effective strategy in the silk industry. Supplementary Information: The online version contains supplementary material available at 10.1186/s43141-022-00367-8.
Affiliation(s)
- Sneha Sneha
- Department of Bioengineering and Biotechnology, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India
| | - Dev Mani Pandey
- Department of Bioengineering and Biotechnology, Birla Institute of Technology, Mesra, Ranchi, 835215, Jharkhand, India.
| |
|
33
|
Andrade-Martínez JS, Camelo Valera LC, Chica Cárdenas LA, Forero-Junco L, López-Leal G, Moreno-Gallego JL, Rangel-Pineros G, Reyes A. Computational Tools for the Analysis of Uncultivated Phage Genomes. Microbiol Mol Biol Rev 2022; 86:e0000421. [PMID: 35311574 PMCID: PMC9199400 DOI: 10.1128/mmbr.00004-21] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/15/2022] Open
Abstract
Over a century of bacteriophage research has uncovered a plethora of fundamental aspects of their biology, ecology, and evolution. Furthermore, the introduction of community-level studies through metagenomics has revealed unprecedented insights into the impact that phages have on a range of ecological and physiological processes. It was not until the introduction of viral metagenomics that we began to grasp the astonishing breadth of genetic diversity encompassed by phage genomes. Novel phage genomes have been reported from a diverse range of biomes at an increasing rate, which has prompted the development of computational tools that support the multilevel characterization of these novel phages based solely on their genome sequences. The impact of these technologies has been so large that, together with MAGs (Metagenomic Assembled Genomes), we now have UViGs (Uncultivated Viral Genomes), which are now officially recognized by the International Committee on Taxonomy of Viruses (ICTV), and new taxonomic groups can now be created based exclusively on genomic sequence information. Even though the available tools have immensely contributed to our knowledge of phage diversity and ecology, the ongoing surge in software programs makes it challenging to keep up with them and the purpose each one is designed for. Therefore, in this review, we describe a comprehensive set of currently available computational tools designed for the characterization of phage genome sequences, focusing on five specific analyses: (i) assembly and identification of phage and prophage sequences, (ii) phage genome annotation, (iii) phage taxonomic classification, (iv) phage-host interaction analysis, and (v) phage microdiversity.
Affiliation(s)
- Juan Sebastián Andrade-Martínez
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - Laura Carolina Camelo Valera
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - Luis Alberto Chica Cárdenas
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - Laura Forero-Junco
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Department of Plant and Environmental Science, University of Copenhagen, Frederiksberg, Denmark
| | - Gamaliel López-Leal
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - J. Leonardo Moreno-Gallego
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Guillermo Rangel-Pineros
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Alejandro Reyes
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, USA
| |
|
34
|
Oncul AB, Celik Y, Unel NM, Baloglu MC. Bhlhdb: A next generation database of basic helix loop helix transcription factors based on deep learning model. J Bioinform Comput Biol 2022; 20:2250014. [DOI: 10.1142/s0219720022500147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
35
|
Paiva VDA, Gomes IDS, Monteiro CR, Mendonça MV, Martins PM, Santana CA, Gonçalves-Almeida V, Izidoro SC, Melo-Minardi RCD, Silveira SDA. Protein structural bioinformatics: An overview. Comput Biol Med 2022; 147:105695. [DOI: 10.1016/j.compbiomed.2022.105695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2021] [Revised: 06/01/2022] [Accepted: 06/02/2022] [Indexed: 11/27/2022]
|
36
|
Cretin G, Galochkina T, Vander Meersche Y, de Brevern AG, Postic G, Gelly JC. SWORD2: hierarchical analysis of protein 3D structures. Nucleic Acids Res 2022; 50:W732-W738. [PMID: 35580056 PMCID: PMC9252838 DOI: 10.1093/nar/gkac370] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/25/2022] [Revised: 04/19/2022] [Accepted: 04/29/2022] [Indexed: 11/27/2022] Open
Abstract
Understanding the functions and origins of proteins requires splitting these macromolecules into fragments that could be independent in terms of folding, activity, or evolution. For that purpose, structural domains are the typical level of analysis, but shorter segments, such as subdomains and supersecondary structures, are insightful as well. Here, we propose SWORD2, a web server for exploring how an input protein structure may be decomposed into ‘Protein Units’ that can be hierarchically assembled to delimit structural domains. For each partitioning solution, the relevance of the identified substructures is estimated through different measures. This multilevel analysis is achieved by integrating our previous work on domain delineation, ‘protein peeling’ and model quality assessment. We hope that SWORD2 will be useful to biologists searching for key regions in their proteins of interest and to bioinformaticians building datasets of protein structures. The web server is freely available online: https://www.dsimb.inserm.fr/SWORD2.
Affiliation(s)
- Gabriel Cretin
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
| | - Tatiana Galochkina
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
| | - Yann Vander Meersche
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
| | - Alexandre G de Brevern
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
| | - Guillaume Postic
- Université Paris-Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
| | - Jean-Christophe Gelly
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
| |
|
37
|
Porta JC, Han B, Gulsevin A, Chung JM, Peskova Y, Connolly S, Mchaourab HS, Meiler J, Karakas E, Kenworthy AK, Ohi MD. Molecular architecture of the human caveolin-1 complex. SCIENCE ADVANCES 2022; 8:eabn7232. [PMID: 35544577 PMCID: PMC9094659 DOI: 10.1126/sciadv.abn7232] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Indexed: 05/05/2023]
Abstract
Membrane-sculpting proteins shape the morphology of cell membranes and facilitate remodeling in response to physiological and environmental cues. Complexes of the monotopic membrane protein caveolin function as essential curvature-generating components of caveolae, flask-shaped invaginations that sense and respond to plasma membrane tension. However, the structural basis for caveolin's membrane remodeling activity is currently unknown. Here, using cryo-electron microscopy, we show that the human caveolin-1 complex is composed of 11 protomers organized into a tightly packed disc with a flat membrane-embedded surface. The structural insights suggest a previously unrecognized mechanism for how membrane-sculpting proteins interact with membranes and reveal how key regions of caveolin-1, including its scaffolding, oligomerization, and intramembrane domains, contribute to its function.
Affiliation(s)
- Jason C. Porta
- Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
| | - Bing Han
- Center for Membrane and Cell Physiology, University of Virginia, Charlottesville, VA, USA
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, Charlottesville, VA, USA
| | - Alican Gulsevin
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
| | - Jeong Min Chung
- Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
- Department of Biotechnology, The Catholic University of Korea, Bucheon, Republic of Korea
| | - Yelena Peskova
- Center for Membrane and Cell Physiology, University of Virginia, Charlottesville, VA, USA
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, Charlottesville, VA, USA
| | - Sarah Connolly
- Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
| | - Hassane S. Mchaourab
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
| | - Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Institute for Drug Discovery, Leipzig University, Germany
| | - Erkan Karakas
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
- Corresponding author. (E.K.); (A.K.K.); (M.D.O.)
| | - Anne K. Kenworthy
- Center for Membrane and Cell Physiology, University of Virginia, Charlottesville, VA, USA
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, Charlottesville, VA, USA
- Corresponding author. (E.K.); (A.K.K.); (M.D.O.)
| | - Melanie D. Ohi
- Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
- Department of Cell and Developmental Biology, University of Michigan School of Medicine, Ann Arbor, MI, USA
- Corresponding author. (E.K.); (A.K.K.); (M.D.O.)
| |
|
38
|
Kumar A, Khade PM, Dorman KS, Jernigan RL. Coarse-graining protein structures into their dynamic communities with DCI, a dynamic community identifier. Bioinformatics 2022; 38:2727-2733. [PMID: 35561187 PMCID: PMC9113273 DOI: 10.1093/bioinformatics/btac159] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2021] [Revised: 01/15/2022] [Accepted: 03/16/2022] [Indexed: 02/03/2023] Open
Abstract
SUMMARY: A new dynamic community identifier (DCI) is presented that relies upon protein residue dynamic cross-correlations generated by Gaussian elastic network models to identify those residue clusters exhibiting motions within a protein. A number of examples of communities are shown for diverse proteins, including GPCRs. It is a tool that can immediately simplify and clarify the most essential functional moving parts of any given protein. Proteins usually can be subdivided into groups of residues that move as communities. These are usually densely packed local sub-structures, but in some cases can be physically distant residues identified to be within the same community. The set of these communities for each protein constitutes its moving parts. The ways in which these are organized overall can aid in understanding many aspects of functional dynamics and allostery. DCI enables a more direct understanding of functions including enzyme activity, action across membranes and changes in the community structure from mutations or ligand binding. The DCI server is freely available on a web site (https://dci.bb.iastate.edu/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ambuj Kumar
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| | - Pranav M Khade
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| | - Karin S Dorman
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Department of Statistics, Iowa State University, Ames, IA 50011, USA
| | - Robert L Jernigan
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| |
Collapse
|
39
|
Blaber M. Variable and Conserved Regions of Secondary Structure in the β-Trefoil Fold: Structure Versus Function. Front Mol Biosci 2022; 9:889943. [PMID: 35517858 PMCID: PMC9062101 DOI: 10.3389/fmolb.2022.889943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 04/01/2022] [Indexed: 11/13/2022] Open
Abstract
β-trefoil proteins exhibit an approximate C3 rotational symmetry. An analysis of the secondary structure for members of this diverse superfamily of proteins indicates that it comprises remarkably conserved β-strands and highly divergent turn regions. A fundamental “minimal” architecture can be identified that is devoid of heterogeneous and extended turn regions and is conserved among all family members. Conversely, the different functional families of β-trefoils can potentially be identified by their unique turn patterns (or turn “signature”). Such analyses provide clues as to the evolution of the β-trefoil family, suggesting a folding/stability role for the β-strands and a functional role for turn regions. This viewpoint can also guide de novo protein design of β-trefoil proteins having novel functionality.
Affiliation(s)
- Michael Blaber
  - Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, FL, United States
|
40
|
Insights into Membrane Curvature Sensing and Membrane Remodeling by Intrinsically Disordered Proteins and Protein Regions. J Membr Biol 2022; 255:237-259. [PMID: 35451616 PMCID: PMC9028910 DOI: 10.1007/s00232-022-00237-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/27/2022] [Accepted: 03/29/2022] [Indexed: 12/15/2022]
Abstract
Cellular membranes are highly dynamic in shape. They can rapidly and precisely regulate their shape to perform various cellular functions. The ability of proteins to sense membrane curvature is essential in various biological events such as cell signaling and membrane trafficking. Once bound, these curvature-sensing proteins may also change the local membrane shape by one or more curvature-driving mechanisms. Established curvature-sensing/driving mechanisms rely on proteins with specific structural features such as amphipathic helices and intrinsically curved shapes. However, the recent discovery and characterization of many proteins have shattered the protein structure–function paradigm, which holds that protein function requires a unique structural feature. Typically, such structure-independent functions are carried out either entirely by intrinsically disordered proteins or by hybrid proteins containing disordered regions and structured domains. It is becoming more apparent that disordered proteins and regions can be potent sensors/inducers of membrane curvature. In this article, we outline the basic features of disordered proteins and regions, the motifs in such proteins that encode the function, membrane remodeling by disordered proteins and regions, and assays that may be employed to investigate curvature sensing and generation by ordered/disordered proteins.
|
41
|
Cohen N, Kahana A, Schuldiner M. A Similarity-Based Method for Predicting Enzymatic Functions in Yeast Uncovers a New AMP Hydrolase. J Mol Biol 2022; 434:167478. [PMID: 35123996 PMCID: PMC9005783 DOI: 10.1016/j.jmb.2022.167478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2021] [Revised: 01/22/2022] [Accepted: 01/25/2022] [Indexed: 11/01/2022]
Abstract
Despite decades of research and the availability of the full genomic sequence of the baker's yeast Saccharomyces cerevisiae, a large fraction of its genome is still not functionally annotated. This hinders our ability to fully understand cellular activity and suggests that many additional processes await discovery. Recent years have seen an explosion of high-quality genomic and structural data from multiple organisms, ranging from bacteria to mammals. New computational methods now allow us to integrate these data and extract meaningful insights into the functional identity of uncharacterized proteins in yeast. Here, we created a database of sensitive sequence similarity predictions for all yeast proteins. We use this information to identify candidate enzymes for known biochemical reactions whose enzymes are unidentified, and show how this provides a powerful basis for experimental validation. Using one pathway as a test case, we assign a new function to the previously uncharacterized enzyme Yhr202w: an extracellular AMP hydrolase in the NAD degradation pathway. Yhr202w, which we now term Smn1 for Scavenger MonoNucleotidase 1, is a highly conserved protein that is similar to the human protein E5NT/CD73, which is associated with multiple cancers. Hence, our new methodology provides a paradigm, adaptable to other organisms, for uncovering new enzymatic functions of uncharacterized proteins.
Affiliation(s)
- Nir Cohen
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Amit Kahana
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel. https://twitter.com/AmitKahana
- Maya Schuldiner
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
|
42
|
Tenorio CA, Parker JB, Blaber M. Functionalization of a symmetric protein scaffold: Redundant folding nuclei and alternative oligomeric folding pathways. Protein Sci 2022; 31:e4301. [PMID: 35481645 PMCID: PMC8996475 DOI: 10.1002/pro.4301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/06/2021] [Revised: 03/12/2022] [Accepted: 03/15/2022] [Indexed: 02/02/2023]
Abstract
Successful de novo protein design ideally targets specific folding kinetics, stability thermodynamics, and biochemical functionality, and the simultaneous achievement of all these criteria in a single-step design is challenging. Protein design is potentially simplified by separating the problem into two steps: (a) an initial design of a protein "scaffold" having appropriate folding kinetics and stability thermodynamics, followed by (b) appropriate functional mutation, possibly involving insertion of a peptide functional "cassette." This stepwise approach can also separate the orthogonal effects of the "stability/function" and "foldability/function" tradeoffs commonly observed in protein design. If the scaffold is a protein architecture having an exact rotational symmetry, then there is the potential for redundant folding nuclei and multiple equivalent sites of functionalization, thereby enabling broader functional adaptation. We describe such a "scaffold" and functional "cassette" design strategy applied to a β-trefoil threefold symmetric architecture and a heparin ligand functionality. The results support the availability of redundant folding nuclei within this symmetric architecture, and also identify a minimal peptide cassette conferring heparin affinity. The results also identify an energy barrier of destabilization that switches the protein folding pathway from monomeric to trimeric, thereby identifying another potential advantage of symmetric protein architecture in de novo design.
Affiliation(s)
- Connie A. Tenorio
  - Department of Biomedical Sciences, Florida State University, Tallahassee, Florida, USA
- Joseph B. Parker
  - Department of Biomedical Sciences, Florida State University, Tallahassee, Florida, USA
- Michael Blaber
  - Department of Biomedical Sciences, Florida State University, Tallahassee, Florida, USA
|
43
|
Zimmermann MT. Molecular Modeling is an Enabling Approach to Complement and Enhance Channelopathy Research. Compr Physiol 2022; 12:3141-3166. [PMID: 35578963 DOI: 10.1002/cphy.c190047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Hundreds of human membrane proteins form channels that transport necessary ions and compounds, including drugs and metabolites, yet details of their normal function or how function is altered by genetic variants to cause diseases are often unknown. Without this knowledge, researchers are less equipped to develop approaches to diagnose and treat channelopathies. High-resolution computational approaches such as molecular modeling enable researchers to investigate channelopathy protein function, facilitate detailed hypothesis generation, and produce data that is difficult to gather experimentally. Molecular modeling can be tailored to each physiologic context that a protein may act within, some of which may currently be difficult or impossible to assay experimentally. Because many genomic variants are observed in channelopathy proteins from high-throughput sequencing studies, methods with mechanistic value are needed to interpret their effects. The eminent field of structural bioinformatics integrates techniques from multiple disciplines including molecular modeling, computational chemistry, biophysics, and biochemistry, to develop mechanistic hypotheses and enhance the information available for understanding function. Molecular modeling and simulation access 3D and time-dependent information, not currently predictable from sequence. Thus, molecular modeling is valuable for increasing the resolution with which the natural function of protein channels can be investigated, and for interpreting how genomic variants alter them to produce physiologic changes that manifest as channelopathies. © 2022 American Physiological Society. Compr Physiol 12:3141-3166, 2022.
Affiliation(s)
- Michael T Zimmermann
  - Bioinformatics Research and Development Laboratory, Genomic Sciences and Precision Medicine Center, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
  - Clinical and Translational Sciences Institute, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
|
44
|
Brandes N, Ofer D, Peleg Y, Rappoport N, Linial M. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics 2022; 38:2102-2110. [PMID: 35020807 PMCID: PMC9386727 DOI: 10.1093/bioinformatics/btac020] [Citation(s) in RCA: 120] [Impact Index Per Article: 60.0] [Received: 06/30/2021] [Revised: 12/27/2021] [Accepted: 01/07/2022] [Indexed: 02/03/2023] Open
Abstract
SUMMARY Self-supervised deep language modeling has shown unprecedented success across natural language tasks, and has recently been repurposed to biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme combines language modeling with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements that make the model highly efficient and flexible to long sequences. The architecture of ProteinBERT consists of both local and global representations, allowing end-to-end processing of these types of inputs and outputs. ProteinBERT obtains near state-of-the-art performance, and sometimes exceeds it, on multiple benchmarks covering diverse protein properties (including protein structure, post-translational modifications and biophysical attributes), despite using a far smaller and faster model than competing deep-learning methods. Overall, ProteinBERT provides an efficient framework for rapidly training protein predictors, even with limited labeled data. AVAILABILITY AND IMPLEMENTATION Code and pretrained model weights are available at https://github.com/nadavbra/protein_bert. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yam Peleg
  - Deep Trading Ltd., Haifa 3508401, Israel
- Nadav Rappoport
  - Department of Software and Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel
- Michal Linial
  - Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
|
45
|
Li G, Dai QQ, Li GB. MeCOM: A Method for Comparing Three-Dimensional Metalloenzyme Active Sites. J Chem Inf Model 2022; 62:730-739. [PMID: 35044164 DOI: 10.1021/acs.jcim.1c01335] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
Since metalloenzymes are a large collection of metal ion(s) dependent enzymes, comparison analyses of metalloenzyme active sites are critical for metalloenzyme de novo design, function investigation, and inhibitor development. Here, we report a method named MeCOM for comparing metalloenzyme active sites. It is characterized by metal ion(s) centric active site recognition and three-dimensional superimposition using α-carbon or pharmacophore features. The test results revealed that for the given metalloenzymes, MeCOM could effectively recognize the active sites, extract active site features, and superimpose the active sites; it also could correctly identify similar active sites, differentiate dissimilar active sites, and evaluate the similarity degree. Moreover, MeCOM showed potential to establish new associations between structurally distinct metalloenzymes by active site comparison. MeCOM is freely available at https://mecom.ddtmlab.org.
Affiliation(s)
- Gen Li
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, Department of Medicinal Chemistry, West China School of Pharmacy, Sichuan University, Chengdu 610041, China
- Qing-Qing Dai
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, Department of Medicinal Chemistry, West China School of Pharmacy, Sichuan University, Chengdu 610041, China
- Guo-Bo Li
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, Department of Medicinal Chemistry, West China School of Pharmacy, Sichuan University, Chengdu 610041, China
|
46
|
Waman VP, Orengo C, Kleywegt GJ, Lesk AM. Three-dimensional Structure Databases of Biological Macromolecules. Methods Mol Biol 2022; 2449:43-91. [PMID: 35507259 DOI: 10.1007/978-1-0716-2095-3_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Databases of three-dimensional structures of proteins (and their associated molecules) provide: (a) Curated repositories of coordinates of experimentally determined structures, including extensive metadata; for instance, information about provenance, details about data collection and interpretation, and validation of results. (b) Information-retrieval tools to allow searching to identify entries of interest and provide access to them. (c) Links among databases, especially to databases of amino-acid and genetic sequences, and of protein function; and links to software for analysis of amino-acid sequence and protein structure, and for structure prediction. (d) Collections of predicted three-dimensional structures of proteins. These will become more and more important after the breakthrough in structure prediction achieved by AlphaFold2. The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including providing computer-graphic representations. Their collective and individual websites serve as hubs of the community of structural biologists, offering newsletters, reports from Task Forces, training courses, and "helpdesks," as well as links to external software. Many specialized projects are based on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which present classifications of protein domains.
Affiliation(s)
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Gerard J Kleywegt
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Arthur M Lesk
  - Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA, USA
|
47
|
The computational investigation of thermal conductivity of 11S globulin protein for biological applications: Molecular dynamics simulation. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2021.118267] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
|
48
|
Duart G, Lamb J, Ortiz-Mateu J, Elofsson A, Mingarro I. Intra-helical salt bridge contribution to membrane protein insertion. J Mol Biol 2022; 434:167467. [DOI: 10.1016/j.jmb.2022.167467] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/06/2021] [Revised: 12/22/2021] [Accepted: 01/20/2022] [Indexed: 01/17/2023]
|
49
|
McBride JM, Tlusty T. Slowest-first protein translation scheme: Structural asymmetry and co-translational folding. Biophys J 2021; 120:5466-5477. [PMID: 34813729 PMCID: PMC8715247 DOI: 10.1016/j.bpj.2021.11.024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/12/2021] [Revised: 09/30/2021] [Accepted: 11/17/2021] [Indexed: 11/19/2022] Open
Abstract
Proteins are translated from the N to the C terminus, raising the basic question of how this innate directionality affects their evolution. To explore this question, we analyze 16,200 structures from the Protein Data Bank (PDB). We find remarkable enrichment of α helices at the C terminus and β strands at the N terminus. Furthermore, this α-β asymmetry correlates with sequence length and contact order, both determinants of folding rate, hinting at possible links to co-translational folding (CTF). Hence, we propose the "slowest-first" scheme, whereby protein sequences evolved structural asymmetry to accelerate CTF: the slowest of the cooperatively folding segments are positioned near the N terminus so they have more time to fold during translation. A phenomenological model predicts that CTF can be accelerated by asymmetry in folding rate, up to double the rate, when folding time is commensurate with translation time; analysis of the PDB predicts that structural asymmetry is indeed maximal in this regime. This correspondence is greater in prokaryotes, which generally require faster protein production. Altogether, this indicates that accelerating CTF is a substantial evolutionary force whose interplay with stability and functionality is encoded in secondary structure asymmetry.
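Contact order, named in this abstract as a determinant of folding rate, can be made concrete with a short sketch. The code below is not the authors' analysis pipeline. It computes relative contact order, the mean sequence separation of contacting residue pairs divided by chain length, under simplifying assumptions: one Cα coordinate per residue and an 8.0 Å contact cutoff (both assumed here for illustration). The function and parameter names are hypothetical.

```python
import math


def relative_contact_order(coords, cutoff=8.0, min_separation=1):
    """Relative contact order of a chain from per-residue coordinates.

    coords: sequence of (x, y, z) tuples, one per residue (C-alpha assumed).
    cutoff: contact distance in angstroms (assumed value).
    min_separation: smallest sequence separation counted as a contact.
    """
    n = len(coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            # A pair is in contact when its residues lie within the cutoff.
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    # Mean sequence separation of contacts, normalized by chain length.
    return total_separation / (n * n_contacts)
```

Chains dominated by local contacts, such as α helices, give low values in this measure, while chains with many contacts between sequence-distant residues, as in β-rich folds, give higher values.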
Affiliation(s)
- John M McBride
  - Center for Soft and Living Matter, Institute for Basic Science, Ulsan, South Korea
- Tsvi Tlusty
  - Center for Soft and Living Matter, Institute for Basic Science, Ulsan, South Korea
  - Departments of Physics and Chemistry, Ulsan National Institute of Science and Technology, Ulsan, South Korea
|
50
|
Caetano-Anollés G, Aziz MF, Mughal F, Caetano-Anollés D. Tracing protein and proteome history with chronologies and networks: folding recapitulates evolution. Expert Rev Proteomics 2021; 18:863-880. [PMID: 34628994 DOI: 10.1080/14789450.2021.1992277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Abstract
INTRODUCTION While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past. AREAS COVERED Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel; events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late 'big bang' of domain combinations. EXPERT OPINION Two processes, folding and recruitment, appear central to the evolutionary progression. The former increases protein persistence. The latter fosters diversity. Chronologically, protein evolution mirrors folding by combining supersecondary structures into domains, developing translation machinery to facilitate folding speed and stability, and enhancing structural complexity by establishing long-distance interactions in novel structural and architectural designs.
Affiliation(s)
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
  - C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
- M Fayez Aziz
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Fizza Mughal
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Derek Caetano-Anollés
  - Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
|