1
Hsu STD. Folding and functions of knotted proteins. Curr Opin Struct Biol 2023; 83:102709. [PMID: 37778185 DOI: 10.1016/j.sbi.2023.102709] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/30/2023] [Revised: 09/02/2023] [Accepted: 09/05/2023] [Indexed: 10/03/2023]
Abstract
Topologically knotted proteins have entangled structural elements within their native structures that cannot be disentangled simply by pulling from the N- and C-termini. Systematic surveys have identified different types of knotted protein structures, constituting as much as 1% of the total entries within the Protein Data Bank. Many knotted proteins rely on their knotted structural elements to carry out evolutionarily conserved biological functions. Being knotted may also provide mechanical stability to withstand unfolding-coupled proteolysis. Reconfiguring a knotted protein topology by circular permutation or cyclization provides insights into the importance of being knotted in the context of folding and functions. With the explosion of predicted protein structures by artificial intelligence, we are now entering a new era of exploring the entangled protein universe.
Affiliation(s)
- Shang-Te Danny Hsu
- Institute of Biological Chemistry, Academia Sinica, Taipei 11529, Taiwan; Institute of Biochemical Sciences, National Taiwan University, Taipei 10617, Taiwan; International Institute for Sustainability with Knotted Chiral Meta Matter (WPI-SKCM²), Hiroshima University, Higashi-Hiroshima, Hiroshima 739-8526, Japan.
2
Larrea-Sebal A, Jebari-Benslaiman S, Galicia-Garcia U, Jose-Urteaga AS, Uribe KB, Benito-Vicente A, Martín C. Predictive Modeling and Structure Analysis of Genetic Variants in Familial Hypercholesterolemia: Implications for Diagnosis and Protein Interaction Studies. Curr Atheroscler Rep 2023; 25:839-859. [PMID: 37847331 PMCID: PMC10618353 DOI: 10.1007/s11883-023-01154-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/15/2023] [Indexed: 10/18/2023]
Abstract
PURPOSE OF REVIEW Familial hypercholesterolemia (FH) is a hereditary condition characterized by elevated levels of low-density lipoprotein cholesterol (LDL-C), which increases the risk of cardiovascular disease if left untreated. This review aims to discuss the role of bioinformatics tools in evaluating the pathogenicity of missense variants associated with FH. Specifically, it highlights the use of predictive models based on protein sequence, structure, evolutionary conservation, and other relevant features in identifying genetic variants within LDLR, APOB, and PCSK9 genes that contribute to FH. RECENT FINDINGS In recent years, various bioinformatics tools have emerged as valuable resources for analyzing missense variants in FH-related genes. Tools such as REVEL, Varity, and CADD use diverse computational approaches to predict the impact of genetic variants on protein function. These tools consider factors such as sequence conservation, structural alterations, and receptor binding to aid in interpreting the pathogenicity of identified missense variants. While these predictive models offer valuable insights, the accuracy of predictions can vary, especially for proteins with unique characteristics that might not be well represented in the databases used for training. This review emphasizes the significance of utilizing bioinformatics tools for assessing the pathogenicity of FH-associated missense variants. Despite their contributions, a definitive diagnosis of a genetic variant necessitates functional validation through in vitro characterization or cascade screening. This step ensures the precise identification of FH-related variants, leading to more accurate diagnoses. Integrating genetic data with reliable bioinformatics predictions and functional validation can enhance our understanding of the genetic basis of FH, enabling improved diagnosis, risk stratification, and personalized treatment for affected individuals. 
The comprehensive approach outlined in this review promises to advance the management of this inherited disorder, potentially leading to better health outcomes for those affected by FH.
Affiliation(s)
- Asier Larrea-Sebal
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Fundación Biofisika Bizkaia, 48940, Leioa, Spain
- Shifa Jebari-Benslaiman
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Unai Galicia-Garcia
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Ane San Jose-Urteaga
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Kepa B Uribe
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Asier Benito-Vicente
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- César Martín
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain.
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain.
3
Pathira Kankanamge LS, Ruffner LA, Touch MM, Pina M, Beuning PJ, Ondrechen MJ. Functional annotation of haloacid dehalogenase superfamily structural genomics proteins. Biochem J 2023; 480:1553-1569. [PMID: 37747786 DOI: 10.1042/bcj20230057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 09/20/2023] [Accepted: 09/25/2023] [Indexed: 09/26/2023]
Abstract
Haloacid dehalogenases (HAD) are members of a large superfamily that includes many Structural Genomics proteins with poorly characterized functionality. This superfamily consists of multiple types of enzymes that can act as sugar phosphatases, haloacid dehalogenases, phosphonoacetaldehyde hydrolases, ATPases, or phosphate monoesterases. Here, we report on predicted functional annotations and experimental testing by direct biochemical assay for Structural Genomics proteins from the HAD superfamily. To characterize the functions of HAD superfamily members, nine representative HAD proteins and 21 structural genomics proteins are analyzed. Using techniques based on computed chemical and electrostatic properties of individual amino acids, the functions of five structural genomics proteins from the HAD superfamily are predicted and validated by biochemical assays. A dehalogenase-like hydrolase, RSc1362 (Uniprot Q8XZN3, PDB 3UMB) is predicted to be a dehalogenase and dehalogenase activity is confirmed experimentally. Four proteins predicted to be sugar phosphatases are characterized as follows: a sugar phosphatase from Thermophilus volcanium (Uniprot Q978Y6) with trehalose-6-phosphate phosphatase and fructose-6-phosphate phosphatase activity; haloacid dehalogenase-like hydrolase from Bacteroides thetaiotaomicron (Uniprot Q8A2F3; PDB 3NIW) with fructose-6-phosphate phosphatase and sucrose-6-phosphate phosphatase activity; putative phosphatase from Eubacterium rectale (Uniprot D0VWU2; PDB 3DAO) as a sucrose-6-phosphate phosphatase; and hypothetical protein from Geobacillus kaustophilus (Uniprot Q5L139; PDB 2PQ0) as a fructose-6-phosphate phosphatase. Most of these sugar phosphatases showed some substrate promiscuity.
Affiliation(s)
- Lydia A Ruffner
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, U.S.A
- Mong Mary Touch
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, U.S.A
- Manuel Pina
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, U.S.A
- Penny J Beuning
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, U.S.A
- Mary Jo Ondrechen
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, U.S.A
4
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023] Open
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36% and will provide valuable insights on structure function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
5
Binder JL, Berendzen J, Stevens AO, He Y, Wang J, Dokholyan NV, Oprea TI. AlphaFold illuminates half of the dark human proteins. Curr Opin Struct Biol 2022; 74:102372. [PMID: 35439658 PMCID: PMC10669925 DOI: 10.1016/j.sbi.2022.102372] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 10/27/2021] [Revised: 03/02/2022] [Accepted: 03/13/2022] [Indexed: 01/05/2023]
Abstract
We investigate the use of confidence scores to evaluate the accuracy of a given AlphaFold (AF2) protein model for drug discovery. Prediction of accuracy is improved by not considering confidence scores below 80 due to the effects of disorder. On a set of recent crystal structures, 95% are likely to have accurate folds. Conformational discordance in the training set has a much more significant effect on accuracy than sequence divergence. We propose criteria for models and residues that are possibly useful for virtual screening. Based on these criteria, AF2 provides models for half of understudied (dark) human proteins and two-thirds of residues in those models.
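The cutoff mentioned in this abstract can be illustrated with a minimal sketch. It assumes the per-residue confidence scores are already available as a plain list of numbers; the function names and the way the scores are summarized are hypothetical and are not the authors' published criteria.

```python
from typing import Optional, Sequence


def confident_fraction(scores: Sequence[float], cutoff: float = 80.0) -> float:
    """Fraction of residues whose confidence score is at or above the cutoff."""
    if not scores:
        return 0.0
    kept = [s for s in scores if s >= cutoff]
    return len(kept) / len(scores)


def mean_confident_score(scores: Sequence[float], cutoff: float = 80.0) -> Optional[float]:
    """Mean score over residues at or above the cutoff; None if none qualify."""
    kept = [s for s in scores if s >= cutoff]
    return sum(kept) / len(kept) if kept else None
```

Residues scoring below the cutoff are simply ignored here, which mirrors the idea of not considering scores below 80; how a model-level decision is then made is left open.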
Affiliation(s)
- Jessica L Binder
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA. https://twitter.com/@jessicamaine
- Joel Berendzen
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA
- Amy O Stevens
- Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Yi He
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA; Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Jian Wang
- Department of Pharmacology, Department of Biochemistry and Molecular Biology, Penn State University College of Medicine, Hershey, PA 17033, USA
- Nikolay V Dokholyan
- Department of Pharmacology, Department of Biochemistry and Molecular Biology, Penn State University College of Medicine, Hershey, PA 17033, USA; Department of Chemistry and Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, United States
- Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA; UNM Comprehensive Cancer Center, Albuquerque, NM, USA; Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at the University of Gothenburg, Gothenburg, Sweden; Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
6
Miñarro-Lleonar M, Ruiz-Carmona S, Alvarez-Garcia D, Schmidtke P, Barril X. Development of an Automatic Pipeline for Participation in the CELPP Challenge. Int J Mol Sci 2022; 23:ijms23094756. [PMID: 35563148 PMCID: PMC9105952 DOI: 10.3390/ijms23094756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 04/20/2022] [Accepted: 04/21/2022] [Indexed: 12/01/2022] Open
Abstract
The prediction of how a ligand binds to its target is an essential step for Structure-Based Drug Design (SBDD) methods. Molecular docking is a standard tool to predict the binding mode of a ligand to its macromolecular receptor and to quantify their mutual complementarity, with multiple applications in drug design. However, docking programs do not always find correct solutions, either because they are not sampled or due to inaccuracies in the scoring functions. Quantifying the docking performance in real scenarios is essential to understanding their limitations, managing expectations and guiding future developments. Here, we present a fully automated pipeline for pose prediction validated by participating in the Continuous Evaluation of Ligand Pose Prediction (CELPP) Challenge. Acknowledging the intrinsic limitations of the docking method, we devised a strategy to automatically mine and exploit pre-existing data, defining—whenever possible—empirical restraints to guide the docking process. We prove that the pipeline is able to generate predictions for most of the proposed targets as well as obtain poses with low RMSD values when compared to the crystal structure. All things considered, our pipeline highlights some major challenges in the automatic prediction of protein–ligand complexes, which will be addressed in future versions of the pipeline.
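As a rough illustration of the RMSD comparison mentioned in this abstract, the sketch below computes the root-mean-square deviation between a predicted pose and a reference pose. It assumes both poses list the same atoms in the same order and already share the receptor's coordinate frame, so no superposition is done and ligand symmetry is ignored. The function name is hypothetical and this is not the pipeline's actual code.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between two equally ordered sets of atom coordinates (no fitting)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    # Sum of squared distances between matching atoms.
    squared = sum(
        (a - b) ** 2
        for p, r in zip(pose, reference)
        for a, b in zip(p, r)
    )
    return math.sqrt(squared / len(pose))
```

Leaving out the superposition step is deliberate: for docking, a pose that has the right shape but sits in the wrong place should still score badly.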
Affiliation(s)
- Marina Miñarro-Lleonar
- Pharmacy Faculty, University of Barcelona, Av. de Joan XXIII 27-31, 08028 Barcelona, Spain
- Daniel Alvarez-Garcia
- GAIN Therapeutics, Parc Cientific de Barcelona, Baldiri i Reixac 10, 08029 Barcelona, Spain
- Peter Schmidtke
- Discngine S.A.S., 79 Avenue Ledru Rollin, 75012 Paris, France
- Xavier Barril
- Pharmacy Faculty, University of Barcelona, Av. de Joan XXIII 27-31, 08028 Barcelona, Spain
- GAIN Therapeutics, Parc Cientific de Barcelona, Baldiri i Reixac 10, 08029 Barcelona, Spain
- Catalan Institute for Research and Advanced Studies (ICREA), Passeig de Lluis Companys 23, 08010 Barcelona, Spain
- Correspondence:
7
Adasme MF, Bolz SN, Al-Fatlawi A, Schroeder M. Decomposing compounds enables reconstruction of interaction fingerprints for structure-based drug screening. J Cheminform 2022; 14:17. [PMID: 35292113 PMCID: PMC8922937 DOI: 10.1186/s13321-022-00592-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Accepted: 02/25/2022] [Indexed: 11/13/2022] Open
Abstract
Background Structure-based drug repositioning has emerged as a promising alternative to conventional drug development. Regardless of the many success stories reported over the past years and the novel breakthroughs on the AI-based system AlphaFold for structure prediction, the availability of structural data for protein–drug complexes remains very limited. Whereas the chemical libraries contain millions of drug compounds, the vast majority of them do not have structures to crystallized targets, and it is, therefore, impossible to characterize their binding to targets from a structural view. However, the concept of building blocks offers a novel perspective on the structural problem. A drug compound is considered a complex of small chemical blocks or fragments, which confer the relevant properties to the drug and have a high proportion of functional groups involved in protein binding. Based on this, we propose a novel approach to expand the scope of structure-based repositioning approaches by transferring the structural knowledge from a fragment to a compound level. Results We fragmented over 100,000 compounds in the Protein Data Bank (PDB) and characterized the structural binding mode of 153,000 fragments to their crystallized targets. Using the fragment’s data, we were able to artificially reconstruct the binding mode of over 7,800 complexes between ChEMBL compounds and their known targets, for which no structural data is available. We proved that the conserved binding tendency of fragments, when binding to the same targets, highly influences the drug’s binding specificity and carries the key information to reconstruct the full drug’s binding mode. Furthermore, our approach was able to reconstruct multiple compound-target pairs at optimal thresholds and high similarity to the actual binding mode. 
Conclusions Such reconstructions are of great value and benefit structure-based drug repositioning since they automatically enlarge the technique’s scope and allow exploring the so far ‘unexplored compounds’ from a structural perspective. In general, the transfer of structural information is a promising technique that could be applied to any chemical library, to any compound that has no crystal structure available in PDB, and even to transfer any other feature that may be relevant for the drug discovery process and that due to data limitations is not yet fully available. In that sense, the results of this work document the full potential of structure-based screening even beyond PDB.
Supplementary Information The online version contains supplementary material available at 10.1186/s13321-022-00592-w.
Affiliation(s)
- Melissa F Adasme
- Biotechnology Center (BIOTEC), CMCB, Technische Universitat Dresden, Tatzberg 47-49, 01307, Dresden, Germany
- Sarah Naomi Bolz
- Biotechnology Center (BIOTEC), CMCB, Technische Universitat Dresden, Tatzberg 47-49, 01307, Dresden, Germany
- Ali Al-Fatlawi
- Biotechnology Center (BIOTEC), CMCB, Technische Universitat Dresden, Tatzberg 47-49, 01307, Dresden, Germany
- Michael Schroeder
- Biotechnology Center (BIOTEC), CMCB, Technische Universitat Dresden, Tatzberg 47-49, 01307, Dresden, Germany.
8
Mesdaghi S, Murphy DL, Sánchez Rodríguez F, Burgos-Mármol JJ, Rigden DJ. In silico prediction of structure and function for a large family of transmembrane proteins that includes human Tmem41b. F1000Res 2021; 9:1395. [PMID: 33520197 PMCID: PMC7818093 DOI: 10.12688/f1000research.27676.2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 03/11/2021] [Indexed: 01/07/2023] Open
Abstract
Background: Recent strides in computational structural biology have opened up an opportunity to understand previously uncharacterised proteins. The under-representation of transmembrane proteins in the Protein Data Bank highlights the need to apply new and advanced bioinformatics methods to shed light on their structure and function. This study focuses on a family of transmembrane proteins containing the Pfam domain PF09335 ('SNARE_ASSOC'/'VTT'/'Tvp38'/'DedA'). One prominent member, Tmem41b, has been shown to be involved in early stages of autophagosome formation and is vital in mouse embryonic development as well as being identified as a viral host factor of SARS-CoV-2. Methods: We used evolutionary covariance-derived information to construct and validate ab initio models, make domain boundary predictions and infer local structural features. Results: The results from the structural bioinformatics analysis of Tmem41b and its homologues showed that they contain a tandem repeat that is clearly visible in evolutionary covariance data but much less so by sequence analysis. Furthermore, cross-referencing of other prediction data with covariance analysis showed that the internal repeat features two-fold rotational symmetry. Ab initio modelling of Tmem41b and homologues reinforces these structural predictions. Local structural features predicted to be present in Tmem41b were also present in Cl⁻/H⁺ antiporters. Conclusions: The results of this study strongly point to Tmem41b and its homologues being transporters for an as-yet uncharacterised substrate and possibly using H⁺ antiporter activity as its mechanism for transport.
Affiliation(s)
- Shahram Mesdaghi
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- David L. Murphy
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Filomeno Sánchez Rodríguez
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- J. Javier Burgos-Mármol
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Daniel J. Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
9
Mesdaghi S, Murphy DL, Sánchez Rodríguez F, Burgos-Mármol JJ, Rigden DJ. In silico prediction of structure and function for a large family of transmembrane proteins that includes human Tmem41b. F1000Res 2021; 9:1395. [PMID: 33520197 PMCID: PMC7818093 DOI: 10.12688/f1000research.27676.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 11/23/2020] [Indexed: 01/07/2023] Open
Abstract
Background: Recent strides in computational structural biology have opened up an opportunity to understand previously uncharacterised proteins. The under-representation of transmembrane proteins in the Protein Data Bank highlights the need to apply new and advanced bioinformatics methods to shed light on their structure and function. This study focuses on a family of transmembrane proteins containing the Pfam domain PF09335 ('SNARE_ASSOC'/'VTT'/'Tvp38'). One prominent member, Tmem41b, has been shown to be involved in early stages of autophagosome formation and is vital in mouse embryonic development as well as being identified as a viral host factor of SARS-CoV-2. Methods: We used evolutionary covariance-derived information to construct and validate ab initio models, make domain boundary predictions and infer local structural features. Results: The results from the structural bioinformatics analysis of Tmem41b and its homologues showed that they contain a tandem repeat that is clearly visible in evolutionary covariance data but much less so by sequence analysis. Furthermore, cross-referencing of other prediction data with covariance analysis showed that the internal repeat features two-fold rotational symmetry. Ab initio modelling of Tmem41b and homologues reinforces these structural predictions. Local structural features predicted to be present in Tmem41b were also present in Cl⁻/H⁺ antiporters. Conclusions: The results of this study strongly point to Tmem41b and its homologues being transporters for an as-yet uncharacterised substrate and possibly using H⁺ antiporter activity as its mechanism for transport.
Affiliation(s)
- Shahram Mesdaghi
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- David L. Murphy
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Filomeno Sánchez Rodríguez
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- J. Javier Burgos-Mármol
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Daniel J. Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
10
Zhang J, Ghadermarzi S, Kurgan L. Prediction of protein-binding residues: dichotomy of sequence-based methods developed using structured complexes versus disordered proteins. Bioinformatics 2021; 36:4729-4738. [PMID: 32860044 DOI: 10.1093/bioinformatics/btaa573] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/07/2020] [Revised: 05/22/2020] [Accepted: 06/10/2020] [Indexed: 01/08/2023] Open
Abstract
MOTIVATION There are over 30 sequence-based predictors of the protein-binding residues (PBRs). They use either structure-annotated or disorder-annotated training datasets, potentially creating a dichotomy where the structure-/disorder-specific models may not be able to cross-over to accurately predict the other type. Moreover, the structure-trained predictors were shown to substantially cross-predict PBRs among residues that interact with non-protein partners (nucleic acids and small ligands). We address these issues by performing first-of-its-kind comparative study of a representative collection of disorder- and structure-trained predictors using a comprehensive benchmark set with the structure- and disorder-derived annotations of PBRs (to analyze the cross-over) and the protein-, nucleic acid- and small ligand-binding proteins (to study the cross-predictions). RESULTS Three predictors provide accurate results: SCRIBER, ANCHOR and disoRDPbind. Some of the structure-trained methods make accurate predictions on the structure-annotated proteins. Similarly, the disorder-trained predictors predict well on the disorder-annotated proteins. However, the considered predictors generally fail to cross-over, with the exception of SCRIBER. Our study also reveals that virtually all methods substantially cross-predict PBRs, except for SCRIBER for the structure-annotated proteins and disoRDPbind for the disorder-annotated proteins. We formulate a novel hybrid predictor, hybridPBRpred, that combines results produced by disoRDPbind and SCRIBER to accurately predict disorder- and structure-annotated PBRs. HybridPBRpred generates accurate results that cross-over structure- and disorder-annotated proteins and produces relatively low amount of cross-predictions, offering an accurate alternative to predict PBRs. AVAILABILITY AND IMPLEMENTATION HybridPBRpred webserver, benchmark dataset and supplementary information are available at http://biomine.cs.vcu.edu/servers/hybridPBRpred/. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
11
NMR Structure Determinations of Small Proteins Using only One Fractionally 20% 13C- and Uniformly 100% 15N-Labeled Sample. Molecules 2021; 26:molecules26030747. [PMID: 33535444 PMCID: PMC7867066 DOI: 10.3390/molecules26030747] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/08/2021] [Revised: 01/26/2021] [Accepted: 01/27/2021] [Indexed: 11/17/2022] Open
Abstract
Uniformly 13C- and 15N-labeled samples ensure fast and reliable nuclear magnetic resonance (NMR) assignments of proteins and are commonly used for structure elucidation by NMR. However, the preparation of uniformly labeled samples is a labor-intensive and expensive step. Reducing the portion of 13C-labeled glucose by a factor of five using a fractional 20% 13C- and 100% 15N-labeling scheme could lower the total chemical costs, yet retaining sufficient structural information of uniformly [13C, 15N]-labeled sample as a result of the improved sensitivity of NMR instruments. Moreover, fractional 13C-labeling can facilitate reliable resonance assignments of sidechains because of the biosynthetic pathways of each amino-acid. Preparation of only one [20% 13C, 100% 15N]-labeled sample for small proteins (<15 kDa) could also eliminate redundant sample preparations of 100% 15N-labeled and uniformly 100% [13C, 15N]-labeled samples of proteins. We determined the NMR structures of a small alpha-helical protein, the C domain of IgG-binding protein A from Staphylococcus aureus (SpaC), and a small beta-sheet protein, CBM64 module using [20% 13C, 100% 15N]-labeled sample and compared with the crystal structures and the NMR structures derived from the 100% [13C, 15N]-labeled sample. Our results suggest that one [20% 13C, 100% 15N]-labeled sample of small proteins could be routinely used as an alternative to conventional 100% [13C, 15N]-labeling for backbone resonance assignments, NMR structure determination, 15N-relaxation analysis, and ligand–protein interaction.
12
Wilson IA, Stanfield RL. 50 Years of structural immunology. J Biol Chem 2021; 296:100745. [PMID: 33957119 PMCID: PMC8163984 DOI: 10.1016/j.jbc.2021.100745] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/09/2021] [Revised: 03/24/2021] [Accepted: 04/30/2021] [Indexed: 12/12/2022] Open
Abstract
Fifty years ago, the first landmark structures of antibodies heralded the dawn of structural immunology. Momentum then started to build toward understanding how antibodies could recognize the vast universe of potential antigens and how antibody-combining sites could be tailored to engage antigens with high specificity and affinity through recombination of germline genes (V, D, J) and somatic mutation. Equivalent groundbreaking structures in the cellular immune system appeared some 15 to 20 years later and illustrated how processed protein antigens in the form of peptides are presented by MHC molecules to T cell receptors. Structures of antigen receptors in the innate immune system then explained their inherent specificity for particular microbial antigens including lipids, carbohydrates, nucleic acids, small molecules, and specific proteins. These two sides of the immune system act immediately (innate) to particular microbial antigens or evolve (adaptive) to attain high specificity and affinity to a much wider range of antigens. We also include examples of other key receptors in the immune system (cytokine receptors) that regulate immunity and inflammation. Furthermore, these antigen receptors use a limited set of protein folds to accomplish their various immunological roles. The other main players are the antigens themselves. We focus on surface glycoproteins in enveloped viruses including SARS-CoV-2 that enable entry and egress into host cells and are targets for the antibody response. This review covers what we have learned over the past half century about the structural basis of the immune response to microbial pathogens and how that information can be utilized to design vaccines and therapeutics.
MESH Headings
- Adaptive Immunity
- Allergy and Immunology/history
- Animals
- Antibodies, Viral/chemistry
- Antibodies, Viral/genetics
- Antibodies, Viral/immunology
- Antibody Specificity
- Antigen Presentation
- Antigens, Viral/chemistry
- Antigens, Viral/genetics
- Antigens, Viral/immunology
- COVID-19/immunology
- COVID-19/virology
- Crystallography/history
- Crystallography/methods
- History, 20th Century
- History, 21st Century
- Humans
- Immunity, Innate
- Protein Folding
- Protein Interaction Domains and Motifs
- Receptors, Antigen, T-Cell/chemistry
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/immunology
- Receptors, Cytokine/chemistry
- Receptors, Cytokine/genetics
- Receptors, Cytokine/immunology
- SARS-CoV-2/immunology
- SARS-CoV-2/pathogenicity
- V(D)J Recombination
Affiliation(s)
- Ian A Wilson
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA; The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California, USA.
- Robyn L Stanfield
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA
|
13
|
Golea-Secara A, Munteanu C, Sarbu M, Cretu OM, Velciov S, Vlad A, Bob F, Gadalean F, Gluhovschi C, Milas O, Simulescu A, Mogos-Stefan M, Patruica M, Petrica L, Zamfir AD. Urinary proteins detected using modern proteomics intervene in early type 2 diabetic kidney disease – a pilot study. Biomark Med 2020; 14:1521-1536. [DOI: 10.2217/bmm-2020-0308] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Abstract
Aim: An advanced proteomics platform for protein biomarker discovery in diabetic chronic kidney disease (DKD) was developed, validated and implemented. Materials & methods: Three Type 2 diabetes mellitus patients and three control subjects were enrolled. Urinary peptides were extracted, and samples were analyzed on a hybrid LTQ-Orbitrap Velos Pro instrument. Raw data were searched using the SEQUEST algorithm and integrated into the Proteome Discoverer platform. Results & discussion: Unique peptide sequences, the resulting sequence coverage and the scoring of peptide spectrum matches were reported in relation to albuminuria and databases. Five proteins that can be associated with early DKD were found: apolipoprotein AI, neutrophil gelatinase-associated lipocalin, cytidine deaminase, S100-A8 and hemoglobin subunit delta. Conclusion: Urinary proteome analysis could be used to evaluate mechanisms of pathogenesis of DKD.
Affiliation(s)
- Alina Golea-Secara
  - Department of Nephrology, County Emergency Hospital Timisoara, Timisoara, Romania
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
- Cristian Munteanu
  - Department of Bioinformatics & Structural Biochemistry, Institute of Biochemistry, Bucharest, Romania
- Mirela Sarbu
  - National Institute for Research & Development in Electrochemistry & Condensed Matter, Timisoara, Romania
- Octavian M Cretu
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
  - Department of Surgery I, Municipal Emergency Hospital Timisoara, Timisoara, Romania
- Silvia Velciov
  - Department of Nephrology, County Emergency Hospital Timisoara, Timisoara, Romania
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
- Adrian Vlad
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
  - Department of Diabetes & Metabolic Diseases, County Emergency Hospital, Timisoara, Romania
- Flaviu Bob
  - Department of Nephrology, County Emergency Hospital Timisoara, Timisoara, Romania
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
- Florica Gadalean
  - Department of Nephrology, County Emergency Hospital Timisoara, Timisoara, Romania
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
- Oana Milas
  - Department of Nephrology, County Emergency Hospital Timisoara, Timisoara, Romania
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
- Anca Simulescu
  - Department of Nephrology, County Emergency Hospital Timisoara, Timisoara, Romania
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
- Maria Mogos-Stefan
  - Department of Nephrology, County Emergency Hospital Timisoara, Timisoara, Romania
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
- Mihaela Patruica
  - Department of Nephrology, County Emergency Hospital Timisoara, Timisoara, Romania
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
- Ligia Petrica
  - Department of Nephrology, County Emergency Hospital Timisoara, Timisoara, Romania
  - ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
  - Centre of Translational Research & Systems Medicine, ‘Victor Babes’ University of Medicine & Pharmacy, Timisoara, Romania
- Alina D Zamfir
  - National Institute for Research & Development in Electrochemistry & Condensed Matter, Timisoara, Romania
|
14
|
Delhommel F, Gabel F, Sattler M. Current approaches for integrating solution NMR spectroscopy and small-angle scattering to study the structure and dynamics of biomolecular complexes. J Mol Biol 2020; 432:2890-2912. [DOI: 10.1016/j.jmb.2020.03.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/18/2019] [Revised: 02/27/2020] [Accepted: 03/10/2020] [Indexed: 01/24/2023]
|
15
|
Computational methods and tools for binding site recognition between proteins and small molecules: from classical geometrical approaches to modern machine learning strategies. J Comput Aided Mol Des 2019; 33:887-903. [PMID: 31628659 DOI: 10.1007/s10822-019-00235-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/01/2019] [Accepted: 10/11/2019] [Indexed: 10/25/2022]
Abstract
In the current "genomic era" the number of identified genes is growing exponentially. However, the biological function of a large number of the corresponding proteins is still unknown. Recognition of small molecule ligands (e.g., substrates, inhibitors, allosteric regulators, etc.) is pivotal for protein functions in the vast majority of the cases and knowledge of the region where these processes take place is essential for protein function prediction and drug design. In this regard, computational methods represent essential tools to tackle this problem. A significant number of software tools have been developed in the last few years which exploit either protein sequence information, structure information or both. This review describes the most recent developments in protein function recognition and binding site prediction, in terms of both freely-available and commercial solutions and tools, detailing the main characteristics of the considered tools and providing a comparative analysis of their performance.
|
16
|
Mayol E, Campillo M, Cordomí A, Olivella M. Inter-residue interactions in alpha-helical transmembrane proteins. Bioinformatics 2019; 35:2578-2584. [PMID: 30566615 DOI: 10.1093/bioinformatics/bty978] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/11/2018] [Revised: 10/19/2018] [Accepted: 12/17/2018] [Indexed: 01/23/2023]
Abstract
MOTIVATION: The number of available membrane protein structures has markedly increased in recent years and, in parallel, so has the reliability of the methods to detect transmembrane (TM) segments. In the present report, we characterized inter-residue interactions in α-helical membrane proteins using a dataset of 3462 TM helices from 430 proteins. This is by far the largest analysis published to date. RESULTS: Our analysis of residue-residue interactions in TM segments of membrane proteins shows that almost all interactions involve aliphatic residues and Phe. There is a lack of polar-polar, polar-charged and charged-charged interactions except for those between Thr or Ser sidechains and the backbone carbonyl of aliphatic and Phe residues. The results are discussed in the context of the preferences of amino acids to be in the protein core or exposed to the lipid bilayer and to occupy specific positions along the TM segment. Comparison to datasets of β-barrel membrane proteins and of α-helical globular proteins unveils the specific patterns of interactions and residue composition characteristic of α-helical membrane proteins that are the clue to understanding their structure. AVAILABILITY AND IMPLEMENTATION: Results data and datasets used are available at http://lmc.uab.cat/TMalphaDB/interactions.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eduardo Mayol
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, Bellaterra, Spain
- Mercedes Campillo
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, Bellaterra, Spain
- Arnau Cordomí
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, Bellaterra, Spain
- Mireia Olivella
  - Bioinformatics Area, School of International Studies, ESCI-UPF, Barcelona, Spain
  - Bioinformatics and Medical Statistics Group, U Science Tech, Central University of Catalonia, Vic, Barcelona, Spain
|
17
|
Negron C, Pearlman DA, del Angel G. Predicting mutations deleterious to function in beta-lactamase TEM1 using MM-GBSA. PLoS One 2019; 14:e0214015. [PMID: 30889230 PMCID: PMC6424398 DOI: 10.1371/journal.pone.0214015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/28/2018] [Accepted: 03/05/2019] [Indexed: 12/15/2022]
Abstract
Missense mutations can have disastrous effects on the function of a protein and, as a result, have been implicated in numerous diseases. However, the majority of missense variants only have a nominal impact on protein function. Thus, the ability to distinguish these two classes of missense mutations would greatly aid drug discovery efforts in target identification and validation as well as medical diagnosis. Monitoring the co-occurrence of a given missense mutation and a disease phenotype provides a pathway for classifying functionally disrupting missense mutations. However, the occurrence of a specific missense variant is often extremely rare, making statistical links challenging to infer. In this study, we benchmark a physics-based approach for predicting changes in stability, MM-GBSA, and apply it to classifying mutations as functionally disrupting. A large and diverse dataset of 990 residue mutations in beta-lactamase TEM1 is used to assess performance as it is rich in both functionally disrupting mutations and functionally neutral/beneficial mutations. On this dataset, we compare the performance of MM-GBSA to alternative strategies for predicting functionally disrupting mutations. We observe that the MM-GBSA method obtains an area under the curve (AUC) of 0.75 on the entire dataset, outperforming all other predictors tested. More importantly, MM-GBSA’s performance is robust to various divisions of the dataset, speaking to the generality of the approach. There is one notable exception, though: mutations on the surface of the protein are the most difficult to classify as functionally disrupting for all methods tested. This is likely due to the many mechanisms available to surface mutations to disrupt function, and thus provides a direction of focus for future studies.
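The AUC figure quoted in this abstract can be illustrated with a minimal sketch. It assumes each mutation has a predicted stability score (higher meaning more destabilizing) and a binary label for "functionally disrupting"; the function name `roc_auc` and the toy numbers are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC via the rank (Mann-Whitney) formulation.

    AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one; ties
    count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted destabilization scores (kcal/mol) for six mutations;
# label 1 = functionally disrupting, 0 = neutral or beneficial.
ddg_scores = [3.1, 0.2, 1.8, -0.4, 2.5, 0.9]
disrupting = [1, 0, 1, 0, 0, 1]

auc = roc_auc(ddg_scores, disrupting)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.75 means a disrupting mutation outranks a non-disrupting one in about three out of four random pairs.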
Affiliation(s)
- Guillermo del Angel
  - Alexion Pharmaceuticals Inc., Boston, Massachusetts, United States of America
|
18
|
Braitbard M, Schneidman-Duhovny D, Kalisman N. Integrative Structure Modeling: Overview and Assessment. Annu Rev Biochem 2019; 88:113-135. [PMID: 30830798 DOI: 10.1146/annurev-biochem-013118-111429] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 11/09/2022]
Abstract
Integrative structure modeling computationally combines data from multiple sources of information with the aim of obtaining structural insights that are not revealed by any single approach alone. In the first part of this review, we survey the commonly used sources of structural information and the computational aspects of model building. Throughout the past decade, integrative modeling was applied to various biological systems, with a focus on large protein complexes. Recent progress in the field of cryo-electron microscopy (cryo-EM) has resolved many of these complexes to near-atomic resolution. In the second part of this review, we compare a range of published integrative models with their higher-resolution counterparts with the aim of critically assessing their accuracy. This comparison gives a favorable view of integrative modeling and demonstrates its ability to yield accurate and informative results. We discuss possible roles of integrative modeling in the new era of cryo-EM and highlight future challenges and directions.
Affiliation(s)
- Merav Braitbard
  - Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Dina Schneidman-Duhovny
  - Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Nir Kalisman
  - Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
|
19
|
Viswanathan R, Fajardo E, Steinberg G, Haller M, Fiser A. Protein-protein binding supersites. PLoS Comput Biol 2019; 15:e1006704. [PMID: 30615604 PMCID: PMC6336348 DOI: 10.1371/journal.pcbi.1006704] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/28/2018] [Revised: 01/17/2019] [Accepted: 12/05/2018] [Indexed: 11/19/2022]
Abstract
The lack of a deep understanding of how proteins interact remains an important roadblock in advancing efforts to identify binding partners and uncover the corresponding regulatory mechanisms of the functions they mediate. Understanding protein-protein interactions is also essential for designing specific chemical modifications to develop new reagents and therapeutics. We explored the hypothesis that protein interaction sites serve as generic binding sites for non-cognate protein ligands, just as has been observed for small-molecule-binding sites in the past. Using extensive computational docking experiments on a test set of 241 protein complexes, we found that indeed there is a strong preference for non-cognate ligands to bind to the cognate binding site of a receptor. This observation appears to be robust to variations in docking programs, types of non-cognate protein probes, sizes of binding patches, relative sizes of binding patches and full-length proteins, and the exploration of obligate and non-obligate complexes. The accuracy of the docking scoring function appears to play a role in defining the correct site. The frequency of interaction of unrelated probes recognizing the binding interface was utilized in a simple prediction algorithm that showed accuracy competitive with other state-of-the-art methods.
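The frequency-based prediction idea mentioned at the end of this abstract can be sketched as follows. This is only an illustration under stated assumptions: each docked non-cognate probe is reduced to the set of receptor residue numbers it contacts, and a residue is called part of the interface when at least a chosen fraction of probes touch it. The function names, the 0.5 threshold and the toy data are hypothetical and do not reproduce the authors' algorithm.

```python
from collections import Counter


def contact_frequency(probe_contacts):
    """Fraction of docked probes that contact each receptor residue."""
    counts = Counter()
    for residues in probe_contacts:
        # Use a set so a residue is counted once per probe.
        counts.update(set(residues))
    n_probes = len(probe_contacts)
    return {res: c / n_probes for res, c in counts.items()}


def predict_interface(probe_contacts, threshold=0.5):
    """Residues contacted by at least `threshold` of the probes (assumed cutoff)."""
    freq = contact_frequency(probe_contacts)
    return sorted(res for res, f in freq.items() if f >= threshold)


# Hypothetical docking results: receptor residues contacted by four probes.
docked = [
    [10, 11, 12],
    [11, 12, 40],
    [11, 12, 13],
    [70, 71],
]

print(predict_interface(docked))
```

In this toy case residues 11 and 12 are touched by three of the four probes, so they are the only ones that pass the 0.5 cutoff.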
Affiliation(s)
- Raji Viswanathan
  - Department of Chemistry, Yeshiva University, New York, NY, United States of America
- Eduardo Fajardo
  - Departments of Systems & Computational Biology, and Biochemistry, Albert Einstein College of Medicine, Bronx, NY, United States of America
- Gabriel Steinberg
  - Department of Chemistry, Yeshiva University, New York, NY, United States of America
- Matthew Haller
  - Department of Chemistry, Yeshiva University, New York, NY, United States of America
- Andras Fiser
  - Departments of Systems & Computational Biology, and Biochemistry, Albert Einstein College of Medicine, Bronx, NY, United States of America
|
20
|
Trevizani R, Custódio FL. Supersecondary Structures and Fragment Libraries. Methods Mol Biol 2019; 1958:283-295. [PMID: 30945224 DOI: 10.1007/978-1-4939-9161-7_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The use of smotifs and fragment libraries has proven useful to both simplify and increase the quality of protein models. Here, we present Profrager, a tool that automatically generates putative structural fragments to reproduce local motifs of proteins given a target sequence. Profrager is highly customizable, allowing the user to select the number of fragments per library and the ranking method; it is able to generate fragments of all sizes, and it was recently modified to include the possibility of outputting exclusively smotifs.
|
21
|
Hu G, Wang K, Song J, Uversky VN, Kurgan L. Taxonomic Landscape of the Dark Proteomes: Whole-Proteome Scale Interplay Between Structural Darkness, Intrinsic Disorder, and Crystallization Propensity. Proteomics 2018; 18:e1800243. [PMID: 30198635 DOI: 10.1002/pmic.201800243] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 07/04/2018] [Revised: 08/30/2018] [Indexed: 12/14/2022]
Abstract
The growth rate of the protein sequence universe dramatically exceeds the speed of expansion of the protein structure universe, generating an immense dark proteome that includes proteins with unknown structure. A whole-proteome scale analysis of 5.4 million proteins from 987 proteomes in the three domains of life and viruses is performed to systematically dissect the interplay between structural coverage, degree of putative intrinsic disorder, and predicted propensity for structure determination. It has been found that Archaean and Bacterial proteomes have relatively high structural coverage and low amounts of disorder, whereas Eukaryotic and Viral proteomes are characterized by a broad spread of structural coverage and higher disorder levels. The analysis reveals that dark proteomes (i.e., proteomes containing high fractions of proteins with unknown structure) have significantly elevated amounts of intrinsic disorder and are predicted to be difficult to solve structurally. Although the majority of dark proteomes are of viral origin, many dark viral proteomes have at least modest crystallization propensity and only a handful of them are enriched in intrinsic disorder. The disorder, structural coverage, and propensity for structure determination are mapped onto a novel proteome-level sequence similarity network to analyze the interplay of these characteristics in the taxonomic landscape.
Affiliation(s)
- Gang Hu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
- Kui Wang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, 33612, USA
  - Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Russia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
|
22
|
Iyer MS, Joshi AG, Sowdhamini R. Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Mol Omics 2018; 14:266-280. [PMID: 29971307 DOI: 10.1039/c8mo00008e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/08/2023]
Abstract
Domains are the basic building blocks of proteins which can combine to give rise to different domain architectures. Annotation of domains in a sequence is the first step towards understanding the biological function. Since there are a limited number of folds and evolutionarily related proteins have a similar structure, function can be inferred through remote homology. Computational sequence searches were performed for remote homologues on genomes of ∼160 000 different organisms, starting from nearly 11 000 superfamily queries of known structure. Case studies revealed that most of the associated domains are involved in the same biological process. Using all the proteins predicted to have at least one structural domain, a coverage of 61% of Pfam families was achieved, which is higher than that of existing methods (43.36% by SIFTS). Taxonomic analysis of the proteins revealed 493 superfamilies in all the major kingdoms of life and a few lateral gene transfers between viruses and cellular organisms. The distribution of remote homologues across different classes, folds and superfamilies was studied and reveals that sequences are unequally distributed across structural classes. Finally, domain architectures were computed for the homologues and these data were compiled for each superfamily and organism.
Affiliation(s)
- Meenakshi S Iyer
  - National Centre for Biological Sciences (TIFR), GKVK Campus, Bellary Road, Bangalore, Karnataka 560 065, India.
|
23
|
Abstract
The vast, mostly unknown protein universe can be explored by analyzing protein sequences as a string of domains. A broader coverage can be achieved when these domains, the essential blocks in protein evolution, are detected using sequence profiles. Using clustering to collapse redundant profiles into unique function words (UFWs), we find that over the years 2009–2016, the number of UFWs saturates while the number of sequences matched by a combination of two or more UFWs grows exponentially. Between 2009 and 2016 the number of protein sequences from known species increased 10-fold from 8 million to 85 million. About 80% of these sequences contain at least one region recognized by the conserved domain architecture retrieval tool (CDART) as a sequence motif. Motifs provide clues to biological function but CDART often matches the same region of a protein by two or more profiles. Such synonyms complicate estimates of functional complexity. We do full-linkage clustering of redundant profiles by finding maximum disjoint cliques: Each cluster is replaced by a single representative profile to give what we term a unique function word (UFW). From 2009 to 2016, the number of sequence profiles used by CDART increased by 80%; the number of UFWs increased more slowly by 30%, indicating that the number of UFWs may be saturating. The number of sequences matched by a single UFW (sequences with single domain architectures) increased as slowly as the number of different words, whereas the number of sequences matched by a combination of two or more UFWs in sequences with multiple domain architectures (MDAs) increased at the same rate as the total number of sequences. This combinatorial arrangement of a limited number of UFWs in MDAs accounts for the genomic diversity of protein sequences. Although eukaryotes and prokaryotes use very similar sets of “words” or UFWs (57% shared), the “sentences” (MDAs) are different (1.3% shared).
|
24
|
Validation of LDLr Activity as a Tool to Improve Genetic Diagnosis of Familial Hypercholesterolemia: A Retrospective on Functional Characterization of LDLr Variants. Int J Mol Sci 2018; 19:ijms19061676. [PMID: 29874871 PMCID: PMC6032215 DOI: 10.3390/ijms19061676] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 04/29/2018] [Revised: 05/28/2018] [Accepted: 06/04/2018] [Indexed: 12/11/2022]
Abstract
Familial hypercholesterolemia (FH) is an autosomal dominant disorder characterized by high blood-cholesterol levels mostly caused by mutations in the low-density lipoprotein receptor (LDLr). With a prevalence as high as 1/200 in some populations, genetic screening for pathogenic LDLr mutations is a cost-effective approach in families classified as ‘definite’ or ‘probable’ FH and can aid early diagnosis. However, with over 2000 LDLr variants identified, distinguishing pathogenic mutations from benign mutations is a long-standing challenge in the field. In 1998, the World Health Organization (WHO) highlighted the importance of improving the diagnosis and prognosis of FH patients; thus, identifying LDLr pathogenic variants is a long-standing challenge in providing an accurate genetic diagnosis and personalized treatments. In recent years, accessible methodologies have been developed to assess LDLr activity in vitro, providing experimental reproducibility between laboratories all over the world that ensures rigorous analysis of all functional studies. In this review we present a broad spectrum of functionally characterized missense LDLr variants identified in patients with FH, which is mandatory for a definite diagnosis of FH.
|
25
|
Ahmad S, Baseer S, Navid A, Ahmad F, Azam SS. An integrated computational hierarchy for identification of potent inhibitors against Shikimate Kinase enzyme from Shigella sonnei, a major cause of global dysentery. Gene Reports 2018; 11:283-293. [DOI: 10.1016/j.genrep.2018.04.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/01/2023]
|
26
|
Xia Y, Fischer AW, Teixeira P, Weiner B, Meiler J. Integrated Structural Biology for α-Helical Membrane Protein Structure Determination. Structure 2018; 26:657-666.e2. [PMID: 29526436 PMCID: PMC5884713 DOI: 10.1016/j.str.2018.02.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/25/2017] [Revised: 06/14/2017] [Accepted: 02/05/2018] [Indexed: 01/12/2023]
Abstract
While great progress has been made, only 10% of the nearly 1,000 integral, α-helical, multi-span membrane protein families are represented by at least one experimentally determined structure in the PDB. Previously, we developed the algorithm BCL::MP-Fold, which samples the large conformational space of membrane proteins de novo by assembling predicted secondary structure elements guided by knowledge-based potentials. Here, we present a case study of rhodopsin fold determination by integrating sparse and/or low-resolution restraints from multiple experimental techniques including electron microscopy, electron paramagnetic resonance spectroscopy, and nuclear magnetic resonance spectroscopy. Simultaneous incorporation of orthogonal experimental restraints not only significantly improved the sampling accuracy but also allowed identification of the correct fold, which is demonstrated by a protein size-normalized transmembrane root-mean-square deviation as low as 1.2 Å. The protocol developed in this case study can be used for the determination of unknown membrane protein folds when limited experimental restraints are available.
Affiliation(s)
- Yan Xia
  - Department of Chemistry, Vanderbilt University, Stevenson Center, Station B 351822, Room 7330, Nashville, TN 37232, USA; Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA
- Axel W Fischer
  - Department of Chemistry, Vanderbilt University, Stevenson Center, Station B 351822, Room 7330, Nashville, TN 37232, USA; Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA
- Pedro Teixeira
  - Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA
- Brian Weiner
  - Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Stevenson Center, Station B 351822, Room 7330, Nashville, TN 37232, USA; Center for Structural Biology, Vanderbilt University, Nashville, TN 37232, USA.
|
27
|
Abstract
Eukaryotic protein kinases (PKs) are a large family of proteins critical for cellular response to external signals, acting as molecular switches. PKs propagate biochemical signals by catalyzing phosphorylation of other proteins, including other PKs, which can undergo conformational changes upon phosphorylation and catalyze further phosphorylations. Although PKs have been studied thoroughly across the domains of life, the structures of these proteins are sparsely understood in numerous groups of organisms, including plants. In addition to efforts towards determining crystal structures of PKs, research on human PKs has incorporated molecular dynamics (MD) simulations to study the conformational dynamics underlying the switching of PK function. This approach of experimental structural biology coupled with computational biophysics has led to improved understanding of how PKs become catalytically active and why mutations cause pathological PK behavior, at spatial and temporal resolutions inaccessible to current experimental methods alone. In this review, we argue for the value of applying MD simulation to plant PKs. We review the basics of MD simulation methodology, the successes achieved through MD simulation in animal PKs, and current work on plant PKs using MD simulation. We conclude with a discussion of the future of MD simulations and plant PKs, arguing for the importance of molecular simulation in the future of plant PK research.
|
28
|
Fowler PW, Cole K, Gordon NC, Kearns AM, Llewelyn MJ, Peto TEA, Crook DW, Walker AS. Robust Prediction of Resistance to Trimethoprim in Staphylococcus aureus. Cell Chem Biol 2018; 25:339-349.e4. [PMID: 29307840 DOI: 10.1016/j.chembiol.2017.12.009] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 08/02/2017] [Revised: 10/24/2017] [Accepted: 12/08/2017] [Indexed: 01/28/2023]
Abstract
The rise of antibiotic resistance threatens modern medicine; to combat it new diagnostic methods are required. Sequencing the whole genome of a pathogen offers the potential to accurately determine which antibiotics will be effective to treat a patient. A key limitation of this approach is that it cannot classify rare or previously unseen mutations. Here we demonstrate that alchemical free energy methods, a well-established class of methods from computational chemistry, can successfully predict whether mutations in Staphylococcus aureus dihydrofolate reductase confer resistance to trimethoprim. We also show that the method is quantitatively accurate by calculating how much the most common resistance-conferring mutation, F99Y, reduces the binding free energy of trimethoprim and comparing predicted and experimentally measured minimum inhibitory concentrations for seven different mutations. Finally, by considering up to 32 free energy calculations for each mutation, we estimate its specificity and sensitivity.
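To give a sense of scale for the binding free energy changes discussed in this abstract, the following sketch applies the standard thermodynamic relation between a change in binding free energy and the fold change in dissociation constant, ΔΔG = RT ln(Kd_mut / Kd_wt). The sign convention (positive ΔΔG means weaker binding), the temperature of 298.15 K and the function names are assumptions for illustration; the abstract does not state how free energies were related to minimum inhibitory concentrations, and the sketch makes no claim about that step.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def affinity_fold_change(ddg_kcal, temperature_k=298.15):
    """Fold change in Kd implied by a binding free energy change.

    Assumed convention: ddg = dG_bind(mutant) - dG_bind(wild type),
    so a positive value means weaker binding (larger Kd).
    """
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Inverse relation: free energy change for a given Kd fold change."""
    return R_KCAL * temperature_k * math.log(fold)


# A ten-fold loss of affinity corresponds to roughly 1.36 kcal/mol at 298 K.
ddg_tenfold = ddg_from_fold_change(10.0)
print(f"ddG for 10-fold weaker binding: {ddg_tenfold:.2f} kcal/mol")
```

The logarithmic relation explains why free energy errors of around 1 kcal/mol already matter: they translate into several-fold uncertainty in predicted affinity.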
Affiliation(s)
- Philip W Fowler
- Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Headley Way, Oxford OX3 9DU, UK.
| | - Kevin Cole
- Department of Infectious Diseases and Microbiology, Royal Sussex County Hospital, Brighton, Brighton and Sussex Medical School, Brighton BN1 9PS, UK
| | - N Claire Gordon
- Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Headley Way, Oxford OX3 9DU, UK
| | - Angela M Kearns
- Antimicrobial Resistance and Healthcare Associated Infections Reference Unit, Public Health England, Colindale NW9 5EQ, UK
| | - Martin J Llewelyn
- Department of Infectious Diseases and Microbiology, Royal Sussex County Hospital, Brighton, Brighton and Sussex Medical School, Brighton BN1 9PS, UK
| | - Tim E A Peto
- Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Headley Way, Oxford OX3 9DU, UK
| | - Derrick W Crook
- Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Headley Way, Oxford OX3 9DU, UK
| | - A Sarah Walker
- Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Headley Way, Oxford OX3 9DU, UK
|
29
|
Meng F, Wang C, Kurgan L. fDETECT webserver: fast predictor of propensity for protein production, purification, and crystallization. BMC Bioinformatics 2018; 18:580. [PMID: 29295714 PMCID: PMC6389161 DOI: 10.1186/s12859-017-1995-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 09/19/2017] [Accepted: 12/06/2017] [Indexed: 02/26/2023] Open
Abstract
Background: Development of predictors of the propensity of protein sequences for successful crystallization has been actively pursued for over a decade. A few novel methods that expanded the scope of these predictions to address additional steps of protein production and structure determination pipelines were released in recent years. The predictive performance of the current methods is modest. This is because the only input that they use is the protein sequence and because the experimental annotations of these data might be inconsistent, given that they were collected across many laboratories and centers. However, even these modest levels of predictive quality are still practical compared to the reported low success rates of crystallization, which are below 10%. We focus on another important aspect: the high computational cost of running the predictors that offer the expanded scope. Results: We introduce a novel fDETECT webserver that provides very fast and modestly accurate predictions of the success of protein production, purification, crystallization, and structure determination. Empirical tests on two datasets demonstrate that fDETECT is more accurate than the only other similarly fast method, and similarly accurate and three orders of magnitude faster than the currently most accurate predictors. Our method predicts a single protein in about 120 milliseconds and needs less than an hour to generate the four predictions for an entire human proteome. Moreover, we empirically show that fDETECT secures similar levels of predictive performance when compared with four representative methods that only predict success of crystallization, while it also provides the other three predictions. A webserver that implements fDETECT is available at http://biomine.cs.vcu.edu/servers/fDETECT/. Conclusions: fDETECT is a computational tool that supports target selection for protein production and X-ray crystallography-based structure determination. It offers predictive quality that matches or exceeds other state-of-the-art tools and is especially suitable for the analysis of large protein sets.
Affiliation(s)
- Fanchi Meng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada
| | - Chen Wang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
|
30
|
Dey S, Levy ED. Inferring and Using Protein Quaternary Structure Information from Crystallographic Data. Methods Mol Biol 2018; 1764:357-375. [PMID: 29605927 DOI: 10.1007/978-1-4939-7759-8_23] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
A precise knowledge of the quaternary structure of proteins is essential to illuminate both their function and their evolution. The major part of our knowledge on quaternary structure is inferred from X-ray crystallography data, but this inference process is hard and error-prone. The difficulty lies in discriminating fortuitous protein contacts, which make up the lattice of protein crystals, from biological protein contacts that exist in the native cellular environment. Here, we review methods devised to discriminate between both types of contacts and describe resources for downloading protein quaternary structure information and identifying high-confidence quaternary structures. The use of high-confidence datasets of quaternary structures will be critical for the analysis of structural, functional, and evolutionary properties of proteins.
Affiliation(s)
- Sucharita Dey
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Emmanuel D Levy
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel.
|
31
|
Zhang J, Ma Z, Kurgan L. Comprehensive review and empirical analysis of hallmarks of DNA-, RNA- and protein-binding residues in protein chains. Brief Bioinform 2017; 20:1250-1268. [DOI: 10.1093/bib/bbx168] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 08/30/2017] [Revised: 11/15/2017] [Indexed: 11/13/2022] Open
Abstract
Proteins interact with a variety of molecules including proteins and nucleic acids. We review a comprehensive collection of over 50 studies that analyze and/or predict these interactions. While the majority of these studies address either solely protein–DNA or protein–RNA binding, only a few have a wider scope that covers both protein–protein and protein–nucleic acid binding. Our analysis reveals that binding residues are typically characterized by three hallmarks: relative solvent accessibility (RSA), evolutionary conservation and propensity of amino acids (AAs) for binding. Motivated by drawbacks of the prior studies, we perform a large-scale analysis to quantify and contrast the three hallmarks for DNA-, RNA- and protein-binding residues and (for the first time) for multi-ligand-binding residues that interact with DNA and proteins, and with RNA and proteins. Results generated on a well-annotated data set of over 23 000 proteins show that conservation of binding residues is higher for nucleic acid- than protein-binding residues. Multi-ligand-binding residues are more conserved and have higher RSA than single-ligand-binding residues. We empirically show that each hallmark, even predicted RSA, discriminates between binding and nonbinding residues, and that combining them improves discriminatory power for each of the five types of interactions. Linear scoring functions that combine these hallmarks offer good predictive performance of residue-level propensity for binding and provide intuitive interpretation of predictions. Better understanding of these residue-level interactions will facilitate development of methods that accurately predict binding in the exponentially growing databases of protein sequences.
|
32
|
Shamsi Z, Moffett AS, Shukla D. Enhanced unbiased sampling of protein dynamics using evolutionary coupling information. Sci Rep 2017; 7:12700. [PMID: 28983093 PMCID: PMC5629199 DOI: 10.1038/s41598-017-12874-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/06/2016] [Accepted: 09/14/2017] [Indexed: 12/25/2022] Open
Abstract
One of the major challenges in atomistic simulations of proteins is efficient sampling of pathways associated with rare conformational transitions. Recent developments in statistical methods for computation of direct evolutionary couplings between amino acids within and across polypeptide chains have allowed for inference of native residue contacts, informing accurate prediction of protein folds and multimeric structures. In this study, we assess the use of distances between evolutionarily coupled residues as natural choices for reaction coordinates which can be incorporated into Markov state model-based adaptive sampling schemes and potentially used to predict not only functional conformations but also pathways of conformational change, protein folding, and protein-protein association. We demonstrate the utility of evolutionary couplings in sampling and predicting activation pathways of the β2-adrenergic receptor (β2-AR), folding of the FiP35 WW domain, and dimerization of the E. coli molybdopterin synthase subunits. We find that the times required for β2-AR activation and folding of the WW domain are greatly diminished using evolutionary couplings-guided adaptive sampling. Additionally, we were able to identify putative molybdopterin synthase association pathways and near-crystal structure complexes from protein-protein association simulations.
Affiliation(s)
- Zahra Shamsi
- Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana, IL, 61801, USA
| | - Alexander S Moffett
- Center for Biophysics and Quantitative Biology, University of Illinois, Urbana, IL, 61801, USA
| | - Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana, IL, 61801, USA.
- Center for Biophysics and Quantitative Biology, University of Illinois, Urbana, IL, 61801, USA.
- Department of Plant Biology, University of Illinois, Urbana, IL, 61801, USA.
- National Center for Supercomputing Applications, University of Illinois, Urbana, IL, 61801, USA.
|
33
|
Monzon AM, Zea DJ, Marino-Buslje C, Parisi G. Homology modeling in a dynamical world. Protein Sci 2017; 26:2195-2206. [PMID: 28815769 DOI: 10.1002/pro.3274] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 06/06/2017] [Revised: 08/09/2017] [Accepted: 08/09/2017] [Indexed: 12/31/2022]
Abstract
A key concept in template-based modeling (TBM) is the high correlation between sequence and structural divergence, with the practical consequence that homologous proteins that are similar at the sequence level will also be similar at the structural level. However, conformational diversity of the native state will reduce the correlation between structural and sequence divergence, because structural variation can appear without sequence diversity. In this work, we explore the impact that conformational diversity has on the relationship between structural and sequence divergence. We find that the extent of conformational diversity can be as high as the maximum structural divergence among families. Also, as expected, conformational diversity impairs the well-established correlation between sequence and structural divergence, which is noisier than previously suggested. However, we found that this noise can be resolved using a priori information coming from the structure-function relationship. We show that protein families with low conformational diversity show a well-correlated relationship between sequence and structural divergence, which is severely reduced in proteins with larger conformational diversity. This lack of correlation could impair TBM results in highly dynamical proteins. Finally, we also find that the presence of order/disorder can provide useful advance information for better TBM performance.
Affiliation(s)
- Alexander Miguel Monzon
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, B1876BXD, Bernal, Argentina
| | - Diego Javier Zea
- Structural Bioinformatics Unit, Fundación Instituto Leloir, CONICET, C1405BWE Ciudad Autónoma de Buenos Aires, Argentina
| | - Cristina Marino-Buslje
- Structural Bioinformatics Unit, Fundación Instituto Leloir, CONICET, C1405BWE Ciudad Autónoma de Buenos Aires, Argentina
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, B1876BXD, Bernal, Argentina
|
34
|
Affiliation(s)
- Xavier Barril
- Facultat de Farmacia and Institut de Biomedicina (IBUB), Universitat de Barcelona, Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
|
35
|
Inhester T, Bietz S, Hilbig M, Schmidt R, Rarey M. Index-Based Searching of Interaction Patterns in Large Collections of Protein–Ligand Interfaces. J Chem Inf Model 2017; 57:148-158. [DOI: 10.1021/acs.jcim.6b00561] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Affiliation(s)
- Therese Inhester
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
| | - Stefan Bietz
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
| | - Matthias Hilbig
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
| | - Robert Schmidt
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
| | - Matthias Rarey
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
|
36
|
Dybas JM, Fiser A. Development of a motif-based topology-independent structure comparison method to identify evolutionarily related folds. Proteins 2016; 84:1859-1874. [PMID: 27671894 PMCID: PMC5118133 DOI: 10.1002/prot.25169] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/31/2016] [Revised: 08/17/2016] [Accepted: 08/25/2016] [Indexed: 11/09/2022]
Abstract
Structure conservation, functional similarities, and homologous relationships that exist across diverse protein topologies suggest that some regions of the protein fold universe are continuous. However, the current structure classification systems are based on hierarchical organizations, which cannot accommodate structural relationships that span fold definitions. Here, we describe a novel, super-secondary-structure motif-based, topology-independent structure comparison method (SmotifCOMP) that is able to quantitatively identify structural relationships between disparate topologies. The basis of SmotifCOMP is a systematically defined super-secondary-structure motif library whose representative geometries are shown to be saturated in the Protein Data Bank and exhibit a unique distribution within the known folds. SmotifCOMP offers a robust and quantitative technique to compare domains that adopt different topologies since the method does not rely on a global superposition. SmotifCOMP is used to perform an exhaustive comparison of the known folds and the identified relationships are used to produce a nonhierarchical representation of the fold space that reflects the notion of a continuous and connected fold universe. The current work offers insight into previously hypothesized evolutionary relationships between disparate folds and provides a resource for exploring novel ones.
Affiliation(s)
- Joseph M. Dybas
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
| | - Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
|
37
|
Identification of potent inhibitors for chromodomain-helicase-DNA-binding protein 1-like through molecular docking studies. Med Chem Res 2016. [DOI: 10.1007/s00044-016-1712-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023]
|
38
|
Gao Y, Hao W, Gu J, Liu D, Fan C, Chen Z, Deng L. PredPhos: an ensemble framework for structure-based prediction of phosphorylation sites. J Biol Res (Thessalon) 2016; 23:12. [PMID: 27437197 PMCID: PMC4943517 DOI: 10.1186/s40709-016-0042-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Abstract
Background: Post-translational modifications (PTMs) occur on almost all proteins and often strongly affect the functions of modified proteins. Phosphorylation is a crucial PTM mechanism with important regulatory functions in biological systems. Identifying the potential phosphorylation sites of a target protein may increase our understanding of the molecular processes in which it takes part. Results: In this paper, we propose PredPhos, a computational method that can accurately predict both kinase-specific and non-kinase-specific phosphorylation sites by using optimally selected properties. The optimal combination of features was selected from a set of 153 novel structural neighborhood properties by a two-step feature selection method consisting of a random forest algorithm and a sequential backward elimination method. To overcome the data imbalance problem, we adopt an ensemble method that combines a bootstrap resampling technique, support vector machine-based fusion classifiers and a majority voting strategy. We evaluate the proposed method using both tenfold cross-validation and an independent test. Results show that our method achieves a significant improvement in prediction performance for both kinase-specific and non-kinase-specific phosphorylation sites. Conclusions: The experimental results demonstrate that the proposed method is quite effective in predicting phosphorylation sites. Promising results are derived from the new structural neighborhood properties, the novel way of feature selection, as well as the ensemble method.
Affiliation(s)
- Yong Gao
- School of Software, Central South University, No. 22 Shaoshan South RD., Changsha, 410075 China
| | - Weilin Hao
- School of Software, Central South University, No. 22 Shaoshan South RD., Changsha, 410075 China; School of Electronics Engineering and Computer Science, Peking University, No. 5 Yiheyuan Road, Beijing, 100871 China
| | - Jing Gu
- School of Software, Central South University, No. 22 Shaoshan South RD., Changsha, 410075 China
| | - Diwei Liu
- School of Software, Central South University, No. 22 Shaoshan South RD., Changsha, 410075 China
| | - Chao Fan
- School of Software, Central South University, No. 22 Shaoshan South RD., Changsha, 410075 China
| | - Zhigang Chen
- School of Software, Central South University, No. 22 Shaoshan South RD., Changsha, 410075 China
| | - Lei Deng
- School of Software, Central South University, No. 22 Shaoshan South RD., Changsha, 410075 China; Shanghai Key Laboratory of Intelligent Information Processing, No. 220 Handan Road, Shanghai, 200433 China
|
39
|
Serrano P, Dutta SK, Proudfoot A, Mohanty B, Susac L, Martin B, Geralt M, Jaroszewski L, Godzik A, Elsliger M, Wilson IA, Wüthrich K. NMR in structural genomics to increase structural coverage of the protein universe: Delivered by Prof. Kurt Wüthrich on 7 July 2013 at the 38th FEBS Congress in St. Petersburg, Russia. FEBS J 2016; 283:3870-3881. [PMID: 27154589 DOI: 10.1111/febs.13751] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/23/2016] [Revised: 04/12/2016] [Accepted: 05/04/2016] [Indexed: 12/12/2022]
Abstract
For more than a decade, the Joint Center for Structural Genomics (JCSG; www.jcsg.org) worked toward increased three-dimensional structure coverage of the protein universe. This coordinated quest was one of the main goals of the four high-throughput (HT) structure determination centers of the Protein Structure Initiative (PSI; www.nigms.nih.gov/Research/specificareas/PSI). To achieve the goals of the PSI, the JCSG made use of the complementarity of structure determination by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy to increase and diversify the range of targets entering the HT structure determination pipeline. The overall strategy, for both techniques, was to determine atomic resolution structures for representatives of large protein families, as defined by the Pfam database, which had no structural coverage and could make significant contributions to biological and biomedical research. Furthermore, the experimental structures could be leveraged by homology modeling to further expand the structural coverage of the protein universe and increase biological insights. Here, we describe what could be achieved by this structural genomics approach, using as an illustration the contributions from 20 NMR structure determinations out of a total of 98 JCSG NMR structures, which were selected because they are the first three-dimensional structure representations of the respective Pfam protein families. The information from this small sample is representative of the overall results from crystal and NMR structure determination in the JCSG. There are five new folds, which were classified as domains of unknown function (DUFs); three of the proteins could be functionally annotated based on three-dimensional structure similarity with previously characterized proteins; and 12 proteins showed only limited similarity with previous deposits in the Protein Data Bank (PDB) and were classified as DUFs.
Affiliation(s)
- Pedro Serrano
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Samit K Dutta
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Andrew Proudfoot
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Biswaranjan Mohanty
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA; Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Lukas Susac
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Bryan Martin
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Michael Geralt
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Lukasz Jaroszewski
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Program on Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
| | - Adam Godzik
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Program on Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
| | - Marc Elsliger
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Ian A Wilson
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA; Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Kurt Wüthrich
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA; Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
|
40
|
Hong S, Kim D. Library of binding protein scaffolds (LibBP): a computational platform for selection of binding protein scaffolds. Bioinformatics 2016; 32:1709-15. [DOI: 10.1093/bioinformatics/btw032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2015] [Accepted: 01/18/2016] [Indexed: 11/14/2022] Open
|
41
|
Lobb B, Doxey AC. Novel function discovery through sequence and structural data mining. Curr Opin Struct Biol 2016; 38:53-61. [DOI: 10.1016/j.sbi.2016.05.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 03/19/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 01/30/2023]
|
42
|
Margelevičius M. Bayesian nonparametrics in protein remote homology search. Bioinformatics 2016; 32:2744-52. [DOI: 10.1093/bioinformatics/btw213] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/11/2015] [Accepted: 04/14/2016] [Indexed: 11/14/2022] Open
|
43
|
The impact of structural genomics: the first quindecennial. J Struct Funct Genomics 2016; 17:1-16. [PMID: 26935210 DOI: 10.1007/s10969-016-9201-5] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 12/12/2015] [Accepted: 02/17/2016] [Indexed: 12/21/2022]
Abstract
The period 2000-2015 brought the advent of high-throughput approaches to protein structure determination. With overall funding on the order of $2 billion (in 2010 dollars), the structural genomics (SG) consortia established worldwide have developed pipelines for target selection, protein production, sample preparation, crystallization, and structure determination by X-ray crystallography and NMR. These efforts resulted in the determination of over 13,500 protein structures, mostly from unique protein families, and increased the structural coverage of the expanding protein universe. SG programs contributed over 4400 publications to the scientific literature. The NIH-funded Protein Structure Initiatives alone have produced over 2000 scientific publications, which to date have attracted more than 93,000 citations. Software and database developments that were necessary to handle high-throughput structure determination workflows have led to structures of better quality and improved integrity of the associated data. Organized and accessible data have a positive impact on the reproducibility of scientific experiments. Most of the experimental data generated by the SG centers are freely available to the community and have been utilized by scientists in various fields of research. SG projects have created, improved, streamlined, and validated many protocols for protein production and crystallization, data collection, and functional analysis, significantly benefiting biological and biomedical research.
|
44
|
Sfriso P, Duran-Frigola M, Mosca R, Emperador A, Aloy P, Orozco M. Residues Coevolution Guides the Systematic Identification of Alternative Functional Conformations in Proteins. Structure 2016; 24:116-126. [DOI: 10.1016/j.str.2015.10.025] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 06/24/2015] [Revised: 10/13/2015] [Accepted: 10/17/2015] [Indexed: 12/12/2022]
|
45
|
Fusco D, Charbonneau P. Soft matter perspective on protein crystal assembly. Colloids Surf B Biointerfaces 2016; 137:22-31. [DOI: 10.1016/j.colsurfb.2015.07.023] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 05/01/2015] [Revised: 07/07/2015] [Accepted: 07/09/2015] [Indexed: 01/24/2023]
|
46
|
Abstract
We surveyed the "dark" proteome, that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44-54% of the proteome in eukaryotes and viruses was dark, compared with only ∼14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology.
|
47
|
An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life. Sci Rep 2015; 5:14717. [PMID: 26434770 PMCID: PMC4592975 DOI: 10.1038/srep14717] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/27/2015] [Accepted: 09/07/2015] [Indexed: 11/14/2022] Open
Abstract
Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been, and remains to a large extent, heavily biased, focusing on culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla accessed through single-cell sequencing, metagenomics and high-throughput culturing efforts have the potential to increase protein fold space, and conclude that i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries, ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets, iii) the discovery rate of novel folds among sequences uncovered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into a marginal increase in global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing coarse-grained tertiary structure level novelty.
|
48
|
AlloRep: A Repository of Sequence, Structural and Mutagenesis Data for the LacI/GalR Transcription Regulators. J Mol Biol 2015; 428:671-678. [PMID: 26410588 DOI: 10.1016/j.jmb.2015.09.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 06/30/2015] [Revised: 09/04/2015] [Accepted: 09/17/2015] [Indexed: 11/20/2022]
Abstract
Protein families evolve functional variation by accumulating point mutations at functionally important amino acid positions. Homologs in the LacI/GalR family of transcription regulators have evolved to bind diverse DNA sequences and allosteric regulatory molecules. In addition to playing key roles in bacterial metabolism, these proteins have been widely used as a model family for benchmarking structural and functional prediction algorithms. We have collected manually curated sequence alignments for >3000 sequences, in vivo phenotypic and biochemical data for >5750 LacI/GalR mutational variants, and noncovalent residue contact networks for 65 LacI/GalR homolog structures. Using this rich data resource, we compared the noncovalent residue contact networks of the LacI/GalR subfamilies to design and experimentally validate an allosteric mutant of a synthetic LacI/GalR repressor for use in biotechnology. The AlloRep database (freely available at www.AlloRep.org) is a key resource for future evolutionary studies of LacI/GalR homologs and for benchmarking computational predictions of functional change.
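As a rough illustration of what a residue contact network is, the sketch below builds one from per-residue coordinates using a simple distance cutoff. This is not the AlloRep procedure, which the abstract does not describe: the 8 Å cutoff, the one-point-per-residue representation, the sequence-separation filter, and the function name are all assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Set, Tuple


def contact_network(
    coords: Sequence[Tuple[float, float, float]],
    cutoff: float = 8.0,        # assumed distance threshold in angstroms
    min_separation: int = 2,    # assumed filter that skips chain neighbors
) -> Set[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, of residues within `cutoff`.

    Each residue is represented by a single hypothetical 3D point
    (for example a C-alpha position); indices are 0-based.
    """
    contacts: Set[Tuple[int, int]] = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


# Toy chain of four residues placed along the x axis.
toy: List[Tuple[float, float, float]] = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0),
]
print(contact_network(toy))
```

Representing each network as a set of index pairs makes comparisons between aligned homologs straightforward, for instance by intersecting or differencing two sets.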
|
49
|
Computational approaches to study the effects of small genomic variations. J Mol Model 2015; 21:251. [PMID: 26350246 DOI: 10.1007/s00894-015-2794-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/08/2015] [Accepted: 08/23/2015] [Indexed: 10/23/2022]
Abstract
Advances in DNA sequencing technologies have led to an avalanche-like increase in the number of gene sequences deposited in public databases over the last decade as well as the detection of an enormous number of previously unseen nucleotide variants therein. Given the size and complex nature of the genome-wide sequence variation data, as well as the rate of data generation, experimental characterization of the disease association of each of these variations or their effects on protein structure/function would be costly, laborious, time-consuming, and essentially impossible. Thus, in silico methods to predict the functional effects of sequence variations are constantly being developed. In this review, we summarize the major computational approaches and tools that are aimed at the prediction of the functional effect of mutations, and describe the state-of-the-art databases that can be used to obtain information about mutation significance. We also discuss future directions in this highly competitive field.
|
50
|
Zheng H, Handing KB, Zimmerman MD, Shabalin IG, Almo SC, Minor W. X-ray crystallography over the past decade for novel drug discovery - where are we heading next? Expert Opin Drug Discov 2015; 10:975-89. [PMID: 26177814 DOI: 10.1517/17460441.2015.1061991] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 12/20/2022]
Abstract
INTRODUCTION: Macromolecular X-ray crystallography has been the primary methodology for determining the three-dimensional structures of proteins, nucleic acids and viruses. Structural information has paved the way for structure-guided drug discovery and laid the foundations for structural bioinformatics. However, X-ray crystallography still has a few fundamental limitations, some of which may be overcome and complemented using emerging methods and technologies in other areas of structural biology. AREAS COVERED: This review describes how structural knowledge gained from X-ray crystallography has been used to advance other biophysical methods for structure determination (and vice versa). This article also covers current practices for integrating data generated by other biochemical and biophysical methods with those obtained from X-ray crystallography. Finally, the authors articulate their vision about how a combination of structural and biochemical/biophysical methods may improve our understanding of biological processes and interactions. EXPERT OPINION: X-ray crystallography has been, and will continue to be, the central source of experimental structural biology data used in the discovery of new drugs. However, other structural biology techniques are useful not only to overcome the major limitations of X-ray crystallography, but also to provide complementary structural data that are useful in drug discovery. Recent advances in biochemical, spectroscopic and bioinformatic methods may revolutionize drug discovery, albeit only when these data are combined and analyzed with effective data management systems. Accurate and complete data management is crucial for developing experimental procedures that are robust and reproducible.
Affiliation(s)
- Heping Zheng
- University of Virginia, Department of Molecular Physiology and Biological Physics, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA; +1 434 243 6865; +1 434 243 2981
|