1
Kiouri DP, Batsis GC, Chasapis CT. Structure-Based Approaches for Protein-Protein Interaction Prediction Using Machine Learning and Deep Learning. Biomolecules 2025; 15:141. [PMID: 39858535 PMCID: PMC11763140 DOI: 10.3390/biom15010141] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/12/2024] [Revised: 01/11/2025] [Accepted: 01/14/2025] [Indexed: 01/27/2025]
Abstract
Protein-Protein Interaction (PPI) prediction plays a pivotal role in understanding cellular processes and uncovering molecular mechanisms underlying health and disease. Structure-based PPI prediction has emerged as a robust alternative to sequence-based methods, offering greater biological accuracy by integrating three-dimensional spatial and biochemical features. This work summarizes the recent advances in computational approaches leveraging protein structure information for PPI prediction, focusing on machine learning (ML) and deep learning (DL) techniques. These methods not only improve predictive accuracy but also provide insights into functional sites, such as binding and catalytic residues. However, challenges such as limited high-resolution structural data and the need for effective negative sampling persist. Through the integration of experimental and computational tools, structure-based prediction paves the way for comprehensive proteomic network analysis, holding promise for advancements in drug discovery, biomarker identification, and personalized medicine. Future directions include enhancing scalability and dataset reliability to expand these approaches across diverse proteomes.
Affiliation(s)
- Despoina P. Kiouri
  - Institute of Chemical Biology, National Hellenic Research Foundation, 11635 Athens, Greece; (D.P.K.); (G.C.B.)
  - Laboratory of Organic Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, 15772 Athens, Greece
- Georgios C. Batsis
  - Institute of Chemical Biology, National Hellenic Research Foundation, 11635 Athens, Greece; (D.P.K.); (G.C.B.)
- Christos T. Chasapis
  - Institute of Chemical Biology, National Hellenic Research Foundation, 11635 Athens, Greece; (D.P.K.); (G.C.B.)
2
Li S, Wu S, Wang L, Li F, Jiang H, Bai F. Recent advances in predicting protein-protein interactions with the aid of artificial intelligence algorithms. Curr Opin Struct Biol 2022; 73:102344. [PMID: 35219216 DOI: 10.1016/j.sbi.2022.102344] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 08/20/2021] [Revised: 01/02/2022] [Accepted: 01/17/2022] [Indexed: 12/15/2022]
Abstract
Protein-protein interactions (PPIs) are essential in the regulation of biological functions and cell events; understanding PPIs has therefore become a key issue in elucidating molecular mechanisms and in drug design. Here we highlight the major developments in computational methods for predicting PPIs using various types of artificial intelligence algorithms. The first part introduces the sources of experimental PPI data. The second part is devoted to PPI prediction methods based on sequence information. The third part covers representative methods that use structural information as the input feature. The last part covers methods that combine different types of features. For each part, the state-of-the-art computational PPI prediction methods are comprehensively reviewed. Finally, we discuss the shortcomings in this area and future directions for next-generation algorithms.
Affiliation(s)
- Shiwei Li
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Sanan Wu
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Lin Wang
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Fenglei Li
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China; School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Hualiang Jiang
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Pudong, Shanghai, 201203, China
- Fang Bai
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China; School of Information Science and Technology, ShanghaiTech University, Shanghai, China.
3
Shatnawi M. Protein-Protein Interaction Prediction: Recent Advances. 2017 28th International Workshop on Database and Expert Systems Applications (DEXA) 2017:69-73. [DOI: 10.1109/dexa.2017.30] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
4
Chen TS, Petrey D, Garzon JI, Honig B. Predicting peptide-mediated interactions on a genome-wide scale. PLoS Comput Biol 2015; 11:e1004248. [PMID: 25938916 PMCID: PMC4418708 DOI: 10.1371/journal.pcbi.1004248] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 12/03/2014] [Accepted: 03/18/2015] [Indexed: 12/20/2022]
Abstract
We describe a method to predict protein-protein interactions (PPIs) formed between structured domains and short peptide motifs. We take an integrative approach based on consensus patterns of known motifs in databases, structures of domain-motif complexes from the PDB and various sources of non-structural evidence. We combine this set of clues using a Bayesian classifier that reports the likelihood of an interaction and obtain significantly improved prediction performance when compared to individual sources of evidence and to previously reported algorithms. Our Bayesian approach was integrated into PrePPI, a structure-based PPI prediction method that, so far, has been limited to interactions formed between two structured domains. Around 80,000 new domain-motif mediated interactions were predicted, thus enhancing PrePPI’s coverage of the human protein interactome.
Complexes formed between a structured domain on one protein and an unstructured peptide on another are ubiquitous. However, they are often quite difficult to detect experimentally. The development of computational approaches to predict domain-motif interactions is therefore an important goal. We report a method to predict domain-motif interactions using a Bayesian approach to integrate evidence from a variety of sources, including three-dimensional structural and non-structural information. The method was applied to the entire human proteome and showed significant improvement over existing methods. The method was incorporated into PrePPI, a computational pipeline for the prediction of protein-protein interactions that relies heavily on structural information. Approximately 80,000 new interactions were detected. The new PrePPI database provides easy access to about 400,000 human protein-protein interactions and should thus constitute a valuable resource in a variety of biological applications including the characterization of molecular interaction networks and, more generally, in the study of interactions mediated by proteins in families that may not be extensively studied experimentally.
Affiliation(s)
- T. Scott Chen
  - Howard Hughes Medical Institute, Columbia University, New York, New York, United States of America
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Donald Petrey
  - Howard Hughes Medical Institute, Columbia University, New York, New York, United States of America
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Jose Ignacio Garzon
  - Howard Hughes Medical Institute, Columbia University, New York, New York, United States of America
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Barry Honig
  - Howard Hughes Medical Institute, Columbia University, New York, New York, United States of America
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
  - * E-mail:
5
Shatnawi M. Review of Recent Protein-Protein Interaction Techniques. Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology 2015:99-121. [DOI: 10.1016/b978-0-12-802508-6.00006-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/03/2025]
6
Best MD. Global approaches for the elucidation of phosphoinositide-binding proteins. Chem Phys Lipids 2013; 182:19-28. [PMID: 24220499 DOI: 10.1016/j.chemphyslip.2013.10.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/12/2013] [Revised: 09/13/2013] [Accepted: 10/29/2013] [Indexed: 12/22/2022]
Abstract
Phosphoinositide lipids (PIPns) control numerous critical biological pathways, typically through the regulation of protein function driven by non-covalent protein-lipid binding interactions. Despite the importance of these systems, the unraveling of the full scope of protein-PIPn interactions has represented a significant challenge due to the massive complexity associated with these events, including the large number of diverse proteins that bind to these lipids, variations in the mechanisms by which proteins bind to lipids, and the presence of multiple distinct PIPn isomers. As a result of this complexity, global methods in which numerous proteins that bind PIPns can be identified and characterized simultaneously from complex samples, which have been enabled by key technological advancements, have become popular as an efficient means for tackling this challenge. This review article provides an overview of advancements in large-scale methods for profiling protein-PIPn binding, including experimental methods, such as affinity enrichment, microarray analysis and activity-based protein profiling, as well as computational methods, and combined computational/experimental efforts.
Affiliation(s)
- Michael D Best
  - Department of Chemistry, The University of Tennessee, 1420 Circle Drive, Knoxville, TN 37996, United States.
7
Zhang QC, Petrey D, Garzón JI, Deng L, Honig B. PrePPI: a structure-informed database of protein-protein interactions. Nucleic Acids Res 2013; 41:D828-33. [PMID: 23193263 PMCID: PMC3531098 DOI: 10.1093/nar/gks1231] [Citation(s) in RCA: 192] [Impact Index Per Article: 16.0] [Indexed: 01/26/2023]
Abstract
PrePPI (http://bhapp.c2b2.columbia.edu/PrePPI) is a database that combines predicted and experimentally determined protein-protein interactions (PPIs) using a Bayesian framework. Predicted interactions are assigned probabilities of being correct, which are derived from calculated likelihood ratios (LRs) by combining structural, functional, evolutionary and expression information, with the most important contribution coming from structure. Experimentally determined interactions are compiled from a set of public databases that manually collect PPIs from the literature and are also assigned LRs. A final probability is then assigned to every interaction by combining the LRs for both predicted and experimentally determined interactions. The current version of PrePPI contains ∼2 million PPIs with a probability greater than ∼0.1, of which ∼60 000 PPIs for yeast and ∼370 000 PPIs for human are considered high confidence (probability > 0.5). The PrePPI database constitutes an integrated resource that enables users to examine aggregate information on PPIs, including both known and potentially novel interactions, and that provides structural models for many of the PPIs.
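The abstract describes deriving a probability from likelihood ratios (LRs) contributed by several evidence sources. A minimal sketch of that idea follows, assuming a naive-Bayes combination in which the LRs are treated as independent and multiplied, then converted to a probability through prior odds. The function names, the evidence values, and the prior odds are hypothetical illustrations and are not taken from PrePPI.

```python
from math import prod


def combine_likelihood_ratios(lrs):
    """Multiply per-source LRs (assumes the evidence sources are independent)."""
    return prod(lrs)


def posterior_probability(combined_lr, prior_odds):
    """Convert a combined LR to a probability: posterior odds = prior odds * LR."""
    posterior_odds = prior_odds * combined_lr
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical LRs for one candidate protein pair (illustrative values only).
evidence = {"structural": 50.0, "coexpression": 3.0, "functional": 4.0}

# Hypothetical prior odds that a random protein pair interacts.
prior_odds = 1 / 600

combined = combine_likelihood_ratios(evidence.values())
probability = posterior_probability(combined, prior_odds)
print(f"combined LR = {combined:.0f}, probability = {probability:.2f}")
```

With these made-up numbers the combined LR equals the inverse of the prior odds, so the pair lands at roughly the 0.5 probability that the abstract uses as its high-confidence cutoff.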
Affiliation(s)
- Qiangfeng Cliff Zhang
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- Donald Petrey
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- José Ignacio Garzón
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- Lei Deng
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- Barry Honig
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
  - *To whom correspondence should be addressed. Tel: +1 212 851 4651; Fax: +1 212 851 4650,
8
Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 2012; 490:556-60. [PMID: 23023127 PMCID: PMC3482288 DOI: 10.1038/nature11503] [Citation(s) in RCA: 524] [Impact Index Per Article: 40.3] [Received: 06/29/2011] [Accepted: 08/10/2012] [Indexed: 12/23/2022]
Abstract
The genome-wide identification of pairs of interacting proteins is an important step in the elucidation of cell regulatory mechanisms [1,2]. Much of our current knowledge derives from high-throughput techniques such as yeast two-hybrid and affinity purification [3], as well as from manual curation of experiments on individual systems [4]. A variety of computational approaches based, for example, on sequence homology, gene co-expression, and phylogenetic profiles have also been developed for the genome-wide inference of protein-protein interactions (PPIs) [5,6]. Yet, comparative studies suggest that the development of accurate and complete repertoires of PPIs is still in its early stages [7–9]. Here we show that three-dimensional structural information can be used to predict PPIs with an accuracy and coverage that are superior to predictions based on non-structural evidence. Moreover, an algorithm, PrePPI, that combines structural information with other functional clues is comparable in accuracy to high-throughput experiments, yielding over 30,000 high confidence interactions for yeast and over 300,000 for human. Experimental tests of a number of predictions demonstrate the ability of the PrePPI algorithm to identify unexpected PPIs of significant biological interest. The surprising effectiveness of three-dimensional structural information can be attributed to the use of homology models combined with the exploitation of both close and remote geometric relationships between proteins.
9
Montelione GT. The Protein Structure Initiative: achievements and visions for the future. F1000 Biology Reports 2012; 4:7. [PMID: 22500193 PMCID: PMC3318194 DOI: 10.3410/b4-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 12/27/2022]
Abstract
The Protein Structure Initiative (PSI) was established in 2000 by the National Institute of General Medical Sciences with the long-term goal of providing 3D (three-dimensional) structural information for most proteins in nature. As advances in genomic sequencing, bioinformatics, homology modelling, and methods for rapid determination of 3D structures of proteins by X-ray crystallography and nuclear magnetic resonance (NMR) converged, it was proposed that our understanding of the biology of protein structure and evolution could be greatly enabled by ‘genomic-scale’ protein structure determination. Over the past 12 years, the PSI has evolved from a testing bed for new methods of sample and structure production to a core component of a wide range of biology programs.
Affiliation(s)
- Gaetano T Montelione
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers University Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
10
Kuziemko A, Honig B, Petrey D. Using structure to explore the sequence alignment space of remote homologs. PLoS Comput Biol 2011; 7:e1002175. [PMID: 21998567 PMCID: PMC3188491 DOI: 10.1371/journal.pcbi.1002175] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 02/01/2011] [Accepted: 07/14/2011] [Indexed: 11/18/2022]
Abstract
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is “optimal” in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are “suboptimal” in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for “modelability”, we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
It has been suggested that, for nearly every protein sequence, there is already a protein with a similar structure in current protein structure databases. However, with poor or undetectable sequence relationships, it is expected that accurate alignments and models cannot be generated. Here we show that this is not the case, and that whenever structural relationship exists, there are usually local sequence relationships that can be used to generate an accurate alignment, no matter what the global sequence identity. However, this requires an alternative to the traditional dynamic programming algorithm and the consideration of a small ensemble of alignments. We present an algorithm, S4, and demonstrate that it is capable of generating accurate alignments in nearly all cases where a structural relationship exists between two proteins. Our results thus constitute an important advance in the full exploitation of the information in structural databases. That is, the expectation of an accurate alignment suggests that a meaningful model can be generated for nearly every sequence for which a suitable template exists.
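For readers unfamiliar with the "DP score" the abstract refers to, the sketch below computes the optimal global alignment score with the standard Needleman-Wunsch recurrence. It illustrates only the traditional dynamic programming baseline that the paper contrasts itself with; it is not the S4 algorithm. The function name and the scoring parameters (match, mismatch, gap) are arbitrary assumptions chosen for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (Needleman-Wunsch).

    Scoring values are illustrative assumptions, not taken from the paper.
    Only two rows of the DP table are kept, so memory is O(len(b)).
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b` costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of `b`.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


print(nw_score("GATTACA", "GATTACA"))
print(nw_score("ACGT", "AGT"))
```

The score maximized here depends only on sequence, which is why, as the abstract argues, the top-scoring alignment need not coincide with the alignment implied by structural superposition.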
Affiliation(s)
- Andrew Kuziemko
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Barry Honig
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Donald Petrey
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
  - * E-mail:
11
Silkov A, Yoon Y, Lee H, Gokhale N, Adu-Gyamfi E, Stahelin RV, Cho W, Murray D. Genome-wide structural analysis reveals novel membrane binding properties of AP180 N-terminal homology (ANTH) domains. J Biol Chem 2011; 286:34155-63. [PMID: 21828048 DOI: 10.1074/jbc.m111.265611] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022]
Abstract
An increasing number of cytosolic proteins are shown to interact with membrane lipids during diverse cellular processes, but computational prediction of these proteins and their membrane binding behaviors remains challenging. Here, we introduce a new combinatorial computation protocol for systematic and robust functional prediction of membrane-binding proteins through high throughput homology modeling and in-depth calculation of biophysical properties. The approach was applied to the genomic scale identification of the AP180 N-terminal homology (ANTH) domain, one of the modular lipid binding domains, and prediction of their membrane binding properties. Our analysis yielded comprehensive coverage of the ANTH domain family and allowed classification and functional annotation of proteins based on the differences in local structural and biophysical features. Our analysis also identified a group of plant ANTH domains with unique structural features that may confer novel functionalities. Experimental characterization of a representative member of this subfamily confirmed its unique membrane binding mechanism and unprecedented membrane deforming activity. Collectively, these studies suggest that our new computational approach can be applied to genome-wide functional prediction of other lipid binding domains.
Affiliation(s)
- Antonina Silkov
  - Department of Pharmacology, Columbia University, New York, New York 11032, USA
12
Cai XH, Jaroszewski L, Wooley J, Godzik A. Internal organization of large protein families: relationship between the sequence, structure, and function-based clustering. Proteins 2011; 79:2389-402. [PMID: 21671455 PMCID: PMC3132221 DOI: 10.1002/prot.23049] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/26/2010] [Revised: 02/12/2011] [Accepted: 03/13/2011] [Indexed: 12/14/2022]
Abstract
The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogenous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting maximal amount of information from new sequencing projects.
Affiliation(s)
- Xiao-hui Cai
  - Joint Center for Structural Genomics, Bioinformatics Core, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0446, USA
- Lukasz Jaroszewski
  - Joint Center for Structural Genomics, Bioinformatics Core, Sanford-Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
  - Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
- John Wooley
  - Joint Center for Structural Genomics, Bioinformatics Core, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0446, USA
- Adam Godzik
  - Joint Center for Structural Genomics, Bioinformatics Core, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0446, USA
  - Joint Center for Structural Genomics, Bioinformatics Core, Sanford-Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
  - Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
13
Hetényi C, van der Spoel D. Toward prediction of functional protein pockets using blind docking and pocket search algorithms. Protein Sci 2011; 20:880-93. [PMID: 21413095 DOI: 10.1002/pro.618] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 11/11/2010] [Revised: 03/06/2011] [Accepted: 03/07/2011] [Indexed: 11/09/2022]
Abstract
Location of functional binding pockets of bioactive ligands on protein molecules is essential in structural genomics and drug design projects. If the experimental determination of ligand-protein complex structures is complicated, blind docking (BD) and pocket search (PS) calculations can help in the prediction of atomic resolution binding mode and the location of the pocket of a ligand on the entire protein surface. Whereas the number of successful predictions by these methods is increasing even for the complicated cases of exosites or allosteric binding sites, their reliability has not been fully established. For a critical assessment of reliability, we use a set of ligand-protein complexes, which were found to be problematic in previous studies. The robustness of BD and PS methods is addressed in terms of success of the selection of truly functional pockets from among the many putative ones identified on the surfaces of ligand-bound and ligand-free (holo and apo) protein forms. Issues related to BD such as effect of hydration, existence of multiple pockets, and competition of subsidiary ligands are considered. Practical cases of PS are discussed, categorized and strategies are recommended for handling the different situations. PS can be used in conjunction with BD, as we find that a consensus approach combining the techniques improves predictive power.
Affiliation(s)
- Csaba Hetényi
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden.
14
Wywial E, Singh SM. Identification and structural characterization of FYVE domain-containing proteins of Arabidopsis thaliana. BMC Plant Biology 2010; 10:157. [PMID: 20678208 PMCID: PMC3017826 DOI: 10.1186/1471-2229-10-157] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 04/22/2010] [Accepted: 08/02/2010] [Indexed: 05/02/2023]
Abstract
BACKGROUND: FYVE domains have emerged as membrane-targeting domains highly specific for phosphatidylinositol 3-phosphate (PtdIns(3)P). They are predominantly found in proteins involved in various trafficking pathways. Although FYVE domains may function as individual modules, dimers or in partnership with other proteins, structurally, all FYVE domains share a fold comprising two small characteristic double-stranded beta-sheets, and a C-terminal alpha-helix, which houses eight conserved Zn²⁺ ion-binding cysteines. To date, the structural, biochemical, and biophysical mechanisms for subcellular targeting of FYVE domains for proteins from various model organisms have been worked out but plant FYVE domains remain noticeably under-investigated.
RESULTS: We carried out an extensive examination of all Arabidopsis FYVE domains, including their identification, classification, molecular modeling and biophysical characterization using computational approaches. Our classification of fifteen Arabidopsis FYVE proteins at the outset reveals unique domain architectures for FYVE containing proteins, which are not paralleled in other organisms. Detailed sequence analysis and biophysical characterization of the structural models are used to predict membrane interaction mechanisms previously described for other FYVE domains and their subtle variations as well as novel mechanisms that seem to be specific to plants.
CONCLUSIONS: Our study contributes to the understanding of the molecular basis of FYVE-based membrane targeting in plants on a genomic scale. The results show that FYVE domain containing proteins in plants have evolved to incorporate significant differences from those in other organisms implying that they play a unique role in plant signaling pathways and/or play similar/parallel roles in signaling to other organisms but use different protein players/signaling mechanisms.
Affiliation(s)
- Ewa Wywial
  - Department of Biology, The Graduate Center of the City University of New York, 365 Fifth Avenue, New York, NY 10016, USA
  - Department of Biology, Brooklyn College, City University of New York, 2900 Bedford Avenue, Brooklyn, NY 11210, USA
- Shaneen M Singh
  - Department of Biology, The Graduate Center of the City University of New York, 365 Fifth Avenue, New York, NY 10016, USA
  - Department of Biology, Brooklyn College, City University of New York, 2900 Bedford Avenue, Brooklyn, NY 11210, USA
|
15
|
Lee H, Li Z, Silkov A, Fischer M, Petrey D, Honig B, Murray D. High-throughput computational structure-based characterization of protein families: START domains and implications for structural genomics. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2010; 11:51-9. [PMID: 20383749 PMCID: PMC2881152 DOI: 10.1007/s10969-010-9086-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 12/18/2009] [Accepted: 03/05/2010] [Indexed: 11/29/2022]
Abstract
SkyLine, a high-throughput homology modeling pipeline tool, detects and models true sequence homologs to a given protein structure. Structures and models are stored in SkyBase with links to computational function annotation, as calculated by MarkUs. The SkyLine/SkyBase/MarkUs technology represents a novel structure-based approach that is more objective and versatile than other protein classification resources. This structure-centric strategy provides a multi-dimensional organization and coverage of protein space at the levels of family, function, and genome. The concept of "modelability", the ability to model sequences on related structures, provides a reliable criterion for membership in a protein family ("leverage") and underlies the unique success of this approach. The overall procedure is illustrated by its application to START domains, which comprise a Biomedical Theme for the Northeast Structural Genomics Consortium as part of the Protein Structure Initiative. START domains are typically involved in the non-vesicular transport of lipids. While 19 experimentally determined structures are available, the family, whose evolutionary hierarchy is not well determined, is highly sequence diverse, and the ligand-binding potential of many family members is unknown. The SkyLine/SkyBase/MarkUs approach provides significant insights and predicts: (1) many more family members (approximately 4,000) than any other resource; (2) the function for a large number of unannotated proteins; (3) instances of START domains in genomes from which they were thought to be absent; and (4) the existence of two types of novel proteins, those containing dual START domains and those containing N-terminal START domains.
Affiliation(s)
- Hunjoong Lee
  - Department of Pharmacology, College of Physicians and Surgeons of Columbia University, 630 West 168th St. PH 7W 313, New York, NY 10032
- Zhaohui Li
  - Department of Pharmacology, College of Physicians and Surgeons of Columbia University, 630 West 168th St. PH 7W 313, New York, NY 10032
- Antonina Silkov
  - Department of Pharmacology, College of Physicians and Surgeons of Columbia University, 630 West 168th St. PH 7W 313, New York, NY 10032
- Markus Fischer
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032
- Donald Petrey
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032
- Barry Honig
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032
- Diana Murray
  - Department of Pharmacology, College of Physicians and Surgeons of Columbia University, 630 West 168th St. PH 7W 313, New York, NY 10032
|
16
|
Mercier KA, Cort JR, Kennedy MA, Lockert EE, Ni S, Shortridge MD, Powers R. Structure and function of Pseudomonas aeruginosa protein PA1324 (21-170). Protein Sci 2009; 18:606-18. [PMID: 19241370 DOI: 10.1002/pro.62] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
Pseudomonas aeruginosa is the prototypical biofilm-forming Gram-negative opportunistic human pathogen. P. aeruginosa is causatively associated with nosocomial infections and with cystic fibrosis. Antibiotic resistance in some strains adds to the inherent difficulties that result from biofilm formation when treating P. aeruginosa infections. Transcriptional profiling studies suggest widespread changes in the proteome during quorum sensing and biofilm development. Many of the proteins found to be upregulated during these processes are poorly characterized from a functional standpoint. Here, we report the solution NMR structure of PA1324, a protein of unknown function identified in these studies, and provide a putative biological functional assignment based on the observed prealbumin-like fold and FAST-NMR ligand screening studies. PA1324 is postulated to be involved in the binding and transport of sugars or polysaccharides associated with the peptidoglycan matrix during biofilm formation.
Affiliation(s)
- Kelly A Mercier
  - Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska 68588, USA
|
17
|
Montelione GT, Arrowsmith C, Girvin ME, Kennedy MA, Markley JL, Powers R, Prestegard JH, Szyperski T. Unique opportunities for NMR methods in structural genomics. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2009; 10:101-6. [PMID: 19288278 PMCID: PMC2705713 DOI: 10.1007/s10969-009-9064-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 02/19/2009] [Accepted: 02/25/2009] [Indexed: 11/26/2022]
Abstract
This Perspective, arising from a workshop held in July 2008 in Buffalo NY, provides an overview of the role NMR has played in the United States Protein Structure Initiative (PSI), and a vision of how NMR will contribute to the forthcoming PSI-Biology program. NMR has contributed in key ways to structure production by the PSI, and new methods have been developed which are impacting the broader protein NMR community.
Affiliation(s)
- Gaetano T Montelione
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, NJ 08854, USA
|
18
|
Arnold K, Kiefer F, Kopp J, Battey JND, Podvinec M, Westbrook JD, Berman HM, Bordoli L, Schwede T. The Protein Model Portal. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2009; 10:1-8. [PMID: 19037750 PMCID: PMC2704613 DOI: 10.1007/s10969-008-9048-5] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.8] [Received: 09/17/2008] [Accepted: 11/02/2008] [Indexed: 11/28/2022]
Abstract
Structural Genomics has been successful in determining the structures of many unique proteins in a high-throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionarily related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploration of the protein structure space. One of the challenges in using model information effectively has been to access all models available for a specific protein in heterogeneous formats at different sites using various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers, and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal which gives access to the various models that can be leveraged from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at http://www.proteinmodelportal.org and from the PSI Structural Genomics Knowledgebase.
Affiliation(s)
- Konstantin Arnold
  - Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Florian Kiefer
  - Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Jürgen Kopp
  - Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- James N. D. Battey
  - Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Michael Podvinec
  - Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- John D. Westbrook
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8087 USA
- Helen M. Berman
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8087 USA
- Lorenza Bordoli
  - Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
|
19
|
Hrmova M, Fincher GB. Functional genomics and structural biology in the definition of gene function. Methods Mol Biol 2009; 513:199-227. [PMID: 19347658 DOI: 10.1007/978-1-59745-427-8_11] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 05/27/2023]
Abstract
By mid-2007, the three-dimensional (3D) structures of some 45,000 proteins had been solved, over a period where the linear structures of millions of genes have been defined. Technical challenges associated with X-ray crystallography are being overcome and high-throughput methods both for crystallization of proteins and for solving their 3D structures are under development. The question arises as to how structural biology can be integrated with and add value to functional genomics programs. Structural biology will assist in the definition of gene function through the identification of the likely function of the protein products of genes. The 3D information allows protein sequences predicted from DNA sequences to be classified into broad groups, according to the overall 'fold', or 3D shape, of the protein. Structural information can be used to predict the preferred substrate of a protein, and thereby greatly enhance the accurate annotation of the corresponding gene. Furthermore, it will enable the effects of amino acid substitutions in enzymes to be better understood with respect to enzyme function and could thereby provide insights into natural variation in genes. If the molecular basis of transcription factor-DNA interactions were defined through precise 3D knowledge of the protein-DNA binding site, it would be possible to predict the effects of base substitutions within the motif on the specificity and/or kinetics of binding. In this chapter, we present specific examples of how structural biology can provide valuable information for functional genomics programs.
Affiliation(s)
- Maria Hrmova
  - Australian Centre for Plant Functional Genomics, School of Agriculture, Food and Wine, University of Adelaide, Waite Campus, Glen Osmond, SA 5064, Australia
|
20
|
|