1
Abstract
For over a century, the Michaelis-Menten (MM) rate law has been used to describe the rates of enzyme-catalyzed reactions and gene expression. Despite the ubiquity of the MM rate law, it accurately captures the dynamics of underlying biochemical reactions only so long as it is applied under the right condition, namely, that the substrate is in large excess over the enzyme-substrate complex. Unfortunately, in circumstances where its validity condition is not satisfied, especially so in protein interaction networks, the MM rate law has frequently been misused. In this review, we illustrate how inappropriate use of the MM rate law distorts the dynamics of the system, provides mistaken estimates of parameter values, and makes false predictions of dynamical features such as ultrasensitivity, bistability, and oscillations. We describe how these problems can be resolved with a slightly modified form of the MM rate law, based on the total quasi-steady state approximation (tQSSA). Furthermore, we show that the tQSSA can be used for accurate stochastic simulations at a lower computational cost than using the full set of mass-action rate laws. This review describes how to use quasi-steady state approximations in the right context, to prevent drawing erroneous conclusions from in silico simulations.
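The contrast between the two rate laws can be sketched numerically. The snippet below is a minimal illustration, assuming a single enzyme-substrate reaction with total enzyme `E_T`, total substrate `S_T`, Michaelis constant `K_M`, and catalytic constant `k_cat`; these names and the parameter values are illustrative assumptions, not taken from the review. It uses the standard closed-form tQSSA expression for the complex concentration, and evaluates the MM form with total substrate in place of free substrate, which is the kind of misuse the abstract warns about.

```python
import math


def mm_rate(k_cat, K_M, E_T, S_T):
    """Michaelis-Menten rate, evaluated with total substrate S_T (assumed misuse case)."""
    return k_cat * E_T * S_T / (K_M + S_T)


def tqssa_rate(k_cat, K_M, E_T, S_T):
    """tQSSA rate: k_cat times the quasi-steady-state complex concentration."""
    b = E_T + S_T + K_M
    # Smaller root of C^2 - b*C + E_T*S_T = 0, so that C <= min(E_T, S_T).
    complex_conc = (b - math.sqrt(b * b - 4.0 * E_T * S_T)) / 2.0
    return k_cat * complex_conc


# Illustrative parameter sets: low enzyme (MM regime) and high enzyme (MM breaks down).
for E_T, S_T in [(0.01, 10.0), (10.0, 1.0)]:
    print(E_T, S_T, mm_rate(1.0, 1.0, E_T, S_T), tqssa_rate(1.0, 1.0, E_T, S_T))
```

When the enzyme is scarce, the two functions agree closely; when the enzyme is abundant, the MM form implies more complex than there is substrate, whereas the tQSSA form stays bounded by the smaller of `E_T` and `S_T`.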
Affiliation(s)
- Jae Kyoung Kim
  - Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- John J. Tyson
  - Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
  - Division of Systems Biology, Virginia Tech, Blacksburg, Virginia, United States of America
2
Xie Z, Deng X, Shu K. Prediction of Protein-Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets. Int J Mol Sci 2020; 21:E467. [PMID: 31940793 PMCID: PMC7013409 DOI: 10.3390/ijms21020467] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/24/2019] [Revised: 12/23/2019] [Accepted: 01/08/2020] [Indexed: 12/20/2022]
Abstract
Protein-protein interaction (PPI) sites play a key role in the formation of protein complexes, which are the basis of a variety of biological processes. Experimental methods for determining PPI sites are expensive and time-consuming, which has led to the development of different kinds of prediction algorithms. We propose a convolutional neural network for PPI site prediction and use residue binding propensity to improve the positive samples. Our method achieves an area under the curve (AUC) of 0.912 on the improved data set. In addition, it yields much better results on samples with high binding propensity than on randomly selected samples. This suggests that the positive samples defined by the distance between residue atoms contain a considerable number of false-positive PPI sites.
Affiliation(s)
- Zengyan Xie
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Kunxian Shu
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
3
Abstract
Essential proteins are critical to the development and survival of cells. Identifying and analyzing essential proteins is vital for understanding the molecular mechanisms of living cells and for designing new drugs. With the development of high-throughput technologies, a large amount of protein-protein interaction (PPI) data has become available, which facilitates the study of essential proteins at the network level. Although various computational methods have been proposed, prediction precision still needs to be improved. In this paper, we propose a novel method, named HSEP, that applies Hyperlink-Induced Topic Search (HITS) to weighted PPI networks to detect essential proteins. First, the original undirected PPI network is transformed into a bidirectional PPI network. Then, both biological information and network topological characteristics are used to weight the PPI network. The biological information includes gene expression data, Gene Ontology (GO) annotation and subcellular localization. The edge clustering coefficient serves as the network topological characteristic and measures the closeness of two connected nodes. We conducted experiments on two species, Saccharomyces cerevisiae and Drosophila melanogaster, and the experimental results show that HSEP outperformed some state-of-the-art essential protein detection techniques.
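The core iteration behind this kind of approach can be sketched as follows. This is a generic weighted HITS power iteration, not the HSEP implementation: the function names, the edge-weight dictionary, and the fixed iteration count are assumptions made for illustration, and the weights here are placeholders rather than the biological and topological weights the abstract describes.

```python
def bidirectional(undirected):
    """Turn each undirected weighted edge (u, v) into two directed edges."""
    directed = {}
    for (u, v), w in undirected.items():
        directed[(u, v)] = w
        directed[(v, u)] = w
    return directed


def hits(edges, iterations=50):
    """Weighted HITS. `edges` maps (source, target) -> positive weight."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: weighted sum of hub scores of incoming neighbours.
        auth = {n: 0.0 for n in nodes}
        for (u, v), w in edges.items():
            auth[v] += w * hub[u]
        # Hub update: weighted sum of authority scores of outgoing neighbours.
        new_hub = {n: 0.0 for n in nodes}
        for (u, v), w in edges.items():
            new_hub[u] += w * auth[v]
        # Normalise both score vectors to unit Euclidean length.
        a_norm = sum(x * x for x in auth.values()) ** 0.5 or 1.0
        h_norm = sum(x * x for x in new_hub.values()) ** 0.5 or 1.0
        auth = {n: x / a_norm for n, x in auth.items()}
        hub = {n: x / h_norm for n, x in new_hub.items()}
    return hub, auth


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to A.
toy = bidirectional({("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 1.0, ("A", "D"): 1.0})
hub_scores, auth_scores = hits(toy)
print(sorted(auth_scores, key=auth_scores.get, reverse=True))
```

In a ranking setting, nodes would be ordered by one of the two score vectors (or a combination); which one HSEP uses is not stated in the abstract.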
Affiliation(s)
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China.
- Siguo Wang
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China.
- Fangxiang Wu
  - Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada.
4
Zhang F, Peng W, Yang Y, Dai W, Song J. A Novel Method for Identifying Essential Genes by Fusing Dynamic Protein-Protein Interactive Networks. Genes (Basel) 2019; 10:genes10010031. [PMID: 30626157 PMCID: PMC6356314 DOI: 10.3390/genes10010031] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/13/2018] [Revised: 12/24/2018] [Accepted: 01/02/2019] [Indexed: 11/16/2022]
Abstract
Essential genes play an indispensable role in supporting the life of an organism. Identification of essential genes helps us to understand the underlying mechanism of cell life. The essential genes of bacteria are potential drug targets for some diseases. Recently, several computational methods have been proposed to detect essential genes based on static protein-protein interactive (PPI) networks. However, these methods have ignored the fact that essential genes play essential roles under certain conditions. In this work, a novel method, called FDP, was proposed for the identification of essential proteins by fusing the dynamic PPI networks of different time points. Firstly, the active PPI network of each time point was constructed, and then these networks were fused into a final network according to their similarities. Finally, a novel centrality method was designed to assign each gene in the final network a ranking score, whilst considering its orthologous property and its global and local topological properties in the network. This model was applied to two different yeast data sets. The results showed that FDP achieved better performance in essential gene prediction than other existing methods based on either the static PPI network or dynamic networks.
Affiliation(s)
- Fengyu Zhang
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
  - Computer Center of Kunming University of Science and Technology, Kunming 650093, China.
- Yunfei Yang
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
- Junrong Song
  - Faculty of Management and Economics, Kunming University of Science and Technology, Kunming 650093, China.
5
Li H, Peng J, Leung Y, Leung KS, Wong MH, Lu G, Ballester PJ. The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction. Biomolecules 2018. [PMID: 29538331 PMCID: PMC5871981 DOI: 10.3390/biom8010012] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 12/22/2022]
Abstract
It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future.
Affiliation(s)
- Hongjian Li
  - SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, New Territories, Hong Kong, China.
  - Institute of Future Cities, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
- Jiangjun Peng
  - Institute of Future Cities, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China.
- Yee Leung
  - Institute of Future Cities, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
- Kwong-Sak Leung
  - Institute of Future Cities, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
- Man-Hon Wong
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
- Gang Lu
  - School of Biomedical Sciences, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
- Pedro J Ballester
  - Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France.
  - Institut Paoli-Calmettes, F-13009 Marseille, France.
  - Aix-Marseille Université, F-13284 Marseille, France.
  - CNRS UMR7258, F-13009 Marseille, France.
6
Meysman P, Titeca K, Eyckerman S, Tavernier J, Goethals B, Martens L, Valkenborg D, Laukens K. Protein complex analysis: From raw protein lists to protein interaction networks. Mass Spectrom Rev 2017; 36:600-614. [PMID: 26709718 DOI: 10.1002/mas.21485] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/04/2015] [Accepted: 11/17/2015] [Indexed: 06/05/2023]
Abstract
The elucidation of molecular interaction networks is one of the pivotal challenges in the study of biology. Affinity purification-mass spectrometry and other co-complex methods have become widely employed experimental techniques to identify protein complexes. These techniques typically suffer from a high number of false negatives and false-positive contaminants due to technical shortcomings and purification biases. To support a diverse range of experimental designs and approaches, a large number of computational methods have been proposed to filter, infer and validate protein interaction networks from experimental pull-down MS data. Nevertheless, this expansion of available methods complicates the selection of the most suitable ones to support systems biology-driven knowledge extraction. In this review, we give an overview of the most commonly used computational methods to process and interpret co-complex results, and we discuss the issues and unsolved problems that still exist within the field. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:600-614, 2017.
Affiliation(s)
- Pieter Meysman
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- Kevin Titeca
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Sven Eyckerman
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Jan Tavernier
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Bart Goethals
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Lennart Martens
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Dirk Valkenborg
  - Flemish Institute for Technological Research (VITO), Mol, Belgium
  - IBioStat, Hasselt University, Hasselt, Belgium
  - CFP-CeProMa, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
7
Abstract
BACKGROUND Network querying algorithms provide computational means to identify conserved network modules in large-scale biological networks that are similar to known functional modules, such as pathways or molecular complexes. Two main challenges for network querying algorithms are the high computational complexity of detecting potential isomorphism between the query and the target graphs and ensuring the biological significance of the query results. RESULTS In this paper, we propose SEQUOIA, a novel network querying algorithm that effectively addresses these issues by utilizing a context-sensitive random walk (CSRW) model for network comparison and minimizing the network conductance of potential matches in the target network. The CSRW model, inspired by the pair hidden Markov model (pair-HMM) that has been widely used for sequence comparison and alignment, can accurately assess the node-to-node correspondence between different graphs by accounting for node insertions and deletions. The proposed algorithm identifies high-scoring network regions based on the CSRW scores, which are subsequently extended by maximally reducing the network conductance of the identified subnetworks. CONCLUSIONS Performance assessment based on real PPI networks and known molecular complexes shows that SEQUOIA outperforms existing methods and clearly enhances the biological significance of the query results. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/SEQUOIA.
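The network conductance mentioned above can be illustrated with a short sketch. It uses the standard graph-theoretic definition for an unweighted, undirected graph (edges leaving the node set divided by the smaller of the two side volumes); whether SEQUOIA uses exactly this form, or a weighted variant, is not stated in the abstract, so the function and the toy graph below are assumptions for illustration only.

```python
def conductance(adjacency, subset):
    """Conductance of `subset` in an undirected graph given as {node: set(neighbours)}."""
    subset = set(subset)
    cut = 0       # edges with exactly one endpoint inside the subset
    vol_in = 0    # sum of degrees inside the subset
    vol_out = 0   # sum of degrees outside the subset
    for node, neighbours in adjacency.items():
        degree = len(neighbours)
        if node in subset:
            vol_in += degree
            cut += sum(1 for nb in neighbours if nb not in subset)
        else:
            vol_out += degree
    denom = min(vol_in, vol_out)
    # Conductance is undefined when one side has zero volume; infinity marks that case here.
    return cut / denom if denom else float("inf")


# Hypothetical toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(conductance(graph, {"a", "b", "c"}), conductance(graph, {"a", "b"}))
```

A lower value indicates a node set that is densely connected internally and sparsely connected to the rest of the network, which is why extending a candidate match so as to reduce conductance favours cohesive modules.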
Affiliation(s)
- Hyundoo Jeong
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Byung-Jun Yoon
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
8
Coskun AF, Cetin AE, Galarreta BC, Alvarez DA, Altug H, Ozcan A. Lensfree optofluidic plasmonic sensor for real-time and label-free monitoring of molecular binding events over a wide field-of-view. Sci Rep 2014; 4:6789. [PMID: 25346102 DOI: 10.1038/lsa.2014.3] [Citation(s) in RCA: 159] [Impact Index Per Article: 15.9] [Received: 07/28/2014] [Revised: 08/21/2013] [Accepted: 10/06/2014] [Indexed: 05/28/2023]
Abstract
We demonstrate a high-throughput biosensing device that utilizes microfluidics-based plasmonic microarrays incorporated with dual-color on-chip imaging toward real-time and label-free monitoring of biomolecular interactions over a wide field-of-view of >20 mm². Weighing 40 grams and measuring 8.8 cm in height, this biosensor utilizes an opto-electronic imager chip to record the diffraction patterns of plasmonic nanoapertures embedded within microfluidic channels, enabling real-time analyte exchange. This plasmonic chip is simultaneously illuminated by two different light-emitting diodes that are spectrally located at the right and left sides of the plasmonic resonance mode, yielding two different diffraction patterns for each nanoaperture array. Refractive index changes of the medium surrounding the near-field of the nanostructures, e.g., due to molecular binding events, induce a frequency shift in the plasmonic modes of the nanoaperture array, causing a signal enhancement in one of the diffraction patterns while suppressing the other. Based on ratiometric analysis of these diffraction images acquired at the detector array, we demonstrate the proof-of-concept of this biosensor by monitoring in real time the biomolecular interactions of protein A/G with immunoglobulin G (IgG) antibody. For high-throughput on-chip fabrication of these biosensors, we also introduce a deep ultraviolet lithography technique to simultaneously pattern thousands of plasmonic arrays in a cost-effective manner.
Affiliation(s)
- Ahmet F Coskun
  - Departments of Electrical Engineering and Bioengineering, University of California, Los Angeles (UCLA), CA 90095, USA
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, 91125
- Arif E Cetin
  - Department of Electrical and Computer Engineering, Boston University, MA 02215, USA
  - Bioengineering Department, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne CH-1015 Switzerland
- Betty C Galarreta
  - Department of Electrical and Computer Engineering, Boston University, MA 02215, USA
  - Pontificia Universidad Catolica del Peru, Departamento de Ciencias-Quimica, Avenida Universitaria 1801, Lima 32, Peru
- Hatice Altug
  - Department of Electrical and Computer Engineering, Boston University, MA 02215, USA
  - Bioengineering Department, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne CH-1015 Switzerland
- Aydogan Ozcan
  - Departments of Electrical Engineering and Bioengineering, University of California, Los Angeles (UCLA), CA 90095, USA
  - California NanoSystems Institute, University of California, Los Angeles (UCLA), CA 90095, USA
9
Heath BL, Jockusch RA. Ligand migration in the gaseous insulin-CB7 complex--a cautionary tale about the use of ECD-MS for ligand binding site determination. J Am Soc Mass Spectrom 2012; 23:1911-20. [PMID: 22948902 DOI: 10.1007/s13361-012-0470-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 06/12/2012] [Revised: 08/03/2012] [Accepted: 08/07/2012] [Indexed: 05/11/2023]
Abstract
Knowledge of the structure of protein-ligand complexes can aid in understanding their roles within complex biological processes. Here we use electrospray ionization (ESI) coupled to a Fourier transform ion cyclotron resonance mass spectrometer to investigate the noncovalent binding of the macrocycle cucurbit[7]uril (CB7) to bovine insulin. Recent condensed-phase experiments (Chinai et al., J. Am. Chem. Soc. 133:8810-8813, 2011) indicate that CB7 binds selectively to the N-terminal phenylalanine of the insulin B-chain. Competition experiments employing ESI mass spectrometry to assess complex formation between CB7 and wild-type insulin B-chain vs. a mutant B-chain confirm that the N-terminal phenylalanine plays an important role in solution-phase binding. However, analysis of fragment ions produced by electron capture dissociation (ECD) of CB7 complexed to intact insulin and to the insulin B-chain suggests a different picture. The apparent gas-phase binding site, as identified by ECD, lies further along the insulin B-chain. Together, these studies thus indicate that the CB7 ligand migrates during the ESI mass spectrometry analysis. Migration is likely aided by the presence of additional interactions between CB7 and the insulin B-chain, which are not observed in the crystal structure. While this conformational difference may result simply from the removal of solvent and addition of excess protons by ESI, we propose that the migration may be enhanced by charge reduction during the ECD process itself because ion-dipole interactions are key to CB7 binding. The results of this study caution against using ECD-MS as a stand-alone structural probe for the determination of solution-phase binding sites.
Affiliation(s)
- Brittany L Heath
  - Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
10
Trabuco LG, Betts MJ, Russell RB. Negative protein-protein interaction datasets derived from large-scale two-hybrid experiments. Methods 2012; 58:343-8. [PMID: 22884951 DOI: 10.1016/j.ymeth.2012.07.028] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 02/27/2012] [Revised: 06/27/2012] [Accepted: 07/28/2012] [Indexed: 01/05/2023]
Abstract
Negative protein-protein interaction datasets are needed for training and evaluation of interaction prediction methods, as well as validation of high-throughput interaction discovery experiments. In large-scale two-hybrid assays, the direct interaction of a large number of protein pairs is systematically probed. We present a simple method to harness two-hybrid data to obtain negative protein-protein interaction datasets, which we validated using other available experimental data. The method identifies interactions that were likely tested but not observed in a two-hybrid screen. For each negative interaction, a confidence score is defined as the shortest-path length between the two proteins in the interaction network derived from the two-hybrid experiment. We show that these high-quality negative datasets are particularly important when a specific biological context is considered, such as in the study of protein interaction specificity. We also illustrate the use of a negative dataset in the evaluation of the InterPreTS interaction prediction method.
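The confidence score described above, a shortest-path length in the interaction network, can be sketched with a breadth-first search. This is a minimal illustration and not the authors' code: the function names and the toy interaction list are hypothetical, and how candidate pairs are judged to have been "likely tested" is not specified in the abstract, so that step is left out.

```python
from collections import deque


def build_adjacency(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = {}
    for a, b in interactions:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Breadth-first search; returns the number of edges, or None if disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adjacency.get(node, ()):
            if nb == target:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


# Hypothetical observed interactions from a two-hybrid screen.
observed = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
network = build_adjacency(observed)
print(shortest_path_length(network, "P1", "P4"))
```

Returning `None` for pairs in different connected components is a design choice of this sketch; a scoring scheme would need its own convention for such pairs.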
11
Muley VY, Ranjan A. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction. PLoS One 2012; 7:e42057. [PMID: 22844541 PMCID: PMC3406042 DOI: 10.1371/journal.pone.0042057] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 12/03/2011] [Accepted: 07/02/2012] [Indexed: 12/20/2022]
Abstract
Background Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand, a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in the literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. Methods We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Conclusions Higher performance for predicting protein-protein interactions was achievable even with 100–150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such a sampling allows for selecting 50–100 genomes for comparable accuracy of predictions when computational resources are limited.
Affiliation(s)
- Vijaykumar Yogesh Muley
  - Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, Andhra Pradesh, India
  - Department of Biotechnology, Dr. Babasaheb Ambedkar Marathwada University, Sub-centre, Osmanabad, Maharashtra, India
- Akash Ranjan
  - Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, Andhra Pradesh, India
12
Abstract
Motivation: Biological experiments give insight into networks of processes inside a cell, but are subject to error and uncertainty. However, due to the overlap between the large number of experiments reported in public databases, it is possible to assess the chances of individual observations being correct. In order to do so, existing methods rely on high-quality ‘gold standard’ reference networks, but such reference networks are not always available. Results: We present a novel algorithm for computing the probability of network interactions that operates without gold standard reference data. We show that our algorithm outperforms existing gold standard-based methods. Finally, we apply the new algorithm to a large collection of genetic interaction and protein–protein interaction experiments. Availability: The integrated dataset and a reference implementation of the algorithm as a plug-in for the Ondex data integration framework are available for download at http://bio-nexus.ncl.ac.uk/projects/nogold/ Contact: darren.wilkinson@ncl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jochen Weile
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
13
Abstract
Near-native selections from docking decoys have proved challenging, especially when unbound proteins are used in the molecular docking. One reason is that significant atomic clashes in docking decoys lead to poor predictions of binding affinities of near-native decoys. Atomic clashes can be removed by structural refinement through energy minimization. Such an energy minimization, however, will lead to an unrealistic bias toward docked structures with large interfaces. Here, we extend an empirical energy function developed for protein design to protein-protein docking selection by introducing a simple reference state that removes the unrealistic dependence of the binding affinity of docking decoys on the buried solvent-accessible surface area of the interface. The energy function, called EMPIRE (EMpirical Protein-InteRaction Energy), when coupled with a refinement strategy, is found to provide a significantly improved success rate in near-native selections when applied to RosettaDock and refined ZDOCK docking decoys. Our work underlines the importance of separating nonspecific interactions from specific ones in near-native selections from docking decoys.
Affiliation(s)
- Shide Liang
  - Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, NY 14214, USA
14
Orchard S, Salwinski L, Kerrien S, Montecchi-Palazzi L, Oesterheld M, Stümpflen V, Ceol A, Chatr-aryamontri A, Armstrong J, Woollard P, Salama JJ, Moore S, Wojcik J, Bader GD, Vidal M, Cusick ME, Gerstein M, Gavin AC, Superti-Furga G, Greenblatt J, Bader J, Uetz P, Tyers M, Legrain P, Fields S, Mulder N, Gilson M, Niepmann M, Burgoon L, De Las Rivas J, Prieto C, Perreau VM, Hogue C, Mewes HW, Apweiler R, Xenarios I, Eisenberg D, Cesareni G, Hermjakob H. The minimum information required for reporting a molecular interaction experiment (MIMIx). Nat Biotechnol 2007; 25:894-8. [PMID: 17687370 DOI: 10.1038/nbt1324] [Citation(s) in RCA: 223] [Impact Index Per Article: 13.1] [Indexed: 11/08/2022]
Abstract
A wealth of molecular interaction data is available in the literature, ranging from large-scale datasets to a single interaction confirmed by several different techniques. These data are all too often reported either as free text or in tables of variable format, and are often missing key pieces of information essential for a full understanding of the experiment. Here we propose MIMIx, the minimum information required for reporting a molecular interaction experiment. Adherence to these reporting guidelines will result in publications of increased clarity and usefulness to the scientific community and will support the rapid, systematic capture of molecular interaction data in public databases, thereby improving access to valuable interaction data.
Affiliation(s)
- Sandra Orchard
  - European Molecular Biology Laboratory (EMBL) - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.
15
Vacic V, Oldfield CJ, Mohan A, Radivojac P, Cortese MS, Uversky VN, Dunker AK. Characterization of molecular recognition features, MoRFs, and their binding partners. J Proteome Res 2007; 6:2351-66. [PMID: 17488107 PMCID: PMC2570643 DOI: 10.1021/pr0701411] [Citation(s) in RCA: 372] [Impact Index Per Article: 21.9] [Indexed: 11/30/2022]
Abstract
Molecular Recognition Features (MoRFs) are short, interaction-prone segments of protein disorder that undergo disorder-to-order transitions upon specific binding, representing a specific class of intrinsically disordered regions that exhibit molecular recognition and binding functions. MoRFs are common in various proteomes and occupy a unique structural and functional niche in which function is a direct consequence of intrinsic disorder. Example MoRFs collected from the Protein Data Bank (PDB) have been divided into three subtypes according to their structures in the bound state: alpha-MoRFs form alpha-helices, beta-MoRFs form beta-strands, and iota-MoRFs form structures without a regular pattern of backbone hydrogen bonds. These example MoRFs were indicated to be intrinsically disordered in the absence of their binding partners by several criteria. In this study, we used several geometric and physicochemical criteria to examine the properties of 62 alpha-, 20 beta-, and 176 iota-MoRF complex structures. Interface residues were examined by calculating differences in accessible surface area between the complex and isolated monomers. The compositions and physicochemical properties of MoRF and MoRF partner interface residues were compared to the interface residues of homodimers, heterodimers, and antigen-antibody complexes. Our analysis indicates that there are significant differences in residue composition and several geometric and physicochemical properties that can be used to discriminate, with a high degree of accuracy, between various interfaces in protein interaction data sets. Implications of these findings for the development of MoRF-partner interaction predictors are discussed. In addition, structural changes upon MoRF-to-partner complex formation were examined for several illustrative examples.
Affiliation(s)
- Vladimir Vacic
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN
- Computer Science and Engineering Department, University of California, Riverside, CA
- Christopher J. Oldfield
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN
- School of Informatics; Indiana University, Bloomington, IN; Indiana University-Purdue University, Indianapolis, IN
- Amrita Mohan
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN
- School of Informatics; Indiana University, Bloomington, IN; Indiana University-Purdue University, Indianapolis, IN
- Predrag Radivojac
- School of Informatics; Indiana University, Bloomington, IN; Indiana University-Purdue University, Indianapolis, IN
- Marc S. Cortese
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN
- Vladimir N. Uversky
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
- CORRESPONDING AUTHOR FOOTNOTE: *Correspondence should be addressed to: Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Health Information and Translational Sciences (HITS), 410 W. 10th Street, HS 5000, Indianapolis, IN 46202. Phone: 317-278-9650; fax: 317-278-9217; E-mail: (V.N.U.) or (A.K.D)
- A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN
- CORRESPONDING AUTHOR FOOTNOTE: *Correspondence should be addressed to: Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Health Information and Translational Sciences (HITS), 410 W. 10th Street, HS 5000, Indianapolis, IN 46202. Phone: 317-278-9650; fax: 317-278-9217; E-mail: (V.N.U.) or (A.K.D)
|
16
|
Searching high and low for interactions. Nat Methods 2007; 4:377. [PMID: 17514787 DOI: 10.1038/nmeth0507-377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
17
|
|
18
|
Abstract
A map of protein-protein interactions provides valuable insight into the cellular function and machinery of a proteome. By measuring the similarity between two Gene Ontology (GO) terms with a relative specificity semantic relation, we propose a new method of reconstructing a yeast protein-protein interaction map that is based solely on GO annotations. The effectiveness of the method was validated using high-quality interaction datasets. Based on a Z-score analysis, a positive dataset and a negative dataset for protein-protein interactions were derived. Moreover, a gold standard positive (GSP) dataset with the highest level of confidence, which covered 78% of the high-quality interaction dataset, and a gold standard negative (GSN) dataset with the lowest level of confidence were derived. In addition, we assessed four high-throughput experimental interaction datasets using the positives and the negatives as well as the GSPs and GSNs. Our predicted network reconstructed from GSPs consists of 40,753 interactions among 2259 proteins and forms 16 connected components. We mapped all of the MIPS complexes except for homodimers onto the predicted network. As a result, approximately 35% of complexes were identified as interconnected. For seven complexes, we also identified some nonmember proteins that may be functionally related to the complexes concerned. This analysis is expected to provide a new approach for predicting protein-protein interaction maps for other completely sequenced genomes with high-quality GO-based annotations.
Affiliation(s)
- Kui Lin
- To whom correspondence should be addressed. Tel: +86 10 58805045; Fax: +86 10 58807721
|
19
|
Abstract
We present a method for automatic test case generation for protein-protein docking. We propose a consensus-type approach that processes the whole PDB and classifies protein structures into complexes and unbound proteins by combining information from three different approaches (the current PDB-at-a-glance classification, a search for complexes by sequence-identical unbound structures, and chain naming). Test cases are generated automatically from this classification. All calculations were run on the database. The information stored is available via a web interface. Users can choose several criteria for generating their own subset of our test cases, e.g. for testing docking algorithms. Availability: http://bibiserv.techfak.uni-bielefeld.de/agt-sdp/ Contact: fzoellne@techfak.uni-bielefeld.de.
Affiliation(s)
- Frank Zöllner
- Applied Computer Science, Faculty of Technology, Bielefeld University D-33594 Bielefeld, Germany.
|
20
|
Abstract
This article reports on PIMWalker, a free and interactive tool for visualising protein interaction networks. PIMWalker handles the unified molecular interaction (MI) format defined by members of the Proteomics Standards Initiative (the PSI MI format), and it is thus directly and easily usable by bench biologists. PIMWalker also comes with a documented, open-source Java™ application programming interface allowing the bioinformatic programmer to easily extend the functions. Availability: PIMWalker is available under a free license from http://pim.hybrigenics.com/pimwalker.
|
21
|
Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik J, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C, Roechert B, Poux S, Jung E, Mersch H, Kersey P, Lappe M, Li Y, Zeng R, Rana D, Nikolski M, Husi H, Brun C, Shanker K, Grant SGN, Sander C, Bork P, Zhu W, Pandey A, Brazma A, Jacq B, Vidal M, Sherman D, Legrain P, Cesareni G, Xenarios I, Eisenberg D, Steipe B, Hogue C, Apweiler R. The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data. Nat Biotechnol 2004; 22:177-83. [PMID: 14755292 DOI: 10.1038/nbt926] [Citation(s) in RCA: 469] [Impact Index Per Article: 23.5] [Indexed: 11/09/2022]
Abstract
A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).
Affiliation(s)
- Henning Hermjakob
- European Bioinformatics Institute, EBI-Hinxton, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
|
22
|
Suzuki H, Saito R, Kanamori M, Kai C, Schönbach C, Nagashima T, Hosaka J, Hayashizaki Y. The mammalian protein-protein interaction database and its viewing system that is linked to the main FANTOM2 viewer. Genome Res 2003; 13:1534-41. [PMID: 12819152 PMCID: PMC403706 DOI: 10.1101/gr.956303] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/24/2022]
Abstract
Here, we describe the development of a mammalian protein-protein interaction (PPI) database and of a PPI Viewer application to display protein interaction networks (http://fantom21.gsc.riken.go.jp/PPI/). In the database, we stored the mammalian PPIs identified through our PPI assays (internal PPIs), as well as those we extracted and processed (external PPIs) from publicly available data sources, the DIP and BIND databases and MEDLINE abstracts by using FACTS, a new functional inference and curation system. We integrated the internal and external PPIs into the PPI database, which is linked to the main FANTOM2 viewer. In addition, we incorporated into the PPI Viewer information regarding the luciferase reporter activity of internal PPIs and the data confidence of external PPIs; these data enable visualization and evaluation of the reliability of each interaction. Using the described system, we successfully identified several interactions of biological significance. Therefore, the PPI Viewer is a useful tool for exploring FANTOM2 clone-related protein interactions and their potential effects on signaling and cellular communication.
Affiliation(s)
- Harukazu Suzuki
- Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
|