1. Grassmann G, Miotto M, Desantis F, Di Rienzo L, Tartaglia GG, Pastore A, Ruocco G, Monti M, Milanetti E. Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024; 124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]
Abstract
Investigating protein-protein interactions is crucial for understanding cellular biological processes because proteins often function within molecular complexes rather than in isolation. While experimental and computational methods have provided valuable insights into these interactions, they often overlook a critical factor: the crowded cellular environment. This environment significantly impacts protein behavior, including structural stability, diffusion, and ultimately the nature of binding. In this review, we discuss theoretical and computational approaches that allow the modeling of biological systems to guide and complement experiments and can thus significantly advance the investigation, and possibly the predictions, of protein-protein interactions in the crowded environment of cell cytoplasm. We explore topics such as statistical mechanics for lattice simulations, hydrodynamic interactions, diffusion processes in high-viscosity environments, and several methods based on molecular dynamics simulations. By synergistically leveraging methods from biophysics and computational biology, we review the state of the art of computational methods to study the impact of molecular crowding on protein-protein interactions and discuss its potential revolutionizing effects on the characterization of the human interactome.
Affiliation(s)
- Greta Grassmann
  - Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, Rome 00185, Italy
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Mattia Miotto
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Fausta Desantis
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - The Open University Affiliated Research Centre at Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Lorenzo Di Rienzo
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Gian Gaetano Tartaglia
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
  - Center for Human Technologies, Genoa 16152, Italy
- Annalisa Pastore
  - Experiment Division, European Synchrotron Radiation Facility, Grenoble 38043, France
- Giancarlo Ruocco
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
- Michele Monti
  - RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Edoardo Milanetti
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy

2. Albu AI, Bocicor MI, Czibula G. MM-StackEns: A new deep multimodal stacked generalization approach for protein-protein interaction prediction. Comput Biol Med 2023; 153:106526. [PMID: 36623437 DOI: 10.1016/j.compbiomed.2022.106526] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/27/2022] [Revised: 12/13/2022] [Accepted: 12/31/2022] [Indexed: 01/05/2023]
Abstract
Accurate in-silico identification of protein-protein interactions (PPIs) is a long-standing problem in biology, with important implications in protein function prediction and drug design. Current computational approaches predominantly use a single data modality for describing protein pairs, which may not fully capture the characteristics relevant for identifying PPIs. Another limitation of existing methods is their poor generalization to proteins outside the training graph. In this paper, we aim to address these shortcomings by proposing a new ensemble approach for PPI prediction, which learns information from two modalities, corresponding to pairs of sequences and to the graph formed by the training proteins and their interactions. Our approach uses a siamese neural network to process sequence information, while graph attention networks are employed for the network view. For capturing the relationships between the proteins in a pair, we design a new feature fusion module, based on computing the distance between the distributions corresponding to the two proteins. The prediction is made using a stacked generalization procedure, in which the final classifier is represented by a Logistic Regression model trained on the scores predicted by the sequence and graph models. Additionally, we show that protein sequence embeddings obtained using pretrained language models can significantly improve the generalization of PPI methods. The experimental results demonstrate the good performance of our approach, which surpasses all the related work on two Yeast data sets, while outperforming the majority of literature approaches on two Human data sets and on independent multi-species data sets.
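The stacked generalization step described in this abstract can be illustrated with a minimal sketch. It assumes two base models have already produced a score per protein pair; the toy scores, the function names `fit_meta_classifier` and `stacked_score`, and the plain gradient-descent fit are all illustrative assumptions, not the authors' implementation. In practice the meta-classifier would normally be trained on held-out predictions of the base models rather than on in-sample scores.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_classifier(seq_scores, graph_scores, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on two base-model scores by batch gradient descent."""
    w_seq = w_graph = bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_seq = grad_graph = grad_bias = 0.0
        for s, g, y in zip(seq_scores, graph_scores, labels):
            err = sigmoid(w_seq * s + w_graph * g + bias) - y
            grad_seq += err * s
            grad_graph += err * g
            grad_bias += err
        w_seq -= lr * grad_seq / n
        w_graph -= lr * grad_graph / n
        bias -= lr * grad_bias / n
    return w_seq, w_graph, bias

def stacked_score(params, seq_score, graph_score):
    """Final interaction probability from the two base-model scores."""
    w_seq, w_graph, bias = params
    return sigmoid(w_seq * seq_score + w_graph * graph_score + bias)

# Hypothetical scores from a sequence model and a graph model for six pairs.
seq_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.4]
graph_scores = [0.8, 0.6, 0.9, 0.2, 0.4, 0.1]
labels = [1, 1, 1, 0, 0, 0]  # 1 = interacting pair, 0 = non-interacting pair

params = fit_meta_classifier(seq_scores, graph_scores, labels)
```

Calling `stacked_score(params, 0.9, 0.9)` then returns the combined probability for a pair that both base models score highly.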
Affiliation(s)
- Alexandra-Ioana Albu
  - Department of Computer Science, Babeş-Bolyai University, 1 Mihail Kogalniceanu Street, Cluj-Napoca, 400084, Romania
- Maria-Iuliana Bocicor
  - Department of Computer Science, Babeş-Bolyai University, 1 Mihail Kogalniceanu Street, Cluj-Napoca, 400084, Romania
- Gabriela Czibula
  - Department of Computer Science, Babeş-Bolyai University, 1 Mihail Kogalniceanu Street, Cluj-Napoca, 400084, Romania

3. Kimothi D, Biyani P, Hogan JM, Davis MJ. Sequence Representations and Their Utility for Predicting Protein-Protein Interactions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:646-657. [PMID: 34941517 DOI: 10.1109/tcbb.2021.3137325] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/14/2023]
Abstract
Protein-Protein Interactions (PPIs) are a crucial mechanism underpinning the function of the cell. So far, a wide range of machine-learning based methods have been proposed for predicting these relationships. Their success is heavily dependent on the construction of the underlying feature vectors, with most using a set of physico-chemical properties derived from the sequence. Few work directly with the sequence itself. In this paper, we explore the utility of sequence embeddings for predicting protein-protein interactions. We construct a protein pair feature vector by concatenating the embeddings of their constituent sequences. These feature vectors are then used as input to a binary classifier to make predictions. To learn sequence embeddings, we use two established Word2Vec based methods - Seq2Vec and BioVec - and we also introduce a novel feature construction method called SuperVecNW. The embeddings generated through SuperVecNW capture some network information in addition to the contextual information present in the sequences. We test the efficacy of our proposed approach on human and yeast PPI datasets and on three well-known networks: CD9, the Ras-Raf-Mek-Erk-Elk-Srf pathway, and a Wnt-related network. We demonstrate that low dimensional sequence embeddings provide better results than most alternative representations based on physico-chemical properties while offering a far simpler approach to feature vector construction.
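The pair feature construction described in this abstract (concatenating the two sequence embeddings) can be sketched as follows. The embedding values, the protein identifiers, and the function name `pair_feature_vector` are hypothetical placeholders; real embeddings would come from a trained model such as those named in the abstract.

```python
# Hypothetical 4-dimensional embeddings; a trained model would supply real ones.
embeddings = {
    "P1": [0.1, 0.2, 0.3, 0.4],
    "P2": [0.5, 0.6, 0.7, 0.8],
}

def pair_feature_vector(protein_a, protein_b, table=embeddings):
    """Concatenate the embeddings of two proteins into one feature vector."""
    return list(table[protein_a]) + list(table[protein_b])

features = pair_feature_vector("P1", "P2")
```

Concatenation is order-dependent, so `pair_feature_vector("P1", "P2")` and `pair_feature_vector("P2", "P1")` differ. One possible workaround, not stated in the abstract, is to present both orderings to the classifier.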

4. Hajikarimlou M, Hooshyar M, Moutaoufik M, Aly K, Azad T, Takallou S, Jagadeesan S, Phanse S, Said K, Samanfar B, Bell J, Dehne F, Babu M, Golshani A. A computational approach to rapidly design peptides that detect SARS-CoV-2 surface protein S. NAR Genom Bioinform 2022; 4:lqac058. [PMID: 36004308 PMCID: PMC9394169 DOI: 10.1093/nargab/lqac058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Revised: 06/10/2022] [Accepted: 08/01/2022] [Indexed: 11/12/2022] Open
Abstract
The coronavirus disease 19 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) prompted the development of diagnostic and therapeutic frameworks for timely containment of this pandemic. Here, we utilized our non-conventional computational algorithm, InSiPS, to rapidly design and experimentally validate peptides that bind to SARS-CoV-2 spike (S) surface protein. We previously showed that this method can be used to develop peptides against yeast proteins; however, the applicability of this method to design peptides against other proteins has not been investigated. In the current study, we demonstrate that two sets of peptides developed using the InSiPS method can detect purified SARS-CoV-2 S protein via ELISA and Surface Plasmon Resonance (SPR) approaches, suggesting the utility of our strategy in real-time COVID-19 diagnostics. Mass spectrometry-based salivary peptidomics shortlist top SARS-CoV-2 peptides detected in COVID-19 patients’ saliva, rendering them attractive SARS-CoV-2 diagnostic targets that, when subjected to our computational platform, can streamline the development of potent peptide diagnostics of SARS-CoV-2 variants of concern. Our approach can be rapidly applied to diagnosing other communicable diseases of immediate threat.
Affiliation(s)
- Maryam Hajikarimlou
  - Ottawa Institute of Systems Biology, University of Ottawa, Health Science Campus, Ottawa, Ontario, Canada
  - Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Mohsen Hooshyar
  - Ottawa Institute of Systems Biology, University of Ottawa, Health Science Campus, Ottawa, Ontario, Canada
  - Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Mohamed Taha Moutaoufik
  - Department of Biochemistry, Research and Innovation Centre, University of Regina, Regina, Canada
- Khaled A Aly
  - Department of Biochemistry, Research and Innovation Centre, University of Regina, Regina, Canada
- Taha Azad
  - The Ottawa Hospital Research Institute, 501 Smyth Road, Ottawa, Ontario, Canada
- Sarah Takallou
  - Ottawa Institute of Systems Biology, University of Ottawa, Health Science Campus, Ottawa, Ontario, Canada
  - Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Sasi Jagadeesan
  - Ottawa Institute of Systems Biology, University of Ottawa, Health Science Campus, Ottawa, Ontario, Canada
  - Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Sadhna Phanse
  - Department of Biochemistry, Research and Innovation Centre, University of Regina, Regina, Canada
- Kamaledin B Said
  - Department of Biology, Carleton University, Ottawa, Ontario, Canada
  - Department of Pathology and Microbiology, College of Medicine, University of Hail, Saudi Arabia
- Bahram Samanfar
  - Ottawa Institute of Systems Biology, University of Ottawa, Health Science Campus, Ottawa, Ontario, Canada
  - Department of Biology, Carleton University, Ottawa, Ontario, Canada
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre (ORDC), Ottawa, Ontario, Canada
- John C Bell
  - The Ottawa Hospital Research Institute, 501 Smyth Road, Ottawa, Ontario, Canada
- Frank Dehne
  - School of Computer Science, Carleton University, Ottawa, Ontario, Canada
- Mohan Babu
  - Department of Biochemistry, Research and Innovation Centre, University of Regina, Regina, Canada
- Ashkan Golshani
  - Ottawa Institute of Systems Biology, University of Ottawa, Health Science Campus, Ottawa, Ontario, Canada
  - Department of Biology, Carleton University, Ottawa, Ontario, Canada

5. Halder AK, Bandyopadhyay SS, Chatterjee P, Nasipuri M, Plewczynski D, Basu S. JUPPI: A Multi-Level Feature Based Method for PPI Prediction and a Refined Strategy for Performance Assessment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:531-542. [PMID: 32750875 DOI: 10.1109/tcbb.2020.3004970] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Over the years, several methods have been proposed for computational PPI prediction, with different performance evaluation strategies. When benchmarked, most of these methods suffer from poorly designed cross-validation strategies, ad hoc selection of positive/negative samples, etc. To address these issues, our proposed multi-level feature based PPI prediction approach (JUPPI), which uses sequence, domain and GO information as features, introduces a refined evaluation strategy. During the evaluation process, we first extract high quality negative data using three-stage filtering, and then introduce a pair-input based cross validation strategy with three difficulty levels for test-set predictions. Our proposed evaluation strategy reduces the component-level overlapping issue in test sets. Performance of JUPPI is compared with those of the state-of-the-art approaches in this domain and tested on six independent PPI datasets. In almost all the datasets, JUPPI outperforms the state-of-the-art not only at human proteome level for PPI prediction, but also for prediction of interactors for intrinsically disordered human proteins. The JUPPI tool and the developed datasets (JUPPId) are available in the public domain for academic use at https://figshare.com/projects/JUPPI_A_Multi-level_Feature_Based_Method_for_PPI_Prediction_and_a_Refined_Strategy_for_Performance_Assessment/81656, along with supplementary materials, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2020.3004970.

6. Dick K, Pattang A, Hooker J, Nissan N, Sadowski M, Barnes B, Tan LH, Burnside D, Phanse S, Aoki H, Babu M, Dehne F, Golshani A, Cober ER, Green JR, Samanfar B. Human-Soybean Allergies: Elucidation of the Seed Proteome and Comprehensive Protein-Protein Interaction Prediction. J Proteome Res 2021; 20:4925-4947. [PMID: 34582199 DOI: 10.1021/acs.jproteome.1c00138] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
The soybean crop, Glycine max (L.) Merr., is consumed by humans, Homo sapiens, worldwide. While the respective bodies of literature and -omics data for each of these organisms are extensive, comparatively few studies investigate the molecular biological processes occurring between the two. We are interested in elucidating the network of protein-protein interactions (PPIs) involved in human-soybean allergies. To this end, we leverage state-of-the-art sequence-based PPI predictors amenable to predicting the enormous comprehensive interactome between human and soybean. A network-based analytical approach is proposed, leveraging similar interaction profiles to identify candidate allergens and proteins involved in the allergy response. Interestingly, the predicted interactome can be explored from two complementary perspectives: which soybean proteins are predicted to interact with specific human proteins and which human proteins are predicted to interact with specific soybean proteins. A total of eight proteins (six specific to the human proteome and two to the soy proteome) have been identified and supported by the literature to be involved in human health, specifically related to immunological and neurological pathways. This study, beyond generating the most comprehensive human-soybean interactome to date, elucidated a soybean seed interactome and identified several proteins putatively consequential to human health.
Affiliation(s)
- Kevin Dick
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Arezo Pattang
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, Canada K1A 0C6
  - Department of Biology and Institute of Biochemistry, and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Julia Hooker
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, Canada K1A 0C6
  - Department of Biology and Institute of Biochemistry, and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Nour Nissan
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, Canada K1A 0C6
  - Department of Biology and Institute of Biochemistry, and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Michael Sadowski
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, Canada K1A 0C6
  - Department of Biology and Institute of Biochemistry, and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Bradley Barnes
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Le Hoa Tan
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, Canada K1A 0C6
  - Department of Biology and Institute of Biochemistry, and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Daniel Burnside
  - Department of Biology and Institute of Biochemistry, and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Sadhna Phanse
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, Canada S4S 0A2
- Hiroyuki Aoki
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, Canada S4S 0A2
- Mohan Babu
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, Canada S4S 0A2
- Frank Dehne
  - School of Computer Science, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Ashkan Golshani
  - Department of Biology and Institute of Biochemistry, and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Elroy R Cober
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, Canada K1A 0C6
- James R Green
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada K1S 5B6
- Bahram Samanfar
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, Canada K1A 0C6
  - Department of Biology and Institute of Biochemistry, and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada K1S 5B6

7. Annie Lee ES, Zhou P, Wong AKC. WeMine Aligned Pattern Clustering System for Biosequence Pattern Analysis. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open

8. Dick K, Samanfar B, Barnes B, Cober ER, Mimee B, Tan LH, Molnar SJ, Biggar KK, Golshani A, Dehne F, Green JR. PIPE4: Fast PPI Predictor for Comprehensive Inter- and Cross-Species Interactomes. Sci Rep 2020; 10:1390. [PMID: 31996697 PMCID: PMC6989690 DOI: 10.1038/s41598-019-56895-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/24/2019] [Accepted: 12/13/2019] [Indexed: 02/06/2023] Open
Abstract
The need for larger-scale and increasingly complex protein-protein interaction (PPI) prediction tasks demands that state-of-the-art predictors be highly efficient and adapted to inter- and cross-species predictions. Furthermore, the ability to generate comprehensive interactomes has enabled the appraisal of each PPI in the context of all predictions leading to further improvements in classification performance in the face of extreme class imbalance using the Reciprocal Perspective (RP) framework. We here describe the PIPE4 algorithm. Adaptation of the PIPE3/MP-PIPE sequence preprocessing step led to upwards of 50x speedup and the new Similarity Weighted Score appropriately normalizes for window frequency when applied to any inter- and cross-species prediction schemas. Comprehensive interactomes for three prediction schemas are generated: (1) cross-species predictions, where Arabidopsis thaliana is used as a proxy to predict the comprehensive Glycine max interactome, (2) inter-species predictions between Homo sapiens-HIV1, and (3) a combined schema involving both cross- and inter-species predictions, where both Arabidopsis thaliana and Caenorhabditis elegans are used as proxy species to predict the interactome between Glycine max (the soybean legume) and Heterodera glycines (the soybean cyst nematode). Comparing PIPE4 with the state-of-the-art resulted in improved performance, indicative that it should be the method of choice for complex PPI prediction schemas.
Affiliation(s)
- Kevin Dick
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, K1S 5B6, Canada
- Bahram Samanfar
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, K1A 0C6, Canada
  - Department of Biology, Carleton University, Ottawa, K1S 5B6, Ontario, Canada
- Bradley Barnes
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, K1S 5B6, Canada
- Elroy R Cober
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, K1A 0C6, Canada
- Benjamin Mimee
  - Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu Research and Development Centre, Saint-Jean-sur-Richelieu, J3B 3E6, Quebec, Canada
- Le Hoa Tan
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, K1A 0C6, Canada
- Stephen J Molnar
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, Ontario, K1A 0C6, Canada
- Kyle K Biggar
  - Department of Biology, Carleton University, Ottawa, K1S 5B6, Ontario, Canada
- Ashkan Golshani
  - Department of Biology, Carleton University, Ottawa, K1S 5B6, Ontario, Canada
  - Ottawa Institute of Systems Biology, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada
- Frank Dehne
  - School of Computer Science, Carleton University, Ottawa, Ontario, K1S 5B6, Canada
- James R Green
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, K1S 5B6, Canada

9. Tiwari S, Dwivedi UN. Discovering Innovative Drugs Targeting Both Cancer and Cardiovascular Disease by Shared Protein-Protein Interaction Network Analyses. OMICS: A Journal of Integrative Biology 2019; 23:417-425. [PMID: 31329050 DOI: 10.1089/omi.2019.0095] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/24/2022]
Abstract
Cancer and cardiovascular disease (CVD) commonly co-occur. Both diseases display overlapping pathophysiology and risk factors, suggesting shared biological mechanisms. Conditions such as obesity, diabetes, hypertension, smoking, poor diet, and inadequate physical activity can cause both heart disease and cancer. The burgeoning field of onco-cardiology aims to develop diagnostics and innovative therapeutics for both diseases through targeting shared mechanisms and molecular targets. In this overarching context, this expert review presents an analysis of the protein-protein interaction (PPI) networks for onco-cardiology drug discovery. Several PPI complexes such as MDM2-TP53 and CDK4-pRB have been studied for their tumor-suppressive functions. In addition, XIAP-SMAC, RAC1-GEF, Sur-2ESX, and TP53-BRCA1 are other PPI complexes that offer potential breakthroughs for onco-cardiology therapeutics innovation. As both cancer and CVD share biological mechanisms to a certain degree, the PPI network analyses for onco-cardiology drug discovery are promising for addressing comorbid diseases in the spirit of systems medicine. We discuss the emerging architecture of PPI networks in cancer and CVD and prospects and challenges for their exploitation toward therapeutics applications. Finally, we emphasize that PPIs that were once thought to be undruggable have become a potential new class of innovative drug targets.
Affiliation(s)
- Sameeksha Tiwari
  - Bioinformatics Infrastructure Facility, Department of Biochemistry, Centre of Excellence in Bioinformatics, University of Lucknow, Lucknow, Uttar Pradesh, India
- Upendra N Dwivedi
  - Bioinformatics Infrastructure Facility, Department of Biochemistry, Centre of Excellence in Bioinformatics, University of Lucknow, Lucknow, Uttar Pradesh, India
  - Institute for Development of Advanced Computing, ONGC Centre for Advanced Studies, University of Lucknow, Lucknow, Uttar Pradesh, India

10. Insights into the suitability of utilizing brown rats (Rattus norvegicus) as a model for healing spinal cord injury with epidermal growth factor and fibroblast growth factor-II by predicting protein-protein interactions. Comput Biol Med 2019; 104:220-226. [DOI: 10.1016/j.compbiomed.2018.11.026] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/07/2018] [Revised: 11/28/2018] [Accepted: 11/29/2018] [Indexed: 01/06/2023]

11. Reciprocal Perspective for Improved Protein-Protein Interaction Prediction. Sci Rep 2018; 8:11694. [PMID: 30076341 PMCID: PMC6076239 DOI: 10.1038/s41598-018-30044-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/15/2018] [Accepted: 07/20/2018] [Indexed: 02/06/2023] Open
Abstract
All protein-protein interaction (PPI) predictors require the determination of an operational decision threshold when differentiating positive PPIs from negatives. Historically, a single global threshold, typically optimized via cross-validation testing, is applied to all protein pairs. However, we here use data visualization techniques to show that no single decision threshold is suitable for all protein pairs, given the inherent diversity of protein interaction profiles. The recent development of high throughput PPI predictors has enabled the comprehensive scoring of all possible protein-protein pairs. This, in turn, has given rise to context, enabling us now to evaluate a PPI within the context of all possible predictions. Leveraging this context, we introduce a novel modeling framework called Reciprocal Perspective (RP), which estimates a localized threshold on a per-protein basis using several rank order metrics. By considering a putative PPI from the perspective of each of the proteins within the pair, RP rescores the predicted PPI and applies a cascaded Random Forest classifier leading to improvements in recall and precision. We here validate RP using two state-of-the-art PPI predictors, the Protein-protein Interaction Prediction Engine and the Scoring PRotein INTeractions methods, over five organisms: Homo sapiens, Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, and Mus musculus. Results demonstrate the application of a post hoc RP rescoring layer significantly improves classification (p < 0.001) in all cases over all organisms and this new rescoring approach can apply to any PPI prediction method.
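The idea of viewing one predicted score from the perspective of each protein in the pair can be illustrated with a small rank computation. This is only a sketch of the per-protein rank idea on hypothetical toy scores; the function names are illustrative, and the full RP framework described in the abstract (several rank-order metrics feeding a cascaded Random Forest) is not reproduced here.

```python
# Hypothetical predictor scores for every unordered pair among four proteins.
scores = {
    ("A", "B"): 0.6, ("A", "C"): 0.9, ("A", "D"): 0.8,
    ("B", "C"): 0.2, ("B", "D"): 0.1, ("C", "D"): 0.5,
}

def partner_scores(score_table, protein):
    """All scores of pairs that involve `protein`, highest first."""
    return sorted((s for pair, s in score_table.items() if protein in pair),
                  reverse=True)

def reciprocal_ranks(score_table, a, b):
    """Rank (1 = best) of the pair's score among a's partners and among b's partners."""
    key = (a, b) if (a, b) in score_table else (b, a)
    score = score_table[key]
    rank_from_a = partner_scores(score_table, a).index(score) + 1
    rank_from_b = partner_scores(score_table, b).index(score) + 1
    return rank_from_a, rank_from_b

ranks_ab = reciprocal_ranks(scores, "A", "B")  # (3, 1)
```

Here the score 0.6 is the weakest of A's predictions but the strongest of B's, which is the kind of asymmetry a single global threshold cannot express.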

12. Omidi K, Jessulat M, Hooshyar M, Burnside D, Schoenrock A, Kazmirchuk T, Hajikarimlou M, Daniel M, Moteshareie H, Bhojoo U, Sanders M, Ramotar D, Dehne F, Samanfar B, Babu M, Golshani A. Uncharacterized ORF HUR1 influences the efficiency of non-homologous end-joining repair in Saccharomyces cerevisiae. Gene 2018; 639:128-136. [DOI: 10.1016/j.gene.2017.10.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/08/2017] [Revised: 06/25/2017] [Accepted: 10/02/2017] [Indexed: 01/05/2023]

13. Kazmirchuk T, Dick K, Burnside DJ, Barnes B, Moteshareie H, Hajikarimlou M, Omidi K, Ahmed D, Low A, Lettl C, Hooshyar M, Schoenrock A, Pitre S, Babu M, Cassol E, Samanfar B, Wong A, Dehne F, Green JR, Golshani A. Designing anti-Zika virus peptides derived from predicted human-Zika virus protein-protein interactions. Comput Biol Chem 2017; 71:180-187. [DOI: 10.1016/j.compbiolchem.2017.10.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/18/2017] [Revised: 10/03/2017] [Accepted: 10/27/2017] [Indexed: 01/22/2023]

14. Li Y, Ilie L. SPRINT: ultrafast protein-protein interaction prediction of the entire human interactome. BMC Bioinformatics 2017; 18:485. [PMID: 29141584 PMCID: PMC5688644 DOI: 10.1186/s12859-017-1871-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 05/19/2017] [Accepted: 10/17/2017] [Indexed: 12/30/2022] Open
Abstract
Background: Proteins perform their functions usually by interacting with other proteins. Predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and have a high rate of error. Many computational methods have been proposed among which sequence-based ones are very promising. However, so far no such method is able to predict effectively the entire human interactome: they require too much time or memory.
Results: We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions. We comprehensively compare SPRINT with state-of-the-art programs on seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory.
Conclusion: SPRINT is the only sequence-based program that can effectively predict the entire human interactome: it requires between 15 and 100 min, depending on the dataset. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task.
Availability: The source code of SPRINT is freely available from https://github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1871-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yiwei Li
  - Department of Computer Science, The University of Western Ontario, London, N6A 5B7, Ontario, Canada
- Lucian Ilie
  - Department of Computer Science, The University of Western Ontario, London, N6A 5B7, Ontario, Canada
15
|
Bandyopadhyay S, Mallick K. A New Feature Vector Based on Gene Ontology Terms for Protein-Protein Interaction Prediction. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:762-770. [PMID: 28113911 DOI: 10.1109/tcbb.2016.2555304] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 06/06/2023]
Abstract
Protein-protein interaction (PPI) plays a key role in understanding cellular mechanisms in different organisms. Many supervised classifiers, such as Random Forest (RF) and Support Vector Machine (SVM), have been used for intra- or inter-species interaction prediction. To improve prediction performance, in this paper we propose a novel set of features that represents a protein pair using its annotated Gene Ontology (GO) terms, including their ancestors. In our approach, a protein pair is treated as a document (bag of words), where the terms annotating the two proteins represent the words. The feature value of each word is calculated as the information content of the corresponding term multiplied by a coefficient, which represents the weight of that term inside a document (i.e., a protein pair). We have tested the performance of the classifier using the proposed features on well-known data sets from different species, including S. cerevisiae, H. sapiens, E. coli, and D. melanogaster. We compare it with other GO-based feature representation techniques and demonstrate its competitive performance.
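The bag-of-words construction described in this abstract can be illustrated with a minimal Python sketch. The toy ontology, the term probabilities, and the weight (the fraction of the two proteins annotated with a term) are all assumptions made for illustration; the abstract does not give the paper's actual coefficient.

```python
import math

# Hypothetical toy ontology: term -> set of direct parent terms.
PARENTS = {
    "GO:3": {"GO:2"},
    "GO:2": {"GO:1"},
    "GO:4": {"GO:1"},
    "GO:1": set(),
}

# Hypothetical annotation probabilities p(term) in a reference corpus.
TERM_PROB = {"GO:1": 1.0, "GO:2": 0.5, "GO:3": 0.125, "GO:4": 0.25}


def with_ancestors(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, ()))
    return seen


def information_content(term):
    """IC(t) = -log2 p(t): rarer terms carry more information."""
    return -math.log2(TERM_PROB[term])


def pair_features(terms_a, terms_b):
    """Treat a protein pair as one 'document' of GO terms.

    The weight (share of the two proteins carrying the term) is an
    assumed stand-in for the coefficient used in the paper.
    """
    doc_a, doc_b = with_ancestors(terms_a), with_ancestors(terms_b)
    features = {}
    for term in sorted(doc_a | doc_b):
        weight = ((term in doc_a) + (term in doc_b)) / 2
        features[term] = information_content(term) * weight
    return features


features = pair_features({"GO:3"}, {"GO:4"})
```

In this toy setup the root term contributes nothing because its information content is zero, while specific terms shared by only one protein are down-weighted by half. A classifier such as a Random Forest would then be trained on these vectors.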
|
16
|
Schoenrock A, Burnside D, Moteshareie H, Pitre S, Hooshyar M, Green JR, Golshani A, Dehne F, Wong A. Evolution of protein-protein interaction networks in yeast. PLoS One 2017; 12:e0171920. [PMID: 28248977 PMCID: PMC5382968 DOI: 10.1371/journal.pone.0171920] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 05/09/2016] [Accepted: 01/28/2017] [Indexed: 01/04/2023] Open
Abstract
Interest in the evolution of protein-protein and genetic interaction networks has been rising in recent years, but the lack of large-scale high quality comparative datasets has acted as a barrier. Here, we carried out a comparative analysis of computationally predicted protein-protein interaction (PPI) networks from five closely related yeast species. We used the Protein-protein Interaction Prediction Engine (PIPE), which uses a database of known interactions to make sequence-based PPI predictions, to generate high quality predicted interactomes. Simulated proteomes and corresponding PPI networks were used to provide null expectations for the extent and nature of PPI network evolution. We found strong evidence for conservation of PPIs, with lower than expected levels of change in PPIs for about a quarter of the proteome. Furthermore, we found that changes in predicted PPI networks are poorly predicted by sequence divergence. Our analyses identified a number of functional classes experiencing fewer PPI changes than expected, suggestive of purifying selection on PPIs. Our results demonstrate the added benefit of considering predicted PPI networks when studying the evolution of closely related organisms.
Affiliation(s)
- Sylvain Pitre
- School of Computer Science, Carleton University, Ottawa, Canada
- James R. Green
- Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
- Frank Dehne
- School of Computer Science, Carleton University, Ottawa, Canada
- Alex Wong
- Department of Biology, Carleton University, Ottawa, Canada
|
17
|
Computational Approaches for Predicting Binding Partners, Interface Residues, and Binding Affinity of Protein-Protein Complexes. Methods Mol Biol 2017; 1484:237-253. [PMID: 27787830 DOI: 10.1007/978-1-4939-6406-2_16] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Abstract
Studying protein-protein interactions leads to a better understanding of the underlying principles of several biological pathways. Cost and labor-intensive experimental techniques suggest the need for computational methods to complement them. Several such state-of-the-art methods have been reported for analyzing diverse aspects such as predicting binding partners, interface residues, and binding affinity for protein-protein complexes with reliable performance. However, there are specific drawbacks for different methods that indicate the need for their improvement. This review highlights various available computational algorithms for analyzing diverse aspects of protein-protein interactions and endorses the necessity for developing new robust methods for gaining deep insights about protein-protein interactions.
|
18
|
Sze-To A, Fung S, Lee ESA, Wong AK. Prediction of Protein–Protein Interaction via co-occurring Aligned Pattern Clusters. Methods 2016; 110:26-34. [DOI: 10.1016/j.ymeth.2016.07.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/16/2016] [Revised: 06/25/2016] [Accepted: 07/26/2016] [Indexed: 10/21/2022] Open
|
19
|
Ding Y, Tang J, Guo F. Predicting protein-protein interactions via multivariate mutual information of protein sequences. BMC Bioinformatics 2016; 17:398. [PMID: 27677692 PMCID: PMC5039908 DOI: 10.1186/s12859-016-1253-9] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Received: 01/21/2016] [Accepted: 09/08/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Protein-protein interactions (PPIs) are central to many biological processes. Many algorithms and methods have been developed to predict PPIs and protein interaction networks. However, the application of most existing methods is limited since they are difficult to compute and rely on a large number of homologous proteins and interaction marks of protein partners. In this paper, we propose a novel sequence-based approach with multivariate mutual information (MMI) of protein feature representation, for predicting PPIs via Random Forest (RF). Methods: Our method constructs a 638-dimensional vector to represent each pair of proteins. First, we cluster the twenty standard amino acids into seven functional groups and transform protein sequences into encoding sequences. Then, we use a novel multivariate mutual information feature representation scheme, combined with normalized Moreau-Broto autocorrelation, to extract features from protein sequence information. Finally, we feed the feature vectors into a Random Forest model to distinguish interacting pairs from non-interacting pairs. Results: To evaluate the performance of our new method, we conduct several comprehensive tests for predicting PPIs. Experiments show that our method achieves better results than other outstanding methods for sequence-based PPI prediction. Applied to the S. cerevisiae PPI dataset, our method achieves 95.01 % accuracy and 92.67 % sensitivity. For the H. pylori PPI dataset, it achieves 87.59 % accuracy and 86.81 % sensitivity. In addition, we test our method on three other important PPI networks: the one-core network, the multiple-core network, and the crossover network. Conclusions: Compared to the Conjoint Triad method, the accuracies of our method are increased by 6.25, 2.06 and 18.75 %, respectively. Our proposed method is a useful tool for future proteomics studies.
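The first two steps of the Methods (group encoding and Moreau-Broto autocorrelation) can be sketched in Python as follows. The seven-class grouping shown is the one commonly used with the Conjoint Triad method and is an assumption here, since the abstract does not list the groups. The Kyte-Doolittle hydropathy scale is likewise only an example property, and the paper may standardize property values before computing the autocorrelation.

```python
# Assumed seven-class grouping (commonly paired with the Conjoint Triad
# method); the abstract itself does not list the groups.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: str(i) for i, group in enumerate(GROUPS, start=1) for aa in group}

# Kyte-Doolittle hydropathy values, used here only as an example property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Replace each residue by the label ('1'..'7') of its group."""
    return "".join(CLASS_OF[aa] for aa in sequence)


def moreau_broto(sequence, lag, scale=HYDROPATHY):
    """Moreau-Broto autocorrelation at one lag, divided by (N - lag).

    AC(lag) = 1 / (N - lag) * sum_i P(i) * P(i + lag)
    """
    n = len(sequence)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(sequence)")
    values = [scale[aa] for aa in sequence]
    total = sum(values[i] * values[i + lag] for i in range(n - lag))
    return total / (n - lag)


encoded = encode("MKDC")
```

Here `encode("MKDC")` yields `"3567"` under the assumed grouping. Concatenating such descriptors for both proteins of a pair would give the kind of fixed-length vector that a Random Forest can consume.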
Affiliation(s)
- Yijie Ding
- School of Computer Science and Technology, Tianjin University, No.135, Yaguan Road, Tianjin Haihe Education Park, Tianjin, People's Republic of China
- Jijun Tang
- School of Computer Science and Technology, Tianjin University, No.135, Yaguan Road, Tianjin Haihe Education Park, Tianjin, People's Republic of China; Department of Computer Science and Engineering, University of South Carolina, Columbia, USA
- Fei Guo
- School of Computer Science and Technology, Tianjin University, No.135, Yaguan Road, Tianjin Haihe Education Park, Tianjin, People's Republic of China
|
20
|
Perovic V, Sumonja N, Gemovic B, Toska E, Roberts SG, Veljkovic N. TRI_tool: a web-tool for prediction of protein-protein interactions in human transcriptional regulation. Bioinformatics 2016; 33:289-291. [PMID: 27605104 DOI: 10.1093/bioinformatics/btw590] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/13/2016] [Revised: 08/26/2016] [Accepted: 09/04/2016] [Indexed: 12/26/2022] Open
Abstract
The TRI_tool, a sequence-based web tool for prediction of protein interactions in human transcriptional regulation, is intended for biomedical investigators who work on understanding the regulation of gene expression. It has improved predictive performance due to training on updated, human-specific, experimentally validated datasets. The TRI_tool is designed to test up to 100 potential interactions with no time delay and to report both probabilities and binarized predictions. AVAILABILITY AND IMPLEMENTATION: http://www.vin.bg.ac.rs/180/tools/tfpred.php CONTACT: vladaper@vinca.rs; nevenav@vinca.rs SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vladimir Perovic
- Centre for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade 11001, Serbia
- Neven Sumonja
- Centre for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade 11001, Serbia
- Branislava Gemovic
- Centre for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade 11001, Serbia
- Eneda Toska
- Department of Biological Sciences, University at Buffalo, Buffalo, NY 14260, USA
- Stefan G Roberts
- Department of Biological Sciences, University at Buffalo, Buffalo, NY 14260, USA
- Nevena Veljkovic
- Centre for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade 11001, Serbia
|
21
|
Hamp T, Rost B. More challenges for machine-learning protein interactions. Bioinformatics 2015; 31:1521-5. [PMID: 25586513 DOI: 10.1093/bioinformatics/btu857] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 09/09/2014] [Accepted: 12/23/2014] [Indexed: 01/21/2023]
Abstract
MOTIVATION: Machine learning may be the most popular computational tool in molecular biology. Providing sustained performance estimates is challenging. The standard cross-validation protocols usually fail in biology. Park and Marcotte found that even refined protocols fail for protein-protein interactions (PPIs). RESULTS: Here, we sketch additional problems for the prediction of PPIs from sequence alone. First, it not only matters whether proteins A or B of a target interaction A-B are similar to proteins of training interactions (positives), but also whether A or B are similar to proteins of non-interactions (negatives). Second, training on multiple interaction partners per protein did not improve performance for new proteins (not used to train). On the contrary, a strictly non-redundant training that ignored good data slightly improved the prediction of difficult cases. Third, which prediction method appears to be best crucially depends on the sequence similarity between the test and the training set, how many true interactions should be found and the expected ratio of negatives to positives. The correct assessment of performance is the most complicated task in the development of prediction methods. Our analyses suggest that PPIs square the challenge for this task.
Affiliation(s)
- Tobias Hamp
- Department of Informatics, Bioinformatics and Computational Biology I12, Technische Universität München, 85748 Garching/Munich, Germany
- Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology I12, Technische Universität München, 85748 Garching/Munich, Germany
|
22
|
Schoenrock A, Samanfar B, Pitre S, Hooshyar M, Jin K, Phillips CA, Wang H, Phanse S, Omidi K, Gui Y, Alamgir M, Wong A, Barrenäs F, Babu M, Benson M, Langston MA, Green JR, Dehne F, Golshani A. Efficient prediction of human protein-protein interactions at a global scale. BMC Bioinformatics 2014; 15:383. [PMID: 25492630 PMCID: PMC4272565 DOI: 10.1186/s12859-014-0383-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 03/06/2014] [Accepted: 11/12/2014] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Our knowledge of global protein-protein interaction (PPI) networks in complex organisms such as humans is hindered by technical limitations of current methods. RESULTS On the basis of short co-occurring polypeptide regions, we developed a tool called MP-PIPE capable of predicting a global human PPI network within 3 months. With a recall of 23% at a precision of 82.1%, we predicted 172,132 putative PPIs. We demonstrate the usefulness of these predictions through a range of experiments. CONCLUSIONS The speed and accuracy associated with MP-PIPE can make this a potential tool to study individual human PPI networks (from genomic sequences alone) for personalized medicine.
Affiliation(s)
- Sylvain Pitre
- School of Computer Science, Carleton University, Ottawa, Canada
- Ke Jin
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- Charles A Phillips
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, USA
- Hui Wang
- Department of Pediatrics, Gothenburg University, Gothenburg, Sweden; The Centre for Individualized Medication, Linköping University, Linköping, Sweden
- Sadhna Phanse
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
- Katayoun Omidi
- Department of Biology, Carleton University, Ottawa, Canada
- Yuan Gui
- Department of Biology, Carleton University, Ottawa, Canada
- Md Alamgir
- Department of Biology, Carleton University, Ottawa, Canada
- Alex Wong
- Department of Biology, Carleton University, Ottawa, Canada
- Fredrik Barrenäs
- Department of Pediatrics, Gothenburg University, Gothenburg, Sweden; The Centre for Individualized Medication, Linköping University, Linköping, Sweden
- Mohan Babu
- Department of Biochemistry, Research and Innovation Centre, University of Regina, Regina, Saskatchewan, Canada
- Mikael Benson
- Department of Pediatrics, Gothenburg University, Gothenburg, Sweden; The Centre for Individualized Medication, Linköping University, Linköping, Sweden
- Michael A Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, USA
- James R Green
- Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
- Frank Dehne
- School of Computer Science, Carleton University, Ottawa, Canada
|
23
|
Murakami Y, Mizuguchi K. Homology-based prediction of interactions between proteins using Averaged One-Dependence Estimators. BMC Bioinformatics 2014; 15:213. [PMID: 24953126 PMCID: PMC4229973 DOI: 10.1186/1471-2105-15-213] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 01/30/2014] [Accepted: 06/17/2014] [Indexed: 02/02/2023] Open
Abstract
Background: Identification of protein-protein interactions (PPIs) is essential for a better understanding of biological processes, pathways and functions. However, experimental identification of the complete set of PPIs in a cell/organism (“an interactome”) is still a difficult task. To circumvent limitations of current high-throughput experimental techniques, it is necessary to develop high-performance computational methods for predicting PPIs. Results: In this article, we propose a new computational method to predict interaction between a given pair of protein sequences using features derived from known homologous PPIs. The proposed method is capable of predicting interaction between two proteins (of unknown structure) using Averaged One-Dependence Estimators (AODE) and three features calculated for the protein pair: (a) sequence similarities to a known interacting protein pair (FSeq), (b) statistical propensities of domain pairs observed in interacting proteins (FDom) and (c) a sum of edge weights along the shortest path between homologous proteins in a PPI network (FNet). Feature vectors were defined to lie in a half-space of the symmetrical high-dimensional feature space to make them independent of the protein order. The predictability of the method was assessed by a 10-fold cross validation on a recently created human PPI dataset with randomly sampled negative data, and the best model achieved an Area Under the Curve of 0.79 (pAUC0.5% = 0.16). In addition, the AODE trained on all three features (named PSOPIA) showed better prediction performance on a separate independent data set than a recently reported homology-based method. Conclusions: Our results suggest that FNet, a feature representing proximity in a known PPI network between two proteins that are homologous to a target protein pair, contributes to the prediction of whether the target proteins interact or not. PSOPIA will help identify novel PPIs and estimate complete PPI networks. The method proposed in this article is freely available on the web at http://mizuguchilab.org/PSOPIA.
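Two ideas from this abstract lend themselves to a short Python sketch: the network feature FNet (a sum of edge weights along a shortest path) and the requirement that a pair's features not depend on protein order. The toy network, its weights, and the function names are hypothetical. Sorting the two per-protein feature tuples is an assumed stand-in for the paper's half-space construction, which the abstract does not spell out.

```python
import heapq


def shortest_path_weight(graph, source, target):
    """Dijkstra: smallest sum of edge weights from source to target.

    Returns None when target cannot be reached.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None


def canonical_pair(features_a, features_b):
    """Order the two feature tuples so that (A, B) and (B, A) agree."""
    return tuple(sorted((tuple(features_a), tuple(features_b))))


# Hypothetical weighted PPI network (symmetric adjacency dictionaries).
NETWORK = {
    "P1": {"P2": 1.0, "P3": 4.0},
    "P2": {"P1": 1.0, "P3": 2.0},
    "P3": {"P1": 4.0, "P2": 2.0},
    "P4": {},
}

fnet_like = shortest_path_weight(NETWORK, "P1", "P3")
```

In the toy network the path P1, P2, P3 (total weight 3.0) beats the direct edge of weight 4.0. In the setting the abstract describes, the endpoints would be known proteins homologous to the two query proteins.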
Affiliation(s)
- Yoichi Murakami
- Bioinformatics Project, National Institute of Biomedical Innovation, 7-6-8 Saito-Asagi, Ibaraki, Osaka 567-0085, Japan
|
24
|
Omidi K, Hooshyar M, Jessulat M, Samanfar B, Sanders M, Burnside D, Pitre S, Schoenrock A, Xu J, Babu M, Golshani A. Phosphatase complex Pph3/Psy2 is involved in regulation of efficient non-homologous end-joining pathway in the yeast Saccharomyces cerevisiae. PLoS One 2014; 9:e87248. [PMID: 24498054 PMCID: PMC3909046 DOI: 10.1371/journal.pone.0087248] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/29/2013] [Accepted: 12/20/2013] [Indexed: 11/19/2022] Open
Abstract
One of the main mechanisms for double stranded DNA break (DSB) repair is through the non-homologous end-joining (NHEJ) pathway. Using plasmid and chromosomal repair assays, we showed that deletion mutant strains for interacting proteins Pph3p and Psy2p had reduced efficiencies in NHEJ. We further observed that this activity of Pph3p and Psy2p appeared linked to cell cycle Rad53p and Chk1p checkpoint proteins. Pph3/Psy2 is a phosphatase complex, which regulates recovery from the Rad53p DNA damage checkpoint. Overexpression of Chk1p checkpoint protein in a parallel pathway to Rad53p compensated for the deletion of PPH3 or PSY2 in a chromosomal repair assay. Double mutant strains Δpph3/Δchk1 and Δpsy2/Δchk1 showed additional reductions in the efficiency of plasmid repair, compared to both single deletions which is in agreement with the activity of Pph3p and Psy2p in a parallel pathway to Chk1p. Genetic interaction analyses also supported a role for Pph3p and Psy2p in DNA damage repair, the NHEJ pathway, as well as cell cycle progression. Collectively, we report that the activity of Pph3p and Psy2p further connects NHEJ repair to cell cycle progression.
Affiliation(s)
- Katayoun Omidi
- Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada
- Mohsen Hooshyar
- Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada
- Matthew Jessulat
- Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Department of Biochemistry, Research and Innovation Centre, University of Regina, Regina, Saskatchewan, Canada
- Bahram Samanfar
- Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada
- Megan Sanders
- Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada
- Daniel Burnside
- Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada
- Sylvain Pitre
- Department of Computer Science, Carleton University, Ottawa, Ontario, Canada
- Andrew Schoenrock
- Department of Computer Science, Carleton University, Ottawa, Ontario, Canada
- Jianhua Xu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Mohan Babu
- Department of Biochemistry, Research and Innovation Centre, University of Regina, Regina, Saskatchewan, Canada
- Ashkan Golshani
- Department of Biology, Carleton University, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, Carleton University, Ottawa, Ontario, Canada
|
25
|
Fan CY, Bai YH, Huang CY, Yao TJ, Chiang WH, Chang DTH. PRASA: an integrated web server that analyzes protein interaction types. Gene 2013; 518:78-83. [PMID: 23276706 DOI: 10.1016/j.gene.2012.11.083] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/27/2012] [Accepted: 11/27/2012] [Indexed: 11/16/2022]
Abstract
This work presents the Protein Association Analyzer (PRASA) (http://zoro.ee.ncku.edu.tw/prasa/) that predicts protein interactions as well as interaction types. Protein interactions are essential to most biological functions. The existence of diverse interaction types, such as physically contacted or functionally related interactions, makes protein interactions complex. Different interaction types are distinct and should not be confused. However, most existing tools focus on a specific interaction type or mix different interaction types. This work collected 7234058 associations with experimentally verified interaction types from five databases and compiled individual probabilistic models for different interaction types. The PRASA result page shows predicted associations and their related references by interaction type. Experimental results demonstrate the performance difference when distinguishing between different interaction types. The PRASA provides a centralized and organized platform for easy browsing, downloading and comparing of interaction types, which helps reveal insights into the complex roles that proteins play in organisms.
Affiliation(s)
- Chen-Yu Fan
- Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
|
26
|
Pancaldi V, Saraç ÖS, Rallis C, McLean JR, Převorovský M, Gould K, Beyer A, Bähler J. Predicting the fission yeast protein interaction network. G3 (Bethesda) 2012; 2:453-67. [PMID: 22540037 PMCID: PMC3337474 DOI: 10.1534/g3.111.001560] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 11/08/2011] [Accepted: 01/31/2012] [Indexed: 12/03/2022]
Abstract
A systems-level understanding of biological processes and information flow requires the mapping of cellular component interactions, among which protein-protein interactions are particularly important. Fission yeast (Schizosaccharomyces pombe) is a valuable model organism for which no systematic protein-interaction data are available. We exploited gene and protein properties, global genome regulation datasets, and conservation of interactions between budding and fission yeast to predict fission yeast protein interactions in silico. We have extensively tested our method in three ways: first, by predicting with 70-80% accuracy a selected high-confidence test set; second, by recapitulating interactions between members of the well-characterized SAGA co-activator complex; and third, by verifying predicted interactions of the Cbf11 transcription factor using mass spectrometry of TAP-purified protein complexes. Given the importance of the pathway in cell physiology and human disease, we explore the predicted sub-networks centered on the Tor1/2 kinases. Moreover, we predict the histidine kinases Mak1/2/3 to be vital hubs in the fission yeast stress response network, and we suggest interactors of argonaute 1, the principal component of the siRNA-mediated gene silencing pathway, lost in budding yeast but preserved in S. pombe. Of the new high-quality interactions that were discovered after we started this work, 73% were found in our predictions. Even though any predicted interactome is imperfect, the protein network presented here can provide a valuable basis to explore biological processes and to guide wet-lab experiments in fission yeast and beyond. Our predicted protein interactions are freely available through PInt, an online resource on our website (www.bahlerlab.info/PInt).
Affiliation(s)
- Vera Pancaldi
- Department of Genetics, Evolution, and Environment
- UCL Cancer Institute, University College London, London WC1E 6BT, United Kingdom
- Ömer S. Saraç
- Cellular Networks and Systems Biology, Biotechnology Center, Dresden University of Technology (TU Dresden), Dresden 01307, Germany
- Charalampos Rallis
- Department of Genetics, Evolution, and Environment
- UCL Cancer Institute, University College London, London WC1E 6BT, United Kingdom
- Janel R. McLean
- Howard Hughes Medical Institute
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232
- Martin Převorovský
- Department of Genetics, Evolution, and Environment
- UCL Cancer Institute, University College London, London WC1E 6BT, United Kingdom
- Kathleen Gould
- Howard Hughes Medical Institute
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232
- Andreas Beyer
- Cellular Networks and Systems Biology, Biotechnology Center, Dresden University of Technology (TU Dresden), Dresden 01307, Germany
- Jürg Bähler
- Department of Genetics, Evolution, and Environment
- UCL Cancer Institute, University College London, London WC1E 6BT, United Kingdom
|
27
|
Pitre S, Hooshyar M, Schoenrock A, Samanfar B, Jessulat M, Green JR, Dehne F, Golshani A. Short Co-occurring Polypeptide Regions Can Predict Global Protein Interaction Maps. Sci Rep 2012; 2:239. [PMID: 22355752 PMCID: PMC3269044 DOI: 10.1038/srep00239] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 11/02/2011] [Accepted: 12/14/2011] [Indexed: 11/16/2022] Open
Abstract
A goal of the post-genomics era has been to elucidate a detailed global map of protein-protein interactions (PPIs) within a cell. Here, we show that the presence of co-occurring short polypeptide sequences between interacting protein partners appears to be conserved across different organisms. We present an algorithm to automatically generate PPI prediction method parameters for various organisms and illustrate that global PPIs can be predicted from previously reported PPIs within the same or a different organism using protein primary sequences. The PPI prediction code is further accelerated through the use of parallel multi-core programming, which improves its usability for large scale or proteome-wide PPI prediction. We predict and analyze hundreds of novel human PPIs, experimentally confirm protein functions and importantly predict the first genome-wide PPI maps for S. pombe (∼9,000 PPIs) and C. elegans (∼37,500 PPIs).
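The idea of scoring a query pair by short regions that co-occur in known interacting pairs can be sketched in Python as below. This is a deliberate simplification under stated assumptions: windows are matched exactly and are only three residues long, whereas the published method's window length and similarity criterion are not given in this abstract. The sequences and function names are hypothetical.

```python
def windows(sequence, k):
    """Set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def cooccurrence_score(seq_a, seq_b, known_pairs, k=3):
    """Count window combinations supported by known interacting pairs.

    For each known pair (X, Y), a combination is supported when a window
    of A occurs in X and a window of B occurs in Y. Both orientations are
    checked because interactions are unordered, and the better one is kept.
    """
    win_a, win_b = windows(seq_a, k), windows(seq_b, k)
    score = 0
    for x, y in known_pairs:
        win_x, win_y = windows(x, k), windows(y, k)
        forward = len(win_a & win_x) * len(win_b & win_y)
        reverse = len(win_a & win_y) * len(win_b & win_x)
        score += max(forward, reverse)
    return score


# Hypothetical database containing a single known interaction.
KNOWN = [("MKTAYIAK", "GSHMLEDP")]

score = cooccurrence_score("AKTAYQ", "HMLEW", KNOWN)
```

Here the query shares the windows `KTA` and `TAY` with the first known partner and `HML` and `MLE` with the second, giving a score of 4. A proteome-wide run would repeat this for every candidate pair, which is why the abstract stresses parallel multi-core execution.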
|
28
|
Park Y, Marcotte EM. Revisiting the negative example sampling problem for predicting protein-protein interactions. Bioinformatics 2011; 27:3024-8. [PMID: 21908540 DOI: 10.1093/bioinformatics/btr514] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION A number of computational methods have been proposed that predict protein-protein interactions (PPIs) based on protein sequence features. Since the number of potential non-interacting protein pairs (negative PPIs) is very high both in absolute terms and in comparison to that of interacting protein pairs (positive PPIs), computational prediction methods rely upon subsets of negative PPIs for training and validation. Hence, the need arises for subset sampling for negative PPIs. RESULTS We clarify that there are two fundamentally different types of subset sampling for negative PPIs. One is subset sampling for cross-validated testing, where one desires unbiased subsets so that predictive performance estimated with them can be safely assumed to generalize to the population level. The other is subset sampling for training, where one desires the subsets that best train predictive algorithms, even if these subsets are biased. We show that confusion between these two fundamentally different types of subset sampling led one study recently published in Bioinformatics to the erroneous conclusion that predictive algorithms based on protein sequence features are hardly better than random in predicting PPIs. Rather, both protein sequence features and the 'hubbiness' of interacting proteins contribute to effective prediction of PPIs. We provide guidance for appropriate use of random versus balanced sampling. AVAILABILITY The datasets used for this study are available at http://www.marcottelab.org/PPINegativeDataSampling. CONTACT yungki@mail.utexas.edu; marcotte@icmb.utexas.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
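The contrast between random and balanced negative sampling can be made concrete with a small Python sketch. The definition of balanced sampling used here (each protein appears among the negatives no more often than among the positives) is an assumption, since the abstract does not define the term. The protein names and helper functions are hypothetical.

```python
import itertools
import random
from collections import Counter


def random_negatives(proteins, positives, n, rng):
    """Uniformly sample n pairs that are not known positives."""
    known = {frozenset(pair) for pair in positives}
    candidates = [pair for pair in itertools.combinations(sorted(proteins), 2)
                  if frozenset(pair) not in known]
    return rng.sample(candidates, n)


def balanced_negatives(positives, rng, max_tries=1000):
    """Sample negatives whose per-protein counts track the positives.

    Each appearance of a protein in a positive pair grants one 'stub';
    negatives are formed by pairing stubs at random (assumed definition).
    """
    known = {frozenset(pair) for pair in positives}
    stubs = [protein for pair in positives for protein in pair]
    negatives = set()
    for _ in range(max_tries):
        if len(stubs) < 2:
            break
        a, b = rng.sample(stubs, 2)
        pair = frozenset((a, b))
        if a != b and pair not in known and pair not in negatives:
            negatives.add(pair)
            stubs.remove(a)
            stubs.remove(b)
    return sorted(tuple(sorted(pair)) for pair in negatives)


POSITIVES = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
rng = random.Random(0)
uniform = random_negatives("ABCDEF", POSITIVES, 5, rng)
balanced = balanced_negatives(POSITIVES, rng)
```

With random sampling the hub protein `A` rarely appears in negatives, so a classifier can score well merely by learning that hubs interact. Balanced sampling removes that shortcut, at the cost of a negative set that no longer reflects the population, which is the trade-off between testing and training subsets that the abstract describes.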
Affiliation(s)
- Yungki Park
- Center for Systems and Synthetic Biology, Institute of Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas 78712, USA
|
29
|
Jessulat M, Pitre S, Gui Y, Hooshyar M, Omidi K, Samanfar B, Tan LH, Alamgir M, Green J, Dehne F, Golshani A. Recent advances in protein-protein interaction prediction: experimental and computational methods. Expert Opin Drug Discov 2011; 6:921-35. [PMID: 22646215 DOI: 10.1517/17460441.2011.603722] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/05/2022]
Abstract
INTRODUCTION: Proteins within the cell act as part of complex networks, which allow pathways and processes to function. Therefore, understanding how proteins interact is a significant area of current research. AREAS COVERED: This review presents an overview of key experimental techniques (yeast two-hybrid, tandem affinity purification and protein microarrays) used to discover protein-protein interactions (PPIs), and briefly discusses certain computational methods for predicting protein interactions based on gene localization, phylogenetic information, 3D structural modeling or primary protein sequence data. Due to the large-scale applicability of primary sequence-based methods, the authors focus on this strategy. There is an emphasis on a recent algorithm called Protein Interaction Prediction Engine (PIPE) that can predict global PPIs. Readers will discover recent advances both in the practical determination of protein interactions and in the strategies available to anticipate interactions without the time and cost of experimental work. EXPERT OPINION: Global PPI maps can help understand the biology of complex diseases and facilitate the identification of novel drug target sites. This study describes different techniques used for PPI prediction that the authors believe will significantly impact the development of the field in the near future. The authors expect to see a growing number of similar techniques capable of large-scale PPI predictions.
Affiliation(s)
- Matthew Jessulat
- Carleton University , Department of Biology , 209 Nesbitt Building, 1125 Colonel By Drive, Ottawa, Ontario K1S 5B6 , Canada
|
30
|
Amos-Binks A, Patulea C, Pitre S, Schoenrock A, Gui Y, Green JR, Golshani A, Dehne F. Binding site prediction for protein-protein interactions and novel motif discovery using re-occurring polypeptide sequences. BMC Bioinformatics 2011; 12:225. [PMID: 21635751 PMCID: PMC3120708 DOI: 10.1186/1471-2105-12-225] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 03/01/2011] [Accepted: 06/02/2011] [Indexed: 11/25/2022] Open
Abstract
Background: While there are many methods for predicting protein-protein interactions, very few can determine the specific site of interaction on each protein. Characterization of the specific sequence regions mediating interaction (binding sites) is crucial for an understanding of cellular pathways. Experimental methods often report false binding sites due to experimental limitations, while computational methods tend to require data that are not available at the proteome scale. Here we present PIPE-Sites, a novel method of protein-specific binding site prediction based on pairs of re-occurring polypeptide sequences, which have been previously shown to accurately predict protein-protein interactions. PIPE-Sites operates at high specificity and requires only the sequences of query proteins and a database of known binary interactions with no binding site data, making it applicable to binding site prediction at the proteome scale. Results: PIPE-Sites was evaluated using a dataset of 265 yeast and 423 human interacting protein pairs with experimentally determined binding sites. We found that PIPE-Sites predictions were closer to the confirmed binding site than those of two existing binding site prediction methods based on domain-domain interactions, when applied to the same dataset. Finally, we applied PIPE-Sites to two datasets of 2347 yeast and 14,438 human novel protein pairs predicted to interact with high confidence. An analysis of the predicted interaction sites revealed a number of protein subsequences which are highly re-occurring in binding sites and which may represent novel binding motifs. Conclusions: PIPE-Sites is an accurate method for predicting protein binding sites and is applicable at the proteome scale. Thus, PIPE-Sites could be useful for exhaustive analysis of protein binding patterns in whole proteomes as well as discovery of novel binding motifs. PIPE-Sites is available online at http://pipe-sites.cgmlab.org/.
Affiliation(s)
- Adam Amos-Binks
- School of Computer Science, Carleton University, Ottawa, ON K1S5B6, Canada
|
31
|
Zhang YN, Pan XY, Huang Y, Shen HB. Adaptive compressive learning for prediction of protein-protein interactions from primary sequence. J Theor Biol 2011; 283:44-52. [PMID: 21635901 DOI: 10.1016/j.jtbi.2011.05.023] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 11/11/2010] [Revised: 04/20/2011] [Accepted: 05/16/2011] [Indexed: 12/11/2022]
Abstract
Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to the identification of novel PPIs by integrating experimental biological knowledge, many difficulties remain because of a lack of sufficient protein structural and functional information. It is highly desirable to develop methods based only on amino acid sequences for predicting PPIs. However, sequence-based predictors often struggle with high dimensionality, which causes over-fitting and high computational complexity, as well as with redundancy in sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence and has achieved promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequential feature vector into a much lower-dimensional but more condensed space, taking the sparsity property of the original signal into account. What makes compressed sensing much more attractive in protein sequence analysis is that its compressed signal can be reconstructed from far fewer measurements than are usually considered necessary in traditional Nyquist sampling theory. Experimental results demonstrate that the proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy for dealing with high-dimensional protein discrete models and has great potential to be extended to many other complicated biological systems.
Affiliation(s)
- Ya-Nan Zhang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
|
32
|
Yu J, Guo M, Needham CJ, Huang Y, Cai L, Westhead DR. Simple sequence-based kernels do not predict protein-protein interactions. Bioinformatics 2010; 26:2610-4. [PMID: 20801913 DOI: 10.1093/bioinformatics/btq483] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022] Open
Affiliation(s)
- Jiantao Yu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
33
|
Alamgir M, Erukova V, Jessulat M, Azizi A, Golshani A. Chemical-genetic profile analysis of five inhibitory compounds in yeast. BMC Chem Biol 2010; 10:6. [PMID: 20691087 PMCID: PMC2925817 DOI: 10.1186/1472-6769-10-6] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 01/28/2010] [Accepted: 08/06/2010] [Indexed: 11/10/2022]
Abstract
Background: Chemical-genetic profiling of inhibitory compounds can lead to identification of their modes of action. These profiles can help elucidate the complex interactions between small bioactive compounds and the cell machinery, and explain putative gene function(s). Results: Colony size reduction was used to investigate the chemical-genetic profile of cycloheximide, 3-amino-1,2,4-triazole, paromomycin, streptomycin and neomycin in the yeast Saccharomyces cerevisiae. These compounds target the process of protein biosynthesis. More than 70,000 strains were analyzed from the array of gene deletion mutant yeast strains. As expected, the overall profiles of the tested compounds were similar, with deletions for genes involved in protein biosynthesis being the major category followed by metabolism. This implies that novel genes involved in protein biosynthesis could be identified from these profiles. Further investigations were carried out to assess the activity of three profiled genes in the process of protein biosynthesis using relative fitness of double mutants and other genetic assays. Conclusion: Chemical-genetic profiles provide insight into the molecular mechanism(s) of the examined compounds by elucidating their potential primary and secondary cellular target sites. Our follow-up investigations into the activity of three profiled genes in the process of protein biosynthesis provided further evidence concerning the usefulness of chemical-genetic analyses for annotating gene functions. We termed these genes TAE2, TAE3 and TAE4 for translation associated elements 2-4.
Affiliation(s)
- Md Alamgir
- Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, ON K1S 5B6, Canada
|
34
|
Innovative bioinformatic approaches for developing peptide-based vaccines against hypervariable viruses. Immunol Cell Biol 2010; 89:81-9. [PMID: 20458336 DOI: 10.1038/icb.2010.65] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Indexed: 12/16/2022]
Abstract
The application of the fields of pharmacogenomics and pharmacogenetics to vaccine design has been recently labeled 'vaccinomics'. This newly named area of vaccine research, heavily intertwined with bioinformatics, seems to be leading the charge in developing novel vaccines for currently unmet medical needs against hypervariable viruses such as human immunodeficiency virus (HIV), hepatitis C and emerging avian and swine influenza. Some of the more recent bioinformatic approaches in the area of vaccine research include the use of epitope determination and prediction algorithms for exploring the use of peptide epitopes as vaccine immunogens. This paper briefly discusses and explores some current uses of bioinformatics in vaccine design toward the pursuit of peptide vaccines for hypervariable viruses. The various informatics and vaccine design strategies attempted by other groups toward hypervariable viruses will also be briefly examined, along with the strategy used by our group in the design and synthesis of peptide immunogens for candidate HIV and influenza vaccines.
|
35
|
Zhang KX, Ouellette BFF. Pandora, a pathway and network discovery approach based on common biological evidence. Bioinformatics 2009; 26:529-35. [PMID: 20031970 PMCID: PMC2820679 DOI: 10.1093/bioinformatics/btp701] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
Abstract
Motivation: Many biological phenomena involve extensive interactions between many of the biological pathways present in cells. However, extraction of all the inherent biological pathways remains a major challenge in systems biology. With the advent of high-throughput functional genomic techniques, it is now possible to infer biological pathways and pathway organization in a systematic way by integrating disparate biological information. Results: Here, we propose a novel integrated approach that uses network topology to predict biological pathways. We integrated four types of biological evidence (protein–protein interaction, genetic interaction, domain–domain interaction and semantic similarity of Gene Ontology terms) to generate a functionally associated network. This network was then used to develop a new pathway finding algorithm to predict biological pathways in yeast. Our approach discovered 195 biological pathways and 31 functionally redundant pathway pairs in yeast. By comparing our identified pathways to three public pathway databases (KEGG, BioCyc and Reactome), we observed that our approach achieves a maximum positive predictive value of 12.8% and improves on other predictive approaches. This study allows us to reconstruct biological pathways and delineates cellular machinery in a systematic view. Availability: The method has been implemented in Perl and is available for downloading from http://www.oicr.on.ca/research/ouellette/pandora. It is distributed under the terms of GPL (http://opensource.org/licenses/gpl-2.0.php). Contact: francis@oicr.on.ca Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kelvin Xi Zhang
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
|
36
|
Park Y. Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences. BMC Bioinformatics 2009; 10:419. [PMID: 20003442 PMCID: PMC2803199 DOI: 10.1186/1471-2105-10-419] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Received: 07/01/2009] [Accepted: 12/14/2009] [Indexed: 11/10/2022] Open
Abstract
Background: Protein-protein interactions underlie many important biological processes. Computational prediction methods can nicely complement experimental approaches for identifying protein-protein interactions. Recently, a unique category of sequence-based prediction methods has been put forward, unique in the sense that it does not require homologous protein sequences. This enables it to be universally applicable to all protein sequences, unlike many previous sequence-based prediction methods. If effective as claimed, these new sequence-based, universally applicable prediction methods would have far-reaching utility in many areas of biology research. Results: Upon close survey, I realized that many of these new methods were ill-tested. In addition, newer methods were often published without performance comparison with previous ones. Thus, it is not clear how good they are and whether there are significant performance differences among them. In this study, I have implemented and thoroughly tested 4 different methods on large-scale, non-redundant data sets. The tests reveal several important points. First, significant performance differences are noted among different methods. Second, data sets typically used for training prediction methods appear significantly biased, limiting the general applicability of prediction methods trained with them. Third, there is still ample room for further developments. In addition, my analysis illustrates the importance of complementary performance measures coupled with right-sized data sets for meaningful benchmark tests. Conclusions: The current study reveals the potential and limits of the new category of sequence-based protein-protein interaction prediction methods, which in turn provides a firm ground for future endeavours in this important area of contemporary bioinformatics.
Affiliation(s)
- Yungki Park
- Institute of Cellular and Molecular Biology (MBB 3 210B), Center for Systems and Synthetic Biology, University of Texas at Austin, 2500 Speedway, Austin, Texas, USA.
|
37
|
Thomas J, Ramakrishnan N, Bailey-Kellogg C. Graphical models of protein-protein interaction specificity from correlated mutations and interaction data. Proteins 2009; 76:911-29. [DOI: 10.1002/prot.22398] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/11/2022]
|
38
|
Current awareness on yeast. Yeast 2009. [DOI: 10.1002/yea.1618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
|
39
|
Estruch F, Peiró-Chova L, Gómez-Navarro N, Durbán J, Hodge C, Del Olmo M, Cole CN. A genetic screen in Saccharomyces cerevisiae identifies new genes that interact with mex67-5, a temperature-sensitive allele of the gene encoding the mRNA export receptor. Mol Genet Genomics 2008; 281:125-34. [PMID: 19034519 DOI: 10.1007/s00438-008-0402-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 09/04/2008] [Accepted: 10/29/2008] [Indexed: 10/21/2022]
Abstract
The Mex67p protein, together with Mtr2p, functions as the mRNA export receptor in Saccharomyces cerevisiae by interacting with both mRNA and nuclear pore complexes. To identify genes that interact functionally with MEX67, we used transposon insertion to search for mutations that suppressed the temperature-sensitive mex67-5 allele. Four suppressors are described here. The screen revealed that mutant Mex67-5p, but not wild-type Mex67p, is a target of the nuclear protein quality control mediated by San1p, a ubiquitin-protein ligase that participates in degradation of aberrant chromatin-associated proteins. Our finding that overexpression of the SPT6 gene alleviates the growth defects of the mex67-5 strain, together with the impairment of poly(A)(+) RNA export caused by depletion of Spt6p or the related protein Iws1p/Spn1p, supports the mechanism proposed in mammalian cells for Spt6-mediated co-transcriptional loading of mRNA export factors during transcription elongation. Finally, our results also uncovered genetic connections between Mex67p and the poly(A) nuclease complex and with components of chromatin boundary elements.
Affiliation(s)
- Francisco Estruch
- Department of Biochemistry and Molecular Biology, Universitat de Valencia, c/Dr. Moliner, 50, Burjassot (Valencia), 46100, Spain.
|