1
Jha K, Saha S, Karmakar S. Prediction of Protein-Protein Interactions Using Vision Transformer and Language Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3215-3225. [PMID: 37027644 DOI: 10.1109/tcbb.2023.3248797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Knowledge of protein-protein interactions (PPIs) helps us understand protein functions and the causes and progression of several diseases, and can aid in designing new drugs. Most existing PPI research has relied mainly on sequence-based approaches. With the availability of multi-omics datasets (sequence, 3D structure) and advances in deep learning, it is feasible to develop a deep multi-modal framework that fuses features learned from different sources of information to predict PPIs. In this work, we propose a multi-modal approach utilizing protein sequence and 3D structure. To extract features from the 3D structure of proteins, we use a pre-trained vision transformer model that has been fine-tuned on the structural representation of proteins. The protein sequence is encoded into a feature vector using a pre-trained language model. The feature vectors extracted from the two modalities are fused and then fed to a neural network classifier to predict protein interactions. To demonstrate the effectiveness of the proposed methodology, we conduct experiments on two popular PPI datasets, namely the human dataset and the S. cerevisiae dataset. Our approach outperforms existing methodologies for predicting PPIs, including multi-modal approaches. We also evaluate the contribution of each modality by designing uni-modal baselines. Finally, we perform experiments with three modalities, using gene ontology as the third modality.
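The abstract does not say how the two feature vectors are fused or how a protein pair is presented to the classifier. As a minimal sketch, assuming concatenation as the fusion rule and a single logistic unit as a stand-in for the neural network classifier, the data flow looks like this (the function names and embedding values are hypothetical, not the authors' code):

```python
import math
from typing import List, Sequence

def fuse(structure_vec: Sequence[float], sequence_vec: Sequence[float]) -> List[float]:
    """Fuse one protein's two modality embeddings by concatenation (assumed fusion rule)."""
    return list(structure_vec) + list(sequence_vec)

def pair_features(p_struct, p_seq, q_struct, q_seq) -> List[float]:
    """Represent a candidate pair by joining the fused vectors of both proteins (assumption)."""
    return fuse(p_struct, p_seq) + fuse(q_struct, q_seq)

def interaction_probability(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Single logistic unit standing in for the neural network classifier."""
    if len(x) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy 2-d "structure" and 2-d "sequence" embeddings per protein (hypothetical values).
x = pair_features([0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8])
p = interaction_probability(x, weights=[0.0] * len(x), bias=0.0)
print(len(x), p)  # 8 0.5
```

Concatenation is only the simplest option; the authors' actual fusion layer and classifier architecture may differ.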
2
OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022]
3
Kumar S. Protein–Protein Interaction Network for the Identification of New Targets Against Novel Coronavirus. Methods in Pharmacology and Toxicology 2021:213-230. [DOI: 10.1007/7653_2020_62] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/03/2025]
4
A simple and rapid pipeline for identification of receptor-binding sites on the surface proteins of pathogens. Sci Rep 2020; 10:1163. [PMID: 31980725 PMCID: PMC6981161 DOI: 10.1038/s41598-020-58305-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/30/2019] [Accepted: 01/14/2020] [Indexed: 12/02/2022]
Abstract
Ligand-receptor interactions play a crucial role in a plethora of biological processes. Several methods have been established to reveal the ligand-receptor interface; however, most of them are time-consuming, laborious and expensive. Here we present a straightforward pipeline to identify putative receptor-binding sites on pathogen ligands. Two model ligands (bait proteins), domain III (DIII) of protein E of West Nile virus and NadA of Neisseria meningitidis, were incubated with proteins of human brain microvascular endothelial cells immobilized on a nitrocellulose or PVDF membrane; the complex was trypsinized on-membrane, and the bound peptides of the bait proteins were recovered and detected by MALDI-TOF. Two peptides of DIII (~916 Da and ~2003 Da) and four peptides of NadA (~1453 Da, ~1810 Da, ~2051 Da and ~2433 Da) were identified as plausible receptor binders. Binding of the identified peptides to the proteins of endothelial cells was further corroborated using biotinylated synthetic analogues in ELISA and immunocytochemistry. The experimental pipeline presented here can easily be scaled up to map receptor-binding sites on several ligands simultaneously. The approach is rapid, cost-effective and less laborious. It could be a simpler alternative or a complementary method to existing techniques used to reveal the amino acids involved in the ligand-receptor interface.
5
A Computational Framework for Predicting Direct Contacts and Substructures within Protein Complexes. Biomolecules 2019; 9:biom9110656. [PMID: 31717703 PMCID: PMC6921016 DOI: 10.3390/biom9110656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2019] [Revised: 10/20/2019] [Accepted: 10/23/2019] [Indexed: 11/17/2022]
Abstract
Understanding the physical arrangement of subunits within protein complexes potentially provides valuable clues about how the subunits work together and how the complexes function. Most recent research focuses on identifying protein complexes as a whole and seldom studies the inner structures within complexes. In this study, we propose a computational framework to predict direct contacts and substructures within protein complexes. In this framework, we first train a supervised l2-regularized logistic regression model to learn the patterns of direct and indirect interactions within complexes, from which physical subunit interaction networks are predicted. Then, to infer substructures within complexes, we apply a graph clustering method (maximum modularity clustering, MMC) to partially connected networks and a gene ontology (GO) semantic-similarity-based functional clustering to fully connected networks. Computational results show that the proposed framework achieves fairly good performance in cross-validation and independent testing for detecting direct contacts between subunits. Functional analyses further support partitioning the subunits into substructures via the MMC algorithm and functional clustering.
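The classifier named in this abstract is l2-regularized logistic regression. A minimal sketch of that model, trained by plain batch gradient descent, is shown below; the solver, the hyperparameters (lam, lr, epochs) and the two-feature toy data are all assumptions, since the abstract does not describe the features or training setup.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_l2_logreg(X, y, lam=0.1, lr=0.1, epochs=500):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2; the bias is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The l2 term adds lam * w to the gradient, shrinking weights toward zero.
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical two-feature descriptors of subunit pairs: 1 = direct contact, 0 = indirect.
X = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 0.9]]
y = [1, 1, 0, 0]
w, b = train_l2_logreg(X, y)
print(predict_proba(w, b, [1.0, 0.1]), predict_proba(w, b, [0.1, 0.9]))
```

Raising lam pulls the weights toward zero, which is the property usually cited for using the l2 penalty on noisy or high-dimensional training data.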
6
Protein Complex Identification and quantitative complexome by CN-PAGE. Sci Rep 2019; 9:11523. [PMID: 31395906 PMCID: PMC6687828 DOI: 10.1038/s41598-019-47829-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 09/05/2018] [Accepted: 07/24/2019] [Indexed: 02/07/2023]
Abstract
Most cellular processes are carried out by protein complexes. Various size-fractionation methods have previously been combined with mass spectrometry to identify protein complexes. However, most of these approaches lack the quantitative information required to understand how changes in protein complex abundance and composition affect metabolic fluxes. In this paper we present a proof-of-concept approach to quantitatively study the complexome of the model plant Arabidopsis thaliana at the end of the day (ED) and the end of the night (EN). We show that size fractionation of native protein complexes by Clear-Native-PAGE (CN-PAGE), coupled with mass spectrometry, can be used to establish abundance profiles along the molecular-weight gradient. Furthermore, by deconvoluting complex protein abundance profiles, we were able to drastically improve the clustering of protein profiles. To identify putative interaction partners, and ultimately protein complexes, our approach calculates the Euclidean distance between pairs of protein profiles. Acceptable threshold values are based on a cut-off optimized by receiver operating characteristic (ROC) curve analysis. Our approach shows low technical variation and can easily be adapted to study the complexome in any biological system.
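The pairing step can be illustrated with a short sketch: compute the Euclidean distance between every pair of abundance profiles, then pick the distance cutoff that best separates known partners from other pairs on an ROC curve. The abstract does not say which ROC criterion was optimized; Youden's J (TPR minus FPR) is our assumption, and the protein names, profiles and reference pairs below are invented. Profile deconvolution is not shown.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pair_distances(profiles):
    """Distance for every unordered protein pair; profiles map name -> abundance per fraction."""
    return {frozenset((p, q)): euclidean(profiles[p], profiles[q])
            for p, q in combinations(sorted(profiles), 2)}

def best_cutoff(distances, known_pairs):
    """Distance cutoff maximizing TPR - FPR (Youden's J) against reference pairs."""
    pos = [d for pair, d in distances.items() if pair in known_pairs]
    neg = [d for pair, d in distances.items() if pair not in known_pairs]
    best_t, best_j = None, -1.0
    for t in sorted(set(distances.values())):
        tpr = sum(d <= t for d in pos) / len(pos)  # pairs at or below t are called partners
        fpr = sum(d <= t for d in neg) / len(neg)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t, best_j

profiles = {  # hypothetical abundance per gel fraction
    "A": [0.1, 0.8, 0.1], "B": [0.2, 0.7, 0.1],
    "C": [0.9, 0.05, 0.05], "D": [0.8, 0.1, 0.1],
}
known = {frozenset(("A", "B")), frozenset(("C", "D"))}  # hypothetical reference complexes
dists = pair_distances(profiles)
cutoff, j = best_cutoff(dists, known)
print(round(cutoff, 4), j)  # 0.1414 1.0
```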
7
Ding Z, Kihara D. Computational identification of protein-protein interactions in model plant proteomes. Sci Rep 2019; 9:8740. [PMID: 31217453 PMCID: PMC6584649 DOI: 10.1038/s41598-019-45072-8] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 01/02/2019] [Accepted: 05/30/2019] [Indexed: 12/12/2022]
Abstract
Protein-protein interactions (PPIs) play essential roles in many biological processes. A PPI network provides crucial information on how biological pathways are structured and coordinated from individual protein functions. In the past two decades, large-scale PPI networks of a handful of organisms have been determined by experimental techniques. However, these experimental methods are time-consuming, expensive, and not easy to perform on new target organisms. Large-scale PPI data are particularly sparse for plants. Here, we developed a computational approach for detecting PPIs, trained and tested on known PPIs of Arabidopsis thaliana, and applied it to three plants, Arabidopsis thaliana, Glycine max (soybean), and Zea mays (maize), to discover new PPIs on a genome scale. Our method considers a variety of features, including protein sequences, gene co-expression, functional association, and phylogenetic profiles. This is the first work in which a PPI prediction method was developed for, and applied to, benchmark datasets of Arabidopsis. The method showed a high prediction accuracy of over 90% and a very high precision close to 1.0. We predicted 50,220 PPIs in Arabidopsis thaliana, 13,175,414 PPIs in corn, and 13,527,834 PPIs in soybean. Newly predicted PPIs were classified into three confidence levels according to the availability of existing supporting evidence, and are discussed. Predicted PPIs in the three plant genomes are made available for future reference.
Affiliation(s)
- Ziyun Ding
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA.
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA.
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA.
- Department of Pediatrics, University of Cincinnati, Cincinnati, OH, 45229, USA.
8
Ding Z, Kihara D. Computational Methods for Predicting Protein-Protein Interactions Using Various Protein Features. Current Protocols in Protein Science 2018; 93:e62. [PMID: 29927082 PMCID: PMC6097941 DOI: 10.1002/cpps.62] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 12/24/2022]
Abstract
Understanding protein-protein interactions (PPIs) in a cell is essential for learning protein functions, pathways, and mechanisms of disease. PPIs are also important targets for drug development. Experimental methods, both small-scale and large-scale, have identified PPIs in several model organisms. However, these results cover only part of the PPIs of those organisms; moreover, the PPIs of many organisms have not yet been investigated. To complement experimental methods, many computational methods have been developed that predict PPIs from various characteristics of proteins. Here we provide an overview of the literature, classifying computational PPI prediction methods by the protein features they consider, including protein sequence, genomes, protein structure, function, and PPI network topology, as well as methods that integrate multiple approaches. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Ziyun Ding
- Department of Biological Science, Purdue University, West Lafayette, IN, 47907 USA
- Daisuke Kihara
- Department of Biological Science, Purdue University, West Lafayette, IN, 47907 USA
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907 USA
- Corresponding author: DK; Phone: 1-765-496-2284
9
Mei S, Flemington EK, Zhang K. A computational framework for distinguishing direct versus indirect interactions in human functional protein-protein interaction networks. Integr Biol (Camb) 2018; 9:595-606. [PMID: 28524201 DOI: 10.1039/c7ib00013h] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
Recognition of indirect interactions is instrumental to the in silico reconstruction of signaling pathways and sheds light on unknown physical paths between two indirectly interacting genes. However, very few computational methods have explicitly exploited indirect interactions with experimental evidence thus far. In this work, we attempt to distinguish direct from indirect interactions in human functional protein-protein interaction (PPI) networks via a predictive l2-regularized logistic regression model built on experimental data. The l2-regularized logistic regression method is adopted to counteract potential homolog noise and to reduce the computational complexity on large training data. Computational results show that the proposed model performs promisingly even though the training data are highly skewed. From the 304 799 PPIs curated in several databases, the proposed method detects 23 131 indirect interactions, most of which have been verified by a breadth-first graph search that finds dozens of physical paths between the interacting partners. Pathway enrichment analysis shows that most of the physical paths can be mapped onto more than one human signaling pathway, indicating that a series of biochemical signals does exist between the two indirectly interacting genes. The interactome-scale results promise to provide useful cues for the following applications: (1) exploration of unknown physical PPIs or physical paths between two indirectly interacting genes; (2) amending or extending existing signaling pathways; (3) recognition of physical PPIs for druggable target discovery.
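The breadth-first search used to verify indirect interactions can be sketched as an enumeration of short simple paths between two genes in a physical PPI network. The path-length bound (max_len), the toy network and the gene names are assumptions for illustration; the abstract does not give the authors' search limits.

```python
from collections import deque

def physical_paths(graph, source, target, max_len=3):
    """Enumerate simple paths of at most max_len edges from source to target, breadth-first."""
    found = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue  # path already has max_len edges; stop extending it
        for nxt in sorted(graph.get(node, ())):
            if nxt not in path:  # no revisits, so paths stay simple
                queue.append(path + [nxt])
    return found

graph = {  # hypothetical undirected physical PPI network as adjacency sets
    "A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"},
    "D": {"B", "C", "E"}, "E": {"D"},
}
print(physical_paths(graph, "A", "D"))  # [['A', 'B', 'D'], ['A', 'C', 'D']]
```

Here A and D have no direct edge but are joined by two physical paths, which is the kind of evidence the abstract uses to support an indirect interaction. Because the queue is processed breadth-first, shorter paths are reported before longer ones.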
Affiliation(s)
- Suyu Mei
- Software College, Shenyang Normal University, Shenyang, 110034, China.
10
Shi H, Zhang G, Wang J, Wang Z, Liu X, Cheng L, Li W. Studying Dynamic Features in Myocardial Infarction Progression by Integrating miRNA-Transcription Factor Co-Regulatory Networks and Time-Series RNA Expression Data from Peripheral Blood Mononuclear Cells. PLoS One 2016; 11:e0158638. [PMID: 27367417 PMCID: PMC4930172 DOI: 10.1371/journal.pone.0158638] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 03/05/2016] [Accepted: 06/20/2016] [Indexed: 12/22/2022]
Abstract
Myocardial infarction (MI) is a serious heart disease and a leading cause of mortality and morbidity worldwide. Although some molecules (genes, miRNAs and transcription factors (TFs)) associated with MI have been studied in specific pathological contexts, their dynamic characteristics in gene expression, biological function and regulatory interactions during MI progression have not been fully elucidated to date. In the current study, we analyzed time-series RNA expression data from peripheral blood mononuclear cells. We observed that significantly differentially expressed genes were sharply up- or down-regulated in the acute phase of MI and then changed slowly until the chronic phase. Biological functions involved at each stage of MI were identified. Additionally, dynamic miRNA–TF co-regulatory networks were constructed based on the significantly differentially expressed genes and miRNA–TF co-regulatory motifs, and the dynamic interplay of miRNAs, TFs and target genes was investigated. Finally, a new panel of candidate diagnostic biomarkers (STAT3 and ICAM1) was identified as having discriminatory capability between patients with and without MI, especially between patients with and without recurrent events. The results of the present study not only shed new light on the regulatory mechanisms underlying MI progression but also contribute to the discovery of true diagnostic biomarkers for MI.
Affiliation(s)
- Hongbo Shi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, PR China
- Guangde Zhang
- Department of Cardiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, 150001, PR China
- Jing Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, PR China
- Zhenzhen Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, PR China
- Xiaoxia Liu
- Department of Cardiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, 150001, PR China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, PR China
- Weimin Li
- Department of Cardiology, The First Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, 150001, PR China
11
Jiang P, Missoum S, Chen Z. Fusion of clinical and stochastic finite element data for hip fracture risk prediction. J Biomech 2015; 48:4043-4052. [PMID: 26482733 PMCID: PMC4737502 DOI: 10.1016/j.jbiomech.2015.09.044] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/28/2015] [Revised: 08/19/2015] [Accepted: 09/27/2015] [Indexed: 11/20/2022]
Abstract
Hip fracture affects more than 250,000 people in the US and 1.6 million worldwide each year. With an aging population, developing reliable fracture risk models is therefore of prime importance. Due to the complexity of the hip fracture phenomenon, using clinical data alone, as is traditionally done, might not be sufficient to ensure an accurate and robust hip fracture prediction model. To increase the predictive ability of the risk model, the authors propose to supplement the clinical data with computational data from finite element models. The fusion of the two types of data is performed using deterministic and stochastic computational data. In the latter case, uncertainties in loading and material properties of the femur are accounted for and propagated through the finite element model. The predictive capability of a support vector machine (SVM) risk model constructed by combining clinical and finite element data was assessed using a Women's Health Initiative (WHI) dataset. The dataset includes common factors such as age and BMD as well as geometric factors obtained from DXA imaging. The fusion of computational and clinical data systematically increases the predictive ability of the SVM risk model, as measured by the AUC metric. The largest gains in AUC are obtained with the stochastic approach. This gain decreases as the dimensionality of the problem increases: a 5.3% AUC improvement was achieved for a 9-dimensional problem involving geometric factors and weight, while a 1.3% increase was obtained for a 20-dimensional case including geometric and conventional factors.
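The AUC used to compare the risk models has a convenient pairwise reading: it is the probability that a randomly chosen fracture case receives a higher risk score than a randomly chosen non-fracture case, with ties counted as one half. A minimal sketch with made-up scores (not WHI data) is:

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive case scores above a random negative one (ties = 1/2).

    This pairwise (Mann-Whitney) form equals the area under the ROC curve.
    """
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical risk scores for women with and without a hip fracture.
fracture = [0.9, 0.7, 0.4]
no_fracture = [0.8, 0.3, 0.2, 0.1]
print(auc(fracture, no_fracture))  # 10 of 12 pairs ranked correctly -> 0.8333...
```

The quadratic loop is fine for illustration; for large cohorts a rank-based formula computes the same value faster.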
Affiliation(s)
- Peng Jiang
- Aerospace and Mechanical Engineering Department, University of Arizona, Tucson, AZ, USA
- Samy Missoum
- Aerospace and Mechanical Engineering Department, University of Arizona, Tucson, AZ, USA
- Zhao Chen
- Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, USA
12
Makris C, Theodoridis E. Computational Methods for Modeling Biological Interaction Networks. Pattern Recognition in Computational Molecular Biology 2015:505-524. [DOI: 10.1002/9781119078845.ch26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
13
Meyer MR, Shah S, Zhang J, Rohrs H, Rao AG. Evidence for intermolecular interactions between the intracellular domains of the arabidopsis receptor-like kinase ACR4, its homologs and the Wox5 transcription factor. PLoS One 2015; 10:e0118861. [PMID: 25756623 PMCID: PMC4355418 DOI: 10.1371/journal.pone.0118861] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 11/12/2014] [Accepted: 01/07/2015] [Indexed: 11/23/2022]
Abstract
Arabidopsis CRINKLY4 (ACR4) is a receptor-like kinase (RLK) involved in the global development of the plant. The Arabidopsis genome encodes four homologs of ACR4 that show sequence similarity and analogous architectural elements to ACR4, termed Arabidopsis CRINKLY4 Related (AtCRR) proteins. Additionally, a signaling module has previously been proposed, comprising a postulated peptide ligand, CLE40, the ACR4 RLK, and the WOX5 transcription factor, that engages in a possible feedback mechanism controlling stem cell differentiation. However, little biochemical evidence is available on the molecular aspects of receptor heterodimerization and the role of phosphorylation in these interactions. Therefore, we have undertaken an investigation of the in vitro interactions between the intracellular domains (ICD) of ACR4, the CRRs and WOX5. We demonstrate that interaction can occur between ACR4 and all four CRRs in the unphosphorylated state. However, phosphorylation dependency is observed for the interaction between ACR4 and CRR3. Furthermore, sequence analysis of the ACR4 gene family has revealed a conserved ‘KDSAF’ motif that may be involved in protein-protein interactions within the receptor family. We demonstrate that peptides harboring this conserved motif in CRR3 and CRK1 are able to bind to the ACR4 kinase domain. Our investigations also indicate that the ACR4 ICD can interact with and phosphorylate the transcription factor WOX5.
Affiliation(s)
- Matthew R. Meyer
- Department of Medicine, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO 63130, United States of America
- Shweta Shah
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011, United States of America
- J. Zhang
- NIH NCRR Center for Biomedical and Bio-Organic Mass Spectrometry, Dept. of Chemistry, Washington University, St. Louis, MO 63130, United States of America
- Henry Rohrs
- NIH NCRR Center for Biomedical and Bio-Organic Mass Spectrometry, Dept. of Chemistry, Washington University, St. Louis, MO 63130, United States of America
- A. Gururaj Rao
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011, United States of America
14
15
Fan CY, Bai YH, Huang CY, Yao TJ, Chiang WH, Chang DTH. PRASA: an integrated web server that analyzes protein interaction types. Gene 2013; 518:78-83. [PMID: 23276706 DOI: 10.1016/j.gene.2012.11.083] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/27/2012] [Accepted: 11/27/2012] [Indexed: 11/16/2022]
Abstract
This work presents the Protein Association Analyzer (PRASA) (http://zoro.ee.ncku.edu.tw/prasa/), which predicts protein interactions as well as interaction types. Protein interactions are essential to most biological functions. The existence of diverse interaction types, such as physically contacted or functionally related interactions, makes protein interactions complex. Different interaction types are distinct and should not be confused. However, most existing tools focus on a specific interaction type or mix different interaction types. This work collected 7,234,058 associations with experimentally verified interaction types from five databases and compiled individual probabilistic models for different interaction types. The PRASA result page shows predicted associations and their related references by interaction type. Experimental results demonstrate differences in performance when distinguishing between interaction types. PRASA provides a centralized and organized platform for easy browsing, downloading and comparing of interaction types, which helps reveal insights into the complex roles that proteins play in organisms.
Affiliation(s)
- Chen-Yu Fan
- Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
16
Popescu GV, Popescu SC. Complexity and Modularity of MAPK Signaling Networks. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Signaling through mitogen-activated protein kinase (MAPK) cascades is a conserved and fundamental process in all eukaryotes. This chapter reviews recent progress made in the identification of components of MAPK signaling networks using novel large scale experimental methods. It also presents recent landmarks in the computational modeling and simulation of the dynamics of MAPK signaling modules. The in vitro MAPK signaling network reconstructed from predicted phosphorylation events is dense, supporting the hypothesis of a combinatorial control of transcription through selective phosphorylation of sets of transcription factors. Despite the fact that additional co-factors and scaffold proteins may regulate the dynamics of signal transduction in vivo, the complexity of MAPK signaling networks supports a new model that departs significantly from that of the classical definition of a MAPK cascade.
17
Lin X, Chen XW. Heterogeneous data integration by tree-augmented naïve Bayes for protein-protein interactions prediction. Proteomics 2012; 13:261-8. [DOI: 10.1002/pmic.201200326] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 07/30/2012] [Revised: 09/23/2012] [Accepted: 10/11/2012] [Indexed: 11/08/2022]
Affiliation(s)
- Xiaotong Lin
- Department of Electrical Engineering and Computer Science; The University of Kansas; Lawrence; KS; USA
- Xue-wen Chen
- Department of Computer Science; Wayne State University; Detroit; MI; USA
18
Navlakha S, Gitter A, Bar-Joseph Z. A network-based approach for predicting missing pathway interactions. PLoS Comput Biol 2012; 8:e1002640. [PMID: 22916002 PMCID: PMC3420932 DOI: 10.1371/journal.pcbi.1002640] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 12/15/2011] [Accepted: 06/26/2012] [Indexed: 02/03/2023]
Abstract
Embedded within large-scale protein interaction networks are signaling pathways that encode response cascades in the cell. Unfortunately, even for well-studied species like S. cerevisiae, only a fraction of all true protein interactions are known, which makes it difficult to reason about the exact flow of signals and the corresponding causal relations in the network. To help address this problem, we introduce a framework for predicting new interactions that aid connectivity between upstream proteins (sources) and downstream transcription factors (targets) of a particular pathway. Our algorithms attempt to globally minimize the distance between sources and targets by finding a small set of shortcut edges to add to the network. Unlike existing algorithms for predicting general protein interactions, our approach focuses on proteins involved in specific responses and therefore homes in on pathway-consistent interactions. We applied our method to extend pathways in the osmotic stress response in yeast and identified several missing interactions, some of which are supported by published reports. We also performed experiments that support a novel interaction not previously reported. Our framework is general and may be applicable to edge prediction problems in other domains.
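The abstract states the goal (add a small set of shortcut edges that globally reduces source-target distance) but not the objective function or search procedure. The sketch below illustrates that idea with a simple greedy heuristic, assuming the objective is the sum of shortest-path hop counts from every source to every target, with unreachable pairs charged a fixed penalty; it is not the authors' algorithm, and the pathway, node names and candidate edges are hypothetical.

```python
from collections import deque

def bfs_dist(adj, start):
    """Hop distances from start to every reachable node (unweighted BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def total_distance(adj, sources, targets, unreachable=100):
    """Sum of source-to-target hop distances; unreachable pairs get a fixed penalty."""
    total = 0
    for s in sources:
        d = bfs_dist(adj, s)
        total += sum(d.get(t, unreachable) for t in targets)
    return total

def with_edge(adj, u, v):
    """Copy of the undirected adjacency dict with edge u-v added."""
    new = {n: set(nbrs) for n, nbrs in adj.items()}
    new.setdefault(u, set()).add(v)
    new.setdefault(v, set()).add(u)
    return new

def greedy_shortcuts(adj, sources, targets, candidates, k):
    """Add up to k candidate edges, each time taking the one that cuts total distance most."""
    chosen = []
    for _ in range(k):
        current = total_distance(adj, sources, targets)
        best_gain, best_edge = 0, None
        for edge in candidates:
            if edge in chosen:
                continue
            gain = current - total_distance(with_edge(adj, *edge), sources, targets)
            if gain > best_gain:
                best_gain, best_edge = gain, edge
        if best_edge is None:
            break  # no remaining candidate shortens any source-target path
        chosen.append(best_edge)
        adj = with_edge(adj, *best_edge)
    return chosen

# Hypothetical pathway: source S reaches target T only through the chain S-a-b-c-T.
adj = {"S": {"a"}, "a": {"S", "b"}, "b": {"a", "c"}, "c": {"b", "T"}, "T": {"c"}}
candidates = [("S", "b"), ("a", "T"), ("b", "c")]
print(greedy_shortcuts(adj, ["S"], ["T"], candidates, k=2))  # [('a', 'T')]
```

Greedy selection is only one way to search for a small edge set; it stops early once no candidate shortens any source-target path.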
Affiliation(s)
- Saket Navlakha
- School of Computer Science and Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Anthony Gitter
- School of Computer Science and Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Ziv Bar-Joseph
- School of Computer Science and Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
19
Tang YT, Kao HY. Augmented transitive relationships with high impact protein distillation in protein interaction prediction. Biochimica et Biophysica Acta - Proteins and Proteomics 2012; 1824:1468-75. [PMID: 22683815 DOI: 10.1016/j.bbapap.2012.05.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/08/2012] [Revised: 05/18/2012] [Accepted: 05/30/2012] [Indexed: 11/16/2022]
Abstract
Predicting new protein-protein interactions is important for discovering novel functions of various biological pathways, but it is a crucial and challenging task, and discovering new interactions through biological experiments is still difficult. Computational discovery of new protein interactions is therefore increasingly important. Many studies have predicted protein-protein interactions using biological features such as Gene Ontology (GO) functional annotations and the structural domains of the two proteins. In this paper, we propose the augmented transitive relationships predictor (ATRP), a new method for predicting potential protein interactions using transitive relationships and annotations of protein interactions. In addition, a distillation of virtual direct protein-protein interactions is proposed to deal with the unbalanced distribution of different interaction types in existing protein-protein interaction databases. Our results demonstrate that ATRP can effectively predict protein-protein interactions. For direct protein-protein interactions, ATRP achieves an average precision of 81%, recall of 74% and F-measure of 77%. When evaluated on all types of protein-protein interaction using benchmark datasets generated from KUPS, ATRP achieves an average precision of 93%, recall of 49% and F-measure of 64%. This article is part of a Special Issue entitled: Computational Methods for Protein Interaction and Structural Prediction.
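Two parts of this abstract can be made concrete. First, the basic transitive idea: if a-b and b-c are known interactions, a-c becomes a candidate. Second, the reported figures: the F-measure is the harmonic mean F = 2PR/(P + R), and 2(0.81)(0.74)/1.55 ≈ 0.77 and 2(0.93)(0.49)/1.42 ≈ 0.64, so the reported averages are mutually consistent (an average of per-run F-measures need not exactly equal the F-measure of averaged precision and recall, so this is only a consistency check). The sketch shows plain one-hop transitivity only; ATRP's augmentation, annotation weighting and high-impact protein distillation are not reproduced, and the protein names are hypothetical.

```python
def transitive_candidates(edges):
    """Propose a-c whenever a-b and b-c are known interactions but a-c is not."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    known = {frozenset(e) for e in edges}
    proposed = set()
    for nbrs in adj.values():  # each node acts as the shared partner b
        for a in nbrs:
            for c in nbrs:
                if a < c and frozenset((a, c)) not in known:
                    proposed.add((a, c))
    return proposed

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(sorted(transitive_candidates([("P1", "P2"), ("P2", "P3"), ("P3", "P4")])))
# [('P1', 'P3'), ('P2', 'P4')]
print(round(f_measure(0.81, 0.74), 2), round(f_measure(0.93, 0.49), 2))  # 0.77 0.64
```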
Affiliation(s)
- Yi-Tsung Tang
- Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, Ta-Hsueh Road, Tainan, Taiwan, ROC
20
Partner-aware prediction of interacting residues in protein-protein complexes from sequence data. PLoS One 2011; 6:e29104. [PMID: 22194998 PMCID: PMC3237601 DOI: 10.1371/journal.pone.0029104] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Received: 10/12/2011] [Accepted: 11/21/2011] [Indexed: 12/22/2022]
Abstract
Computational prediction of residues that participate in protein-protein interactions is a difficult task, and state-of-the-art methods have shown only limited success in this arena. One possible problem with these methods is that they try to predict interacting residues without incorporating information about the partner protein, although it is unclear how much partner information could enhance prediction performance. To address this issue, the following two comparisons are crucial: (a) comparison of the predictability of inter-protein residue pairs, i.e., predicting exactly which residue pairs interact with each other given two protein sequences; this can be achieved either by combining conventional single-protein predictions or by making predictions with a new model trained directly on the residue pairs, and the performance of these two approaches may be compared; (b) comparison of the predictability of the interacting residues in a single protein (irrespective of the partner residue or protein) between conventional methods and predictions converted from the pair-wise trained model. Using these two streams of training and validation procedures and employing similar two-stage neural networks, we showed that the models trained on pair-wise contacts outperformed the partner-unaware models in predicting both interacting pairs and interacting single-protein residues. Prediction performance decreased as the conformational change upon complex formation increased; this trend is similar to that seen in docking, even though no structural information was used in our prediction. An example application that predicts two partner-specific interfaces of a protein was shown to be effective, highlighting the potential of the proposed approach. Finally, a preliminary attempt was made to score docking decoy poses using predictions of interacting residue pairs; this analysis produced an encouraging result.
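Comparison (a) above contrasts two ways of scoring residue pairs. The sketch below shows only the partner-unaware side under a simple assumption: pair scores are the product of independent per-residue interface probabilities, and pair-wise scores are converted back to per-residue scores by keeping each residue's best partner. The max rule is an assumption, not necessarily the conversion the authors used; the function names and probabilities are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pairs_from_single(p_a: List[float], p_b: List[float]) -> Matrix:
    """Partner-unaware baseline: score residue pair (i, j) as p_a[i] * p_b[j],
    assuming the two residues' interface probabilities are independent."""
    return [[pa * pb for pb in p_b] for pa in p_a]


def single_from_pairs(pair_scores: Matrix) -> List[float]:
    """Convert pair-wise scores into per-residue scores for protein A by
    keeping, for each residue of A, its best-scoring partner residue in B."""
    return [max(row) for row in pair_scores]


def top_pairs(pair_scores: Matrix, k: int) -> List[Tuple[int, int]]:
    """Return the k highest-scoring residue pairs (i, j)."""
    ranked = sorted(
        ((score, i, j)
         for i, row in enumerate(pair_scores)
         for j, score in enumerate(row)),
        reverse=True,
    )
    return [(i, j) for _, i, j in ranked[:k]]


# Hypothetical per-residue interface probabilities for two short proteins
p_a = [0.9, 0.2, 0.6]
p_b = [0.1, 0.8]

pairs = pairs_from_single(p_a, p_b)
print(top_pairs(pairs, 2))  # [(0, 1), (2, 1)]
print(single_from_pairs(pairs))
```

Because the product baseline scores every residue of A against the same vector `p_b`, its converted per-residue scores are just a rescaled `p_a`; a model trained on residue pairs would instead supply `pair_scores` directly, which is where partner information can change the ranking.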
21
Assessing the utility of gene co-expression stability in combination with correlation in the analysis of protein-protein interaction networks. BMC Genomics 2011; 12 Suppl 3:S19. [PMID: 22369639 PMCID: PMC3333178 DOI: 10.1186/1471-2164-12-s3-s19] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/03/2022]
Abstract
Background: Gene co-expression, in the form of a correlation coefficient, has been valuable in the analysis, classification and prediction of protein-protein interactions. However, it is susceptible to bias from a few samples having a large effect on the correlation coefficient. Gene co-expression stability is a means of quantifying this bias, with high stability indicating robust, unbiased co-expression correlation coefficients. We assess the utility of gene co-expression stability as an additional measure to support co-expression correlation in the analysis of protein-protein interaction networks. Results: We studied the patterns of co-expression correlation and stability in interacting proteins with respect to their interaction promiscuity, levels of intrinsic disorder, and essentiality or disease-relatedness. Co-expression stability together with co-expression correlation classifies hub proteins in interaction networks better than co-expression correlation alone, enabling the identification of a class of hubs that are functionally distinct from the widely accepted transient (date) and obligate (party) hubs. Proteins with high levels of intrinsic disorder have low co-expression correlation and high stability with their interaction partners, suggesting their involvement in transient interactions, except for a small group that has high co-expression correlation and typically consists of subunits of stable complexes. Similar behavior was seen for disease-related and essential genes. Interacting protein pairs in which both proteins are disordered have higher co-expression stability than ordered protein pairs. Using co-expression correlation and stability, we found that transient interactions are more likely to occur between an ordered and a disordered protein, while obligate interactions primarily occur between proteins that are either both ordered or both disordered.
Conclusions: We observe that co-expression stability shows distinct patterns in structurally and functionally different groups of proteins and interactions. We conclude that it is a useful and important measure to be used in concert with gene co-expression correlation for further insights into the characteristics of proteins in the context of their interaction network.
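The abstract does not define co-expression stability, so the following is only a hypothetical stand-in for the idea: it recomputes the Pearson correlation with one sample left out at a time and reports the range of those values as `loo_range`. A correlation driven by a single outlying sample gives a large range, while a robust co-expression gives a small one. The expression values are invented for illustration.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out_correlations(x: List[float], y: List[float]) -> List[float]:
    """Correlation recomputed with each sample (e.g. one array) left out in turn."""
    return [pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]


def loo_range(x: List[float], y: List[float]) -> float:
    """Hypothetical instability score: spread of the leave-one-out correlations.
    A small range means no single sample dominates the correlation."""
    rs = leave_one_out_correlations(x, y)
    return max(rs) - min(rs)


# Invented expression profiles across six samples
gene_a = [1.0, 2.0, 1.5, 2.5, 1.2, 10.0]
partner_outlier = [2.0, 1.0, 2.5, 1.5, 2.2, 10.0]    # high r only because of the last sample
partner_consistent = [1.1, 2.1, 1.4, 2.6, 1.3, 9.8]  # agrees across all samples

for name, partner in [("outlier", partner_outlier), ("consistent", partner_consistent)]:
    print(name, round(pearson(gene_a, partner), 2), round(loo_range(gene_a, partner), 2))
```

Both partners have a high full-sample correlation with `gene_a`, but only the consistent one keeps it when any single sample is dropped.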
22
Huang Y, Sun X, Hu G. An integrated genetics approach for identifying protein signal pathways of Alzheimer's disease. Comput Methods Biomech Biomed Engin 2011; 14:371-8. [PMID: 21442495 DOI: 10.1080/10255842.2010.482525] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
Abstract
Alzheimer's disease (AD) is one of the most common age-associated neurodegenerative disorders, affecting millions of older people worldwide. Combining protein-protein interaction (PPI) network analysis with gene expression studies provides better insight into AD. We developed a computational approach to identify protein signal pathways between amyloid precursor protein and tau protein, both well known to be important in AD. First, a modified LA-SEN method, called network-constrained regularisation analysis, was applied to microarray data from a transgenic mouse model and from AD patients. Then, protein pathways were constructed with an integer linear programming model that integrates the microarray data and the PPI database. Finally, important AD pathways, including some cancer-related pathways, were identified.
Affiliation(s)
- Yue Huang
- Biomedical Engineering Department, School of Medicine, Tsinghua University, Beijing, P.R. China
23
Reddy ASN, Ben-Hur A, Day IS. Experimental and computational approaches for the study of calmodulin interactions. Phytochemistry 2011; 72:1007-19. [PMID: 21338992 DOI: 10.1016/j.phytochem.2010.12.022] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 08/28/2010] [Revised: 11/10/2010] [Accepted: 12/28/2010] [Indexed: 05/22/2023]
Abstract
Ca(2+), a universal messenger in eukaryotes, plays a major role in signaling pathways that control many growth and developmental processes in plants, as well as their responses to various biotic and abiotic stresses. Cellular changes in Ca(2+) in response to diverse signals are recognized by protein sensors that either have their own activity modulated or interact with other proteins and modulate their activity. Calmodulins (CaMs) and CaM-like proteins (CMLs) are Ca(2+) sensors that have no enzymatic activity of their own but, upon binding Ca(2+), interact with and modulate the activity of other proteins involved in a large number of plant processes. Protein-protein interactions play a key role in Ca(2+)/CaM-mediated signaling pathways. In this review, using CaM as an example, we discuss various experimental approaches and computational tools to identify protein-protein interactions. During the last two decades, hundreds of CaM-binding proteins in plants have been identified using approaches ranging from simple screening of expression libraries with labeled CaM to high-throughput screens using protein chips. However, the high-throughput methods have not been applied to the entire proteome of any plant system. Nevertheless, the data provided by these screens allow the development of computational tools to predict CaM-interacting proteins. Using all known binding sites of CaM, we developed a computational method that predicted over 700 high-confidence CaM interactors in the Arabidopsis proteome. Most (>600) of these are not known to bind calmodulin, suggesting that there are likely many more CaM targets than previously known. Functional analyses of some of the experimentally identified Ca(2+) sensor target proteins have uncovered their precise roles in Ca(2+)-mediated processes.
Further studies on identifying novel targets of CaM and CMLs and generating their interaction network - "calcium sensor interactome" - will help us in understanding how Ca(2+) regulates a myriad of cellular and physiological processes.
Affiliation(s)
- A S N Reddy
- Department of Biology, Program in Molecular Plant Biology, Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA.
24
Yano K. Improved prediction of protein interaction from microarray data using asymmetric correlation. Procedia Computer Science 2011; 4:1072-1081. [DOI: 10.1016/j.procs.2011.04.114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
25
González AJ, Liao L. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines. BMC Bioinformatics 2010; 11:537. [PMID: 21034480 PMCID: PMC2989984 DOI: 10.1186/1471-2105-11-537] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 09/28/2009] [Accepted: 10/29/2010] [Indexed: 11/23/2022]
Abstract
Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains rather than whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have used information from various sources at different levels, from primary sequences to molecular structures to evolutionary profiles. Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM), in which interacting residues in domains are explicitly modeled according to the three-dimensional structural information available in the Protein Data Bank (PDB). Features of the domains are first extracted as Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors and are classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross-validation experiments with a set of interacting protein pairs adopted from the 3DID database. Prediction accuracy improved significantly compared with InterPreTS (Interaction Prediction through Tertiary Structure), an existing PPI prediction method that also uses the sequences and complexes of known 3D structure. Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition, and supervised learning based on support vector machines.
Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows.
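The SVD step and the pair representation described above can be sketched with NumPy. This assumes that "selected using SVD" means projecting each domain's Fisher-score vector onto the top-k right singular vectors of the mean-centred feature matrix; the authors' Matlab implementation may select features differently. The Fisher scores are random placeholders, and `svd_reduce` and `pair_vector` are hypothetical names. An SVM would then be trained on the concatenated pair vectors.

```python
import numpy as np


def svd_reduce(features: np.ndarray, k: int) -> np.ndarray:
    """Project per-domain feature vectors (rows) onto the top-k right singular
    vectors of the mean-centred feature matrix."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:k].T


def pair_vector(reduced: np.ndarray, i: int, j: int) -> np.ndarray:
    """Represent domain pair (i, j) by concatenating the two reduced vectors."""
    return np.concatenate([reduced[i], reduced[j]])


rng = np.random.default_rng(0)
fisher_scores = rng.normal(size=(6, 20))  # 6 hypothetical domains, 20 Fisher-score features
reduced = svd_reduce(fisher_scores, k=3)
x = pair_vector(reduced, 0, 4)
print(reduced.shape, x.shape)  # (6, 3) (6,)
```

Concatenation is order-dependent: (i, j) and (j, i) give different vectors, so a training set may need both orderings or a symmetric combination of the two domain vectors.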
Affiliation(s)
- Alvaro J González
- Department of Computer and Information Sciences, University of Delaware 421 Smith Hall, Newark, DE 19716, USA
26
Wang J, Zhou X, Zhu J, Zhou C, Guo Z. Revealing and avoiding bias in semantic similarity scores for protein pairs. BMC Bioinformatics 2010; 11:290. [PMID: 20509916 PMCID: PMC2903568 DOI: 10.1186/1471-2105-11-290] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 02/09/2010] [Accepted: 05/28/2010] [Indexed: 01/16/2023]
Abstract
Background: Semantic similarity scores for protein pairs are widely applied in functional genomics research for finding functional clusters of proteins, predicting protein functions and protein-protein interactions, and identifying putative disease genes. However, because some proteins, such as those related to diseases, tend to be studied more intensively, annotations are likely to be biased, which may affect applications based on semantic similarity measures. Thus, it is necessary to evaluate the effects of this bias on semantic similarity scores between proteins and then to find a way to avoid them. Results: First, we evaluated 14 commonly used semantic similarity scores for protein pairs and demonstrated that they were significantly correlated with the number of annotation terms for the proteins (also known as the protein annotation length). These results suggested that current applications of semantic similarity scores between proteins might be unreliable. Then, to reduce this annotation bias, we proposed normalizing the semantic similarity scores between proteins using a power transformation of the scores. We provide evidence that this improves performance in some applications. Conclusions: Current semantic similarity measures for protein pairs are highly dependent on protein annotation lengths, which are subject to biological research bias. This affects applications that are based on these semantic similarity scores, especially clustering studies that rely on score magnitudes. The normalized scores proposed in this paper can reduce the effects of this bias to some extent.
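The bias test described above amounts to correlating similarity scores with annotation length. A minimal sketch, assuming a rank (Spearman) correlation; the abstract does not say which coefficient the authors used, and the data below are hypothetical. The power-transformation normalization is not reproduced because its exact form is not given here.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical protein pairs: combined annotation length and a similarity score
annotation_length = [3, 8, 15, 22, 40, 55]
similarity = [0.21, 0.30, 0.28, 0.45, 0.52, 0.61]
print(round(spearman(annotation_length, similarity), 3))  # 0.943
```

A coefficient this close to 1 would indicate that the score mostly tracks how well studied the proteins are, which is the kind of dependence the normalization is meant to reduce.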
Affiliation(s)
- Jing Wang
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Xianxiao Zhou
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Jing Zhu
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Chenggui Zhou
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Zheng Guo
- Bioinformatics Centre, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China
27
Yu CY, Chou LC, Chang DTH. Predicting protein-protein interactions in unbalanced data using the primary structure of proteins. BMC Bioinformatics 2010; 11:167. [PMID: 20361868 PMCID: PMC2868006 DOI: 10.1186/1471-2105-11-167] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Received: 10/09/2009] [Accepted: 04/02/2010] [Indexed: 12/03/2022]
Abstract
Background: Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and to understanding the general principles of biological systems. Previous studies have revealed that interacting protein pairs can be predicted from their primary structure. Most of these approaches have achieved satisfactory performance on datasets comprising equal numbers of interacting and non-interacting protein pairs. However, this ratio is highly unbalanced in nature, and these techniques have not been comprehensively evaluated with respect to the effect of the large number of non-interacting pairs in realistic datasets. Moreover, since highly unbalanced distributions usually lead to large datasets, more efficient predictors are needed for such challenging tasks. Results: This study presents a method for PPI prediction based only on sequence information, which makes three contributions. First, we propose a probability-based mechanism for transforming protein sequences into feature vectors. Second, the proposed predictor is built on an efficient classification algorithm, since efficiency is essential for handling highly unbalanced datasets. Third, the proposed PPI predictor is assessed on several unbalanced datasets with different positive-to-negative ratios (from 1:1 to 1:15). This analysis provides solid evidence that the degree of dataset imbalance matters for PPI predictors. Conclusions: Dealing with data imbalance is a key issue in PPI prediction, since there are far fewer interacting protein pairs than non-interacting ones. This article provides a comprehensive study of this issue and develops a practical tool that achieves both good prediction performance and efficiency using only protein sequence information.
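Why the positive-to-negative ratio matters can be shown with a short calculation. Assuming a classifier whose sensitivity and specificity stay fixed as the ratio changes (a simplification; the values of 0.90 are hypothetical), expected precision is TP / (TP + FP), and false positives grow in proportion to the number of negatives per positive:

```python
def precision_at_ratio(sensitivity: float, specificity: float, neg_per_pos: float) -> float:
    """Expected precision of a classifier with fixed sensitivity and specificity
    on a test set containing `neg_per_pos` negatives for every positive."""
    true_pos = sensitivity                         # expected TPs per positive pair
    false_pos = (1.0 - specificity) * neg_per_pos  # expected FPs per positive pair
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 90% sensitivity, 90% specificity
for ratio in (1, 5, 10, 15):
    print(f"1:{ratio}  precision = {precision_at_ratio(0.90, 0.90, ratio):.3f}")
```

With these assumed rates, precision falls from 0.900 at 1:1 to 0.375 at 1:15 even though the classifier itself has not changed, which is why results on balanced datasets can overstate real-world performance.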
Affiliation(s)
- Chi-Yuan Yu
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 106, Taiwan
- Lih-Ching Chou
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 106, Taiwan
- Darby Tien-Hao Chang
- Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
28
Park Y. Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences. BMC Bioinformatics 2009; 10:419. [PMID: 20003442 PMCID: PMC2803199 DOI: 10.1186/1471-2105-10-419] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 07/01/2009] [Accepted: 12/14/2009] [Indexed: 11/10/2022]
Abstract
Background: Protein-protein interactions underlie many important biological processes. Computational prediction methods can usefully complement experimental approaches for identifying protein-protein interactions. Recently, a unique category of sequence-based prediction methods has been put forward, unique in the sense that it does not require homologous protein sequences. This makes it universally applicable to all protein sequences, unlike many previous sequence-based prediction methods. If effective as claimed, these new sequence-based, universally applicable prediction methods would have far-reaching utility in many areas of biological research. Results: Upon close examination, I found that many of these new methods had been poorly tested. In addition, newer methods were often published without performance comparisons with previous ones. Thus, it is not clear how good they are and whether there are significant performance differences among them. In this study, I implemented and thoroughly tested 4 different methods on large-scale, non-redundant data sets. The analysis reveals several important points. First, there are significant performance differences among the methods. Second, the data sets typically used for training prediction methods appear significantly biased, limiting the general applicability of prediction methods trained on them. Third, there is still ample room for further development. In addition, my analysis illustrates the importance of complementary performance measures coupled with right-sized data sets for meaningful benchmark tests. Conclusions: The current study reveals the potential and limits of the new category of sequence-based protein-protein interaction prediction methods, which in turn provides a firm ground for future work in this important area of contemporary bioinformatics.
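The point about complementary performance measures can be illustrated with a trivial predictor on an imbalanced test set. The sketch computes accuracy, precision, recall and the Matthews correlation coefficient (MCC) from confusion-matrix counts; the counts are hypothetical, and MCC is set to 0 when its denominator is zero, a common convention.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mcc": mcc}


# Hypothetical test set: 100 interacting vs 1,500 non-interacting pairs.
# A trivial predictor that labels every pair "non-interacting":
print(confusion_metrics(tp=0, fp=0, tn=1500, fn=100))
# {'accuracy': 0.9375, 'precision': 0.0, 'recall': 0.0, 'mcc': 0.0}
```

The all-negative predictor reaches 93.75% accuracy while finding no interactions at all; precision, recall and MCC expose this immediately.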
Affiliation(s)
- Yungki Park
- Institute of Cellular and Molecular Biology (MBB 3 210B), Center for Systems and Synthetic Biology, University of Texas at Austin, 2500 Speedway, Austin, Texas, USA.
29
Integrating diverse information to gain more insight into microarray analysis. J Biomed Biotechnol 2009; 2009:648987. [PMID: 19834567 PMCID: PMC2761008 DOI: 10.1155/2009/648987] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/12/2008] [Revised: 06/23/2009] [Accepted: 07/17/2009] [Indexed: 11/17/2022]
Abstract
Microarray technology provides an opportunity to view transcription at the genomic level under different conditions controlled by an experiment. From an array experiment using a human cancer cell line engineered to differ in expression of the tumor antigen integrin alpha6beta4, a few hundred differentially expressed genes are selected and clustered using one of several standard algorithms. Genes in a cluster are expected to have similar expression patterns, are most likely to be coregulated, and are therefore expected to have similar functions. The highly expressed set of upregulated genes becomes a candidate set for further evaluation as potential biomarkers. Despite these benefits, a microarray experiment by itself does not help us understand or discover potential pathways or identify important sets of genes as potential drug targets. In this paper, we discuss integrating protein-protein interaction information and pathway information with an array expression data set to identify a set of "important" genes and potential signal transduction networks that could help target and reverse the oncogenic phenotype induced by a tumor antigen such as integrin alpha6beta4. We illustrate the proposed method with our recent microarray experiment conducted to identify transcriptional targets of integrin alpha6beta4 in cancer progression.