1
|
Rimal P, Panday SK, Xu W, Peng Y, Alexov E. SAAMBE-MEM: a sequence-based method for predicting binding free energy change upon mutation in membrane protein-protein complexes. Bioinformatics (Oxford, England) 2024; 40:btae544. [PMID: 39240325 PMCID: PMC11407696 DOI: 10.1093/bioinformatics/btae544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 08/04/2024] [Accepted: 09/04/2024] [Indexed: 09/07/2024]
Abstract
MOTIVATION Mutations in protein-protein interactions can affect the corresponding complexes, impacting function and potentially leading to disease. Given the abundance of membrane proteins, it is crucial to assess the impact of mutations on the binding affinity of these proteins. Although several methods exist to predict the binding free energy change due to mutations in protein-protein complexes, most require structural information of the protein complex and are primarily trained on the SKEMPI database, which is composed mainly of soluble proteins. RESULTS A novel sequence-based method (SAAMBE-MEM) for predicting binding free energy changes (ΔΔG) in membrane protein-protein complexes due to mutations has been developed. This method utilized the MPAD database, which contains binding affinities for wild-type and mutant membrane protein complexes. A machine learning model was developed to predict ΔΔG by leveraging features such as amino acid indices and position-specific scoring matrices (PSSM). Through extensive dataset curation and feature extraction, SAAMBE-MEM was trained and validated using the XGBoost regression algorithm. The optimal feature set, including PSSM-related features, achieved a Pearson correlation coefficient of 0.64, outperforming existing methods trained on the SKEMPI database. Furthermore, it was demonstrated that SAAMBE-MEM performs much better when utilizing evolution-based features in contrast to physicochemical features. AVAILABILITY AND IMPLEMENTATION The method is accessible via a web server and standalone code at http://compbio.clemson.edu/SAAMBE-MEM/. The cleaned MPAD database is available at the website.
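The Pearson correlation coefficient quoted above compares predicted and experimental ΔΔG values. The following is a minimal sketch of that metric only, not of SAAMBE-MEM itself; the function name `pearson` and the sample values are hypothetical and are not taken from MPAD.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ddG values (kcal/mol), for illustration only.
experimental_ddg = [0.5, 1.2, -0.3, 2.0, 0.0]
predicted_ddg = [0.4, 0.9, 0.1, 1.5, 0.3]
print(round(pearson(experimental_ddg, predicted_ddg), 3))
```

A value near 1 means the predictions rank and scale with the measurements; the coefficient says nothing about a constant offset between the two.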
Affiliation(s)
- Prawin Rimal
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, United States
| | - Shailesh Kumar Panday
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, United States
| | - Wang Xu
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, Hubei 430079, China
| | - Yunhui Peng
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, Hubei 430079, China
| | - Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, United States
| |
|
2
|
Guan H, Qiu W, Liu H, Cao Y, Tian L, Huang P, Hou D, Zhang G. Study on the detection method of biological characteristics of hepatoma cells based on terahertz time-domain spectroscopy. Biomedical Optics Express 2023; 14:5781-5794. [PMID: 38021130 PMCID: PMC10659802 DOI: 10.1364/boe.495600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 09/09/2023] [Accepted: 10/04/2023] [Indexed: 12/01/2023]
Abstract
Liver cancer usually has a high degree of malignancy and its early symptoms are hidden; it is therefore of significant research value to develop early-stage detection methods for pathological screening of liver cancer. In this paper, a biometric detection method for living human hepatocytes based on terahertz time-domain spectroscopy was proposed. The difference in terahertz response between normal and cancer cells was analyzed using five characteristic parameters of the response, namely refractive index, absorption coefficient, dielectric constant, dielectric loss and dielectric loss tangent. Based on class separability and variable correlation, absorption coefficient and dielectric loss were selected to better characterize cellular properties. Maximum information coefficient and principal component analysis were employed for feature extraction, and a support vector machine cell classification model was constructed. The results showed that the algorithm based on parameter feature fusion can achieve an accuracy of 91.6% for human hepatoma cell lines and one normal cell line. This work provides a promising solution for the qualitative evaluation of living cells in a liquid environment.
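The five parameters named above are linked by standard optics relations: with extinction coefficient κ = αc/(4πf), the dielectric constant is ε' = n² − κ², the dielectric loss is ε'' = 2nκ, and the loss tangent is ε''/ε'. The sketch below applies these textbook formulas; it assumes α is the power absorption coefficient in 1/m, and it is not the authors' extraction procedure. The function name and sample values are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def dielectric_parameters(n, alpha, freq_hz):
    """Derive dielectric quantities from refractive index n and
    power absorption coefficient alpha (1/m) at frequency freq_hz (Hz)."""
    kappa = alpha * C / (4.0 * math.pi * freq_hz)  # extinction coefficient
    eps_real = n * n - kappa * kappa               # dielectric constant
    eps_imag = 2.0 * n * kappa                     # dielectric loss
    return {
        "kappa": kappa,
        "dielectric_constant": eps_real,
        "dielectric_loss": eps_imag,
        # Undefined when eps_real is zero (n equal to kappa).
        "loss_tangent": eps_imag / eps_real,
    }


# Hypothetical sample values at 1 THz, for illustration only.
params = dielectric_parameters(n=2.0, alpha=2.0e4, freq_hz=1.0e12)
for name, value in params.items():
    print(f"{name}: {value:.4f}")
```

Because the dielectric quantities are functions of n and α, the five parameters are correlated by construction, which is one reason a selection step based on variable correlation is natural here.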
Affiliation(s)
- Hanxiao Guan
- State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, 310000, China
| | - Weihang Qiu
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, 310000, China
| | - Heng Liu
- State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, 310000, China
| | - Yuqi Cao
- State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, 310000, China
| | - Liangfei Tian
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, 310000, China
| | - Pingjie Huang
- State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, 310000, China
| | - Dibo Hou
- State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, 310000, China
| | - Guangxin Zhang
- State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, 310000, China
| |
|
3
|
Large-scale prediction of key dynamic interacting proteins in multiple cancers. Int J Biol Macromol 2022; 220:1124-1132. [PMID: 36027989 DOI: 10.1016/j.ijbiomac.2022.08.125] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/05/2022] [Revised: 08/15/2022] [Accepted: 08/17/2022] [Indexed: 11/21/2022]
Abstract
Tracking dynamic protein-protein interactions (PPIs) in cancer and deciphering their role in pathogenesis remain a challenge. We present a dynamic PPI hypothesis: permanent and transient interactions might switch dynamically as normal cells become malignant, which could cause maintenance functions to be interrupted and transient functions to be sustained. Based on this hypothesis, we first predicted >1400 key cancer genes (KCG) by applying PPI-express, a method we proposed, to 18 cancer gene expression datasets. We then screened out key dynamic interactions (KDI) of cancer based on KCG and on the transient and permanent interactions under both conditions. Two prominent functional characteristics, "Cell cycle-related" and "Immune-related", were observed for KCG, suggesting that these might be their general characteristics. We found that, compared with permanent-to-transient (P2T) KDI pairs in the network, transient-to-permanent (T2P) pairs have significantly higher edge betweenness (EB). P2T pairs tend to be located within functional modules and may play roles in maintaining normal biological functions, while T2P KDI pairs tend to be located between modules and may play roles in biological signal transduction. This is consistent with our hypothesis. We also analyzed the network characteristics of KDI pairs and their functions. Our findings on KDI may help to understand and explain a few hallmarks of cancer.
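Edge betweenness counts how many shortest paths between node pairs pass through an edge, so edges that bridge modules tend to score high. The sketch below is a generic Brandes-style computation for an unweighted, undirected graph; it is not the authors' pipeline, and the toy graph (two triangles joined by one bridge) and all names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Unnormalized edge betweenness for an undirected, unweighted graph.

    adj maps each node to an iterable of its neighbours (symmetric).
    Returns {frozenset({u, v}): number of shortest paths through the edge}.
    """
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(adj, 0.0)
        sigma[s] = 1.0
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    # Each unordered node pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in eb.items()}


# Toy graph: modules {a, b, c} and {d, e, f} joined by the bridge c-d.
toy = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = edge_betweenness(toy)
for edge, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(sorted(edge), value)
```

In this toy graph the inter-module bridge carries every cross-module shortest path, which mirrors the observation that inter-module pairs show higher edge betweenness than intra-module pairs.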
|
4
|
Abstract
The maximal information coefficient (MIC) explores associations between pairs of variables in complex relationships. It estimates the correlation by optimizing the partition of the axes. However, when a relationship is affected by certain kinds of noise, MIC may overestimate the correlation value, which leads to a noisy relationship being misidentified as a noiseless one. In this article, a novel method, the weighted information coefficient mean (WICM), is proposed to detect unbiased associations in large data sets. First, we mathematically analyze why an abnormal correlation value is assigned to a noisy relationship. Then, the WICM is presented in two core steps. One is to detect potential overestimation among the relationships with high values, and the other is to rectify the overestimation by calculating the information coefficient mean instead of just selecting the maximum element in the characteristic matrix. Finally, experiments on functional relationships and real-world data relationships show that WICM resolves the overestimation feasibly and effectively.
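The contrast between taking the maximum of the characteristic matrix and averaging its entries can be illustrated with a simplified sketch. Two assumptions are made here and should be read as such: each grid uses fixed equal-frequency bins rather than the optimized partition that MIC searches for, and the mean is unweighted because the abstract does not give the WICM weights. All function names are hypothetical.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins of nearly equal size, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def normalized_mi(xs, ys, nx, ny):
    """Mutual information (bits) on an nx-by-ny grid over log2(min(nx, ny))."""
    n = len(xs)
    bx = equal_frequency_bins(xs, nx)
    by = equal_frequency_bins(ys, ny)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        mi += (count / n) * math.log2(count * n / (px[i] * py[j]))
    return mi / math.log2(min(nx, ny))


def grid_scores(xs, ys, max_bins=4):
    """Simplified stand-in for the characteristic matrix: one score per grid."""
    return {
        (nx, ny): normalized_mi(xs, ys, nx, ny)
        for nx in range(2, max_bins + 1)
        for ny in range(2, max_bins + 1)
    }


xs = list(range(12))
ys = [2 * x + 1 for x in xs]          # noiseless linear relationship
scores = grid_scores(xs, ys)
mic_like = max(scores.values())       # maximum element, as in MIC
mean_like = sum(scores.values()) / len(scores)  # plain mean, no weights
print(round(mic_like, 3), round(mean_like, 3))
```

The maximum is driven by the single best grid, whereas the mean reflects how the relationship scores across many grids, which is the intuition behind replacing the maximum with a mean.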
Affiliation(s)
- Chuanlu Liu
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Shuliang Wang
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Institute of E-Government, Beijing Institute of Technology, Beijing, China
| | - Hanning Yuan
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Xiaojia Liu
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| |
|
5
|
Li G, Pahari S, Murthy AK, Liang S, Fragoza R, Yu H, Alexov E. SAAMBE-SEQ: a sequence-based method for predicting mutation effect on protein-protein binding affinity. Bioinformatics 2021; 37:992-999. [PMID: 32866236 PMCID: PMC8128451 DOI: 10.1093/bioinformatics/btaa761] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 06/27/2020] [Revised: 08/17/2020] [Accepted: 08/24/2020] [Indexed: 01/04/2023]
Abstract
MOTIVATION The vast majority of human genetic disorders are associated with mutations that affect protein-protein interactions by altering wild-type binding affinity. Therefore, it is extremely important to assess the effect of mutations on protein-protein binding free energy to assist the development of therapeutic solutions. Currently, the most popular approaches use structural information to deliver the predictions, which precludes their application to genome-scale investigations. Indeed, with the progress of genomic sequencing, researchers frequently need to assess the effect of mutations for which there is no structure available. RESULTS Here, we report a Gradient Boosting Decision Tree machine learning algorithm, SAAMBE-SEQ, which is completely sequence-based and does not require structural information at all. SAAMBE-SEQ utilizes 80 features representing evolutionary information, sequence-based features and change of physical properties upon mutation at the mutation site. The approach is shown to achieve a Pearson correlation coefficient (PCC) of 0.83 in 5-fold cross validation in a benchmarking test against experimentally determined binding free energy change (ΔΔG). Further, a blind test set (no-STRUC) is compiled by collecting experimental ΔΔG upon mutation for protein complexes for which structure is not available, and is used to benchmark SAAMBE-SEQ, resulting in a PCC in the range of 0.37-0.46. The accuracy of the SAAMBE-SEQ method is found to be either better than or comparable to that of the most advanced structure-based methods. SAAMBE-SEQ is very fast, is available as a webserver and stand-alone code, and utilizes only sequence information; thus it is applicable to genome-scale investigations of the effect of mutations on protein-protein interactions. AVAILABILITY AND IMPLEMENTATION SAAMBE-SEQ is available at http://compbio.clemson.edu/saambe_webserver/indexSEQ.php#started. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
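The 5-fold cross validation mentioned above partitions the data into five folds and holds each one out in turn. The sketch below shows only the index bookkeeping; it assumes a random split over individual mutations, which the abstract does not confirm (splitting by protein complex is another common choice). The function name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in k_fold_indices(10, k=5):
    # A model would be fitted on `train` and scored on `test` here.
    print(len(train), sorted(test))
```

Each sample appears in exactly one test fold, so per-fold predictions can be pooled or averaged to give a single cross-validated score.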
Affiliation(s)
- Gen Li
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | - Swagata Pahari
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | | | - Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
| | - Robert Fragoza
- Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
| | - Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
| | - Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| |
|
6
|
Cáceres JJ, Paccanaro A. Disease gene prediction for molecularly uncharacterized diseases. PLoS Comput Biol 2019; 15:e1007078. [PMID: 31276496 PMCID: PMC6636748 DOI: 10.1371/journal.pcbi.1007078] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 08/05/2018] [Revised: 07/17/2019] [Accepted: 05/09/2019] [Indexed: 02/06/2023]
Abstract
Network medicine approaches have been largely successful at increasing our knowledge of molecularly characterized diseases. Given a set of disease genes associated with a disease, neighbourhood-based methods and random walkers exploit the interactome allowing the prediction of further genes for that disease. In general, however, diseases with no known molecular basis constitute a challenge. Here we present a novel network approach to prioritize gene-disease associations that is able to also predict genes for diseases with no known molecular basis. Our method, which we have called Cardigan (ChARting DIsease Gene AssociatioNs), uses semi-supervised learning and exploits a measure of similarity between disease phenotypes. We evaluated its performance at predicting genes for both molecularly characterized and uncharacterized diseases in OMIM, using both weighted and binary interactomes, and compared it with state-of-the-art methods. Our tests, which use datasets collected at different points in time to replicate the dynamics of the disease gene discovery process, prove that Cardigan is able to accurately predict disease genes for molecularly uncharacterized diseases. Additionally, standard leave-one-out cross validation tests show how our approach outperforms state-of-the-art methods at predicting genes for molecularly characterized diseases by 14%-65%. Cardigan can also be used for disease module prediction, where it outperforms state-of-the-art methods by 87%-299%.
Affiliation(s)
- Juan J. Cáceres
- Centre for Systems and Synthetic Biology & Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
| | - Alberto Paccanaro
- Centre for Systems and Synthetic Biology & Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
| |
|
7
|
Bing X, Bunea F, Royer M, Das J. Latent Model-Based Clustering for Biological Discovery. iScience 2019; 14:125-135. [PMID: 30954780 PMCID: PMC6449745 DOI: 10.1016/j.isci.2019.03.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/27/2018] [Revised: 11/26/2018] [Accepted: 03/18/2019] [Indexed: 11/27/2022]
Abstract
LOVE, a robust, scalable latent model-based clustering method for biological discovery, can be used across a range of datasets to generate both overlapping and non-overlapping clusters. In our formulation, a cluster comprises variables associated with the same latent factor and is determined from an allocation matrix that indexes our latent model. We prove that the allocation matrix and corresponding clusters are uniquely defined. We apply LOVE to biological datasets (gene expression, serological responses measured from HIV controllers and chronic progressors, vaccine-induced humoral immune responses) resulting in meaningful biological output. For all three datasets, the clusters generated by LOVE remain stable across tuning parameters. Finally, we compared LOVE's performance to that of 13 state-of-the-art methods using previously established benchmarks and found that LOVE outperformed these methods across datasets. Our results demonstrate that LOVE can be broadly used across large-scale biological datasets to generate accurate and meaningful overlapping and non-overlapping clusters.
- LOVE is a robust, scalable, and versatile latent model-based clustering method
- Has theoretical guarantees, and can generate overlapping and non-overlapping clusters
- Generates meaningful clusters from datasets spanning a range of biological domains
- Using established benchmarks, outperforms 13 state-of-the-art methods across datasets
|
8
|
Janani S, Ramyachitra D, Ranjani Rani R. PCD-DPPI: Protein complex detection from dynamic PPI using shuffled frog-leaping algorithm. Gene Reports 2018. [DOI: 10.1016/j.genrep.2018.06.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
|
9
|
Wuchty S, Boltz T, Küçük-McGinty H. Links between critical proteins drive the controllability of protein interaction networks. Proteomics 2017; 17:e1700056. [DOI: 10.1002/pmic.201700056] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 02/08/2017] [Revised: 03/23/2017] [Accepted: 04/07/2017] [Indexed: 01/03/2023]
Affiliation(s)
- Stefan Wuchty
- Department of Computer Science; University of Miami; Coral Gables FL USA
- Center of Computational Sciences; University of Miami; Coral Gables FL USA
- Sylvester Comprehensive Cancer Center; University of Miami; Miami FL USA
| | - Toni Boltz
- Department of Computer Science; University of Miami; Coral Gables FL USA
| | | |
|
10
|
Guo Y, Alexander K, Clark AG, Grimson A, Yu H. Integrated network analysis reveals distinct regulatory roles of transcription factors and microRNAs. RNA (New York, N.Y.) 2016; 22:1663-1672. [PMID: 27604961 PMCID: PMC5066619 DOI: 10.1261/rna.048025.114] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/08/2014] [Accepted: 07/25/2016] [Indexed: 06/06/2023]
Abstract
Analysis of transcription regulatory networks has revealed many principal features that govern gene expression regulation. MicroRNAs (miRNAs) have emerged as another major class of gene regulators that influence gene expression post-transcriptionally, but there remains a need to assess quantitatively their global roles in gene regulation. Here, we have constructed an integrated gene regulatory network comprised of transcription factors (TFs), miRNAs, and their target genes and analyzed the effect of regulation on target mRNA expression, target protein expression, protein-protein interaction, and disease association. We found that while target genes regulated by the same TFs tend to be co-expressed, co-regulation by miRNAs does not lead to co-expression assessed at either mRNA or protein levels. Analysis of interacting protein pairs in the regulatory network revealed that compared to genes co-regulated by miRNAs, a higher fraction of genes co-regulated by TFs encode proteins in the same complex. Although these results suggest that genes co-regulated by TFs are more functionally related than those co-regulated by miRNAs, genes that share either TF or miRNA regulators are more likely to cause the same disease. Further analysis on the interplay between TFs and miRNAs suggests that TFs tend to regulate intramodule/pathway clusters, while miRNAs tend to regulate intermodule/pathway clusters. These results demonstrate that although TFs and miRNAs both regulate gene expression, they occupy distinct niches in the overall regulatory network within the cell.
Affiliation(s)
- Yu Guo
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York 14853, USA
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Katherine Alexander
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Andrew G Clark
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Andrew Grimson
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Haiyuan Yu
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York 14853, USA
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
| |
|
11
|
Ou-Yang L, Zhang XF, Dai DQ, Wu MY, Zhu Y, Liu Z, Yan H. Protein complex detection based on partially shared multi-view clustering. BMC Bioinformatics 2016; 17:371. [PMID: 27623844 PMCID: PMC5022186 DOI: 10.1186/s12859-016-1164-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/11/2015] [Accepted: 07/23/2016] [Indexed: 01/05/2023]
Abstract
Background Protein complexes are the key molecular entities to perform many essential biological functions. In recent years, high-throughput experimental techniques have generated a large amount of protein interaction data. As a consequence, computational analysis of such data for protein complex detection has received increased attention in the literature. However, most existing works focus on predicting protein complexes from a single type of data, either physical interaction data or co-complex interaction data. These two types of data provide compatible and complementary information, so it is necessary to integrate them to discover the underlying structures and obtain better performance in complex detection. Results In this study, we propose a novel multi-view clustering algorithm, called the Partially Shared Multi-View Clustering model (PSMVC), to carry out such an integrated analysis. Unlike traditional multi-view learning algorithms that focus on mining either consistent or complementary information embedded in the multi-view data, PSMVC can jointly explore the shared and specific information inherent in different views. In our experiments, we compare the complexes detected by PSMVC from single data source with those detected from multiple data sources. We observe that jointly analyzing multi-view data benefits the detection of protein complexes. Furthermore, extensive experiment results demonstrate that PSMVC performs much better than 16 state-of-the-art complex detection techniques, including ensemble clustering and data integration techniques. Conclusions In this work, we demonstrate that when integrating multiple data sources, using partially shared multi-view clustering model can help to identify protein complexes which are not readily identifiable by conventional single-view-based methods and other integrative analysis methods. All the results and source codes are available on https://github.com/Oyl-CityU/PSMVC. 
Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1164-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Le Ou-Yang
- College of Information Engineering, Shenzhen University, Nanhai Ave 3688, Shenzhen, 518060, China
- Department of Electronic and Engineering, City University of Hong Kong, Tat Chee Avenue, Hong Kong, China
| | - Xiao-Fei Zhang
- School of Mathematics and Statistics and Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
| | - Dao-Qing Dai
- Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Xin Gang Road West, Guangzhou, 510275, China
| | - Meng-Yun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, Guoding Road, Shanghai, 200433, China
| | - Yuan Zhu
- School of Automation, China University of Geosciences, Wuhan, China
| | - Zhiyong Liu
- Shenzhen Polytechnic, Shenzhen, 518055, China
| | - Hong Yan
- Department of Electronic and Engineering, City University of Hong Kong, Tat Chee Avenue, Hong Kong, China
| |
|
12
|
|
13
|
Lakizadeh A, Jalili S. BiCAMWI: A Genetic-Based Biclustering Algorithm for Detecting Dynamic Protein Complexes. PLoS One 2016; 11:e0159923. [PMID: 27462706 PMCID: PMC4963120 DOI: 10.1371/journal.pone.0159923] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/26/2016] [Accepted: 07/11/2016] [Indexed: 01/08/2023]
Abstract
Considering the roles of protein complexes in many biological processes in the cell, detection of protein complexes from available protein-protein interaction (PPI) networks is a key challenge in the post-genome era. Despite the high dynamicity of cellular systems and the dynamic interactions between proteins in a cell, most computational methods have focused on static networks, which cannot represent the inherent dynamicity of protein interactions. Recently, some researchers have tried to exploit the dynamicity of PPI networks by constructing a set of dynamic PPI subnetworks, one corresponding to each time point (column) in a gene expression dataset. However, many genes can participate in multiple biological processes, and cellular processes are not necessarily related to every sample; they might be relevant only for a subset of samples. It is therefore more informative to explore each subnetwork based on a subset of genes and conditions (i.e., biclusters) in a gene expression dataset. Here, we present a new method, called BiCAMWI, that employs dynamicity in detecting protein complexes. The preprocessing phase of the proposed method is based on a novel genetic algorithm that extracts sets of genes that are co-regulated under some conditions from the input gene expression data. Each extracted gene set is called a bicluster. In the detection phase, dynamic PPI subnetworks are extracted from the input static PPI network based on the biclusters. Protein complexes are identified by applying a detection method to each dynamic PPI subnetwork and aggregating the results. Experimental results confirm that BiCAMWI effectively models the dynamicity inherent in static PPI networks and achieves significantly better results than state-of-the-art methods. We therefore suggest BiCAMWI as a more reliable method for protein complex detection.
Affiliation(s)
- Amir Lakizadeh
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
| | - Saeed Jalili
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
| |
|
14
|
Chen Y, Zeng Y, Luo F, Yuan Z. A New Algorithm to Optimize Maximal Information Coefficient. PLoS One 2016; 11:e0157567. [PMID: 27333001 PMCID: PMC4917098 DOI: 10.1371/journal.pone.0157567] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 02/25/2016] [Accepted: 06/01/2016] [Indexed: 11/25/2022]
Abstract
The maximal information coefficient (MIC) captures dependences between paired variables, including both functional and non-functional relationships. In this paper, we develop a new method, ChiMIC, to calculate the MIC values. The ChiMIC algorithm uses the chi-square test to terminate grid optimization and thereby removes the maximal grid size restriction of the original ApproxMaxMI algorithm. Computational experiments show that the ChiMIC algorithm maintains the same MIC values for noiseless functional relationships, but gives much smaller MIC values for independent variables. For noisy functional relationships, the ChiMIC algorithm can reach the optimal partition much faster. Furthermore, the MCN values based on MIC calculated by ChiMIC can capture the complexity of functional relationships in a better way, and the statistical powers of MIC calculated by ChiMIC are higher than those calculated by ApproxMaxMI. Moreover, the computational costs of ChiMIC are much lower than those of ApproxMaxMI. We apply the MIC values to feature selection and obtain better classification accuracy using features selected by the MIC values from ChiMIC.
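A chi-square test on a grid of counts asks whether the observed cell counts differ from what independent rows and columns would produce. The sketch below computes only the Pearson chi-square statistic and its degrees of freedom for a contingency table; how ChiMIC applies the test to stop refining a grid is not specified in the abstract, so this is a generic illustration. The function name is hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of equal-length rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


print(chi_square_statistic([[10, 10], [10, 10]]))  # independent cells
print(chi_square_statistic([[20, 0], [0, 20]]))    # fully dependent cells
```

A statistic close to zero suggests that a further split adds no dependence information, which is the kind of evidence a stopping rule could use.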
Affiliation(s)
- Yuan Chen
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
| | - Ying Zeng
- Orient Science & Technology College of Hunan Agricultural University, Changsha, China
| | - Feng Luo
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
- College of Plant Protection, Hunan Agricultural University, Changsha, China
- School of Computing, Clemson University, Clemson, South Carolina, United States of America
| | - Zheming Yuan
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
- Hunan Provincial Key Laboratory for Germplasm Innovation and Utilization of Crop, Hunan Agricultural University, Changsha, China
| |
|
15
|
Xie T, Yang QY, Wang XT, McLysaght A, Zhang HY. Spatial Colocalization of Human Ohnolog Pairs Acts to Maintain Dosage-Balance. Mol Biol Evol 2016; 33:2368-75. [PMID: 27297469 PMCID: PMC4989111 DOI: 10.1093/molbev/msw108] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Indexed: 12/22/2022]
Abstract
Ohnologs (paralogous gene pairs generated by whole genome duplication) are enriched for dosage-sensitive genes, that is, genes that have a phenotype due to copy number changes. Dosage-sensitive genes frequently occur in the same metabolic pathway and in physically interacting proteins. Accumulating evidence reveals that functionally related genes tend to co-localize in the three-dimensional (3D) arrangement of chromosomes. We ask whether the spatial distribution of ohnologs has implications for their dosage balance. We analyzed the colocalization frequency of ohnologs based on chromatin interaction datasets of seven human cell lines and found that ohnolog pairs exhibit higher spatial proximity in 3D nuclear organization than other paralog pairs and than randomly chosen ohnologs in the genome. We also found that colocalized ohnologs are more resistant to copy number variations and more likely to be disease-associated genes, which indicates a stronger dosage balance in ohnologs with high spatial proximity. This phenomenon is further supported by the stronger similarity of gene co-expression and of gene ontology terms of colocalized ohnologs. In addition, for a large fraction of ohnologs, the spatial colocalization is conserved in mouse cells, suggestive of functional constraint on their 3D positioning in the nucleus.
Affiliation(s)
- Ting Xie
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Qing-Yong Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Xiao-Tao Wang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Hong-Yu Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
16
Riccadonna S, Jurman G, Visintainer R, Filosi M, Furlanello C. DTW-MIC Coexpression Networks from Time-Course Data. PLoS One 2016; 11:e0152648. [PMID: 27031641 PMCID: PMC4816347 DOI: 10.1371/journal.pone.0152648] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/30/2014] [Accepted: 03/17/2016] [Indexed: 01/01/2023]
Abstract
When modeling coexpression networks from high-throughput time course data, the Pearson Correlation Coefficient (PCC) is one of the most effective and popular similarity functions. However, its reliability is limited since it cannot capture non-linear interactions and time shifts. Here we propose to overcome these two issues by employing a novel similarity function, Dynamic Time Warping Maximal Information Coefficient (DTW-MIC), combining a measure taking care of functional interactions of signals (MIC) and a measure identifying time lag (DTW). By using the Hamming-Ipsen-Mikhailov (HIM) metric to quantify network differences, the effectiveness of the DTW-MIC approach is demonstrated on a set of four synthetic datasets and one transcriptomic dataset, also in comparison to TimeDelay ARACNE and Transfer Entropy.
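The time-shift limitation of PCC described in this abstract can be illustrated with a small sketch. It covers only the textbook dynamic time warping (DTW) distance, not the combined DTW-MIC similarity, whose definition the abstract does not give. The function names and the toy expression profiles are hypothetical and chosen for illustration only.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical toy profiles: the same pulse, delayed by two time points.
gene_a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
gene_b = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]

shift_pcc = pearson(gene_a, gene_b)       # weak, because the peaks do not line up
shift_dtw = dtw_distance(gene_a, gene_b)  # small, because warping absorbs the lag
print(shift_pcc, shift_dtw)
```

On these toy profiles the correlation is weak even though the two signals have the same shape, while the warping path aligns the delayed pulse at no cost.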
Affiliation(s)
- Giuseppe Jurman
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
- Roberto Visintainer
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
- Michele Filosi
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
- Cesare Furlanello
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
17
Ou-Yang L, Dai DQ, Zhang XF. Detecting Protein Complexes from Signed Protein-Protein Interaction Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:1333-1344. [PMID: 26671805 DOI: 10.1109/tcbb.2015.2401014] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 06/05/2023]
Abstract
Identification of protein complexes is fundamental for understanding the cellular functional organization. With the accumulation of physical protein-protein interaction (PPI) data, computational detection of protein complexes from available PPI networks has drawn a lot of attention. While most of the existing protein complex detection algorithms focus on analyzing the physical protein-protein interaction network, none of them take into account the "signs" (i.e., activation-inhibition relationships) of physical interactions. As the "signs" of interactions reflect the way proteins communicate, considering the "signs" of interactions can not only increase the accuracy of protein complex identification, but also deepen our understanding of the mechanisms of cell functions. In this study, we proposed a novel Signed Graph regularized Nonnegative Matrix Factorization (SGNMF) model to identify protein complexes from signed PPI networks. In our experiments, we compared the results collected by our model on signed PPI networks with those predicted by the state-of-the-art complex detection techniques on the original unsigned PPI networks. We observed that considering the "signs" of interactions significantly benefits the detection of protein complexes. Furthermore, based on the predicted complexes, we predicted a set of signed complex-complex interactions for each dataset, which provides novel insight into the higher-level organization of the cell. All the experimental results and codes can be downloaded from http://mail.sysu.edu.cn/home/stsddq@mail.sysu.edu.cn/dai/others/SGNMF.zip.
18
Lakizadeh A, Jalili S, Marashi SA. PCD-GED: Protein complex detection considering PPI dynamics based on time series gene expression data. J Theor Biol 2015; 378:31-8. [DOI: 10.1016/j.jtbi.2015.04.020] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 10/26/2014] [Revised: 04/14/2015] [Accepted: 04/17/2015] [Indexed: 11/16/2022]
19
Jeanquartier F, Jean-Quartier C, Holzinger A. Integrated web visualizations for protein-protein interaction databases. BMC Bioinformatics 2015; 16:195. [PMID: 26077899 PMCID: PMC4466863 DOI: 10.1186/s12859-015-0615-z] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 01/09/2015] [Accepted: 05/15/2015] [Indexed: 12/27/2022]
Abstract
Background: Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great number of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. Results: We selected M = 10 out of N = 53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed on http://tinyurl.com/PPI-DB-Comparison-2015. Conclusions: Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive feature and visualization maturing.
Affiliation(s)
- Fleur Jeanquartier
- Research Unit HCI-KDD, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Auenbruggerplatz 2/V, Graz, 8036, Austria.
- Claire Jean-Quartier
- Research Unit HCI-KDD, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Auenbruggerplatz 2/V, Graz, 8036, Austria.
- Andreas Holzinger
- Research Unit HCI-KDD, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Auenbruggerplatz 2/V, Graz, 8036, Austria; Institute for Information Systems & Computer Media, Graz University of Technology, Inffeldgasse 16c, Graz, 8010, Austria.
20
Ma X, Gao L, Karamanlidis G, Gao P, Lee CF, Garcia-Menendez L, Tian R, Tan K. Revealing Pathway Dynamics in Heart Diseases by Analyzing Multiple Differential Networks. PLoS Comput Biol 2015; 11:e1004332. [PMID: 26083688 PMCID: PMC4471235 DOI: 10.1371/journal.pcbi.1004332] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 02/10/2015] [Accepted: 05/12/2015] [Indexed: 02/02/2023]
Abstract
Development of heart diseases is driven by dynamic changes in both the activity and connectivity of gene pathways. Understanding these dynamic events is critical for understanding pathogenic mechanisms and development of effective treatment. Currently, there is a lack of computational methods that enable analysis of multiple gene networks, each of which exhibits differential activity compared to the network of the baseline/healthy condition. We describe the iMDM algorithm to identify both unique and shared gene modules across multiple differential co-expression networks, termed M-DMs (multiple differential modules). We applied iMDM to a time-course RNA-Seq dataset generated using a murine heart failure model generated on two genotypes. We showed that iMDM achieves higher accuracy in inferring gene modules compared to using single or multiple co-expression networks. We found that condition-specific M-DMs exhibit differential activities, mediate different biological processes, and are enriched for genes with known cardiovascular phenotypes. By analyzing M-DMs that are present in multiple conditions, we revealed dynamic changes in pathway activity and connectivity across heart failure conditions. We further showed that module dynamics were correlated with the dynamics of disease phenotypes during the development of heart failure. Thus, pathway dynamics is a powerful measure for understanding pathogenesis. iMDM provides a principled way to dissect the dynamics of gene pathways and its relationship to the dynamics of disease phenotype. With the exponential growth of omics data, our method can aid in generating systems-level insights into disease progression.
Affiliation(s)
- Xiaoke Ma
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
- Long Gao
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa, United States of America
- Georgios Karamanlidis
- Department of Anesthesiology and Pain Medicine, Mitochondria and Metabolism Center, University of Washington School of Medicine, Seattle, Washington, United States of America
- Peng Gao
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
- Chi Fung Lee
- Department of Anesthesiology and Pain Medicine, Mitochondria and Metabolism Center, University of Washington School of Medicine, Seattle, Washington, United States of America
- Lorena Garcia-Menendez
- Department of Anesthesiology and Pain Medicine, Mitochondria and Metabolism Center, University of Washington School of Medicine, Seattle, Washington, United States of America
- Rong Tian
- Department of Anesthesiology and Pain Medicine, Mitochondria and Metabolism Center, University of Washington School of Medicine, Seattle, Washington, United States of America
- Kai Tan
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
21
Das J, Gayvert KM, Bunea F, Wegkamp MH, Yu H. ENCAPP: elastic-net-based prognosis prediction and biomarker discovery for human cancers. BMC Genomics 2015; 16:263. [PMID: 25887568 PMCID: PMC4392808 DOI: 10.1186/s12864-015-1465-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 10/03/2014] [Accepted: 03/13/2015] [Indexed: 02/08/2023]
Abstract
Background: With the explosion of genomic data over the last decade, there has been a tremendous amount of effort to understand the molecular basis of cancer using informatics approaches. However, this has proven to be extremely difficult primarily because of the varied etiology and vast genetic heterogeneity of different cancers and even within the same cancer. One particularly challenging problem is to predict prognostic outcome of the disease for different patients. Results: Here, we present ENCAPP, an elastic-net-based approach that combines the reference human protein interactome network with gene expression data to accurately predict prognosis for different human cancers. Our method identifies functional modules that are differentially expressed between patients with good and bad prognosis and uses these to fit a regression model that can be used to predict prognosis for breast, colon, rectal, and ovarian cancers. Using this model, ENCAPP can also identify prognostic biomarkers with a high degree of confidence, which can be used to generate downstream mechanistic and therapeutic insights. Conclusion: ENCAPP is a robust method that can accurately predict prognostic outcome and identify biomarkers for different human cancers.
Affiliation(s)
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, 335 Weill Hall, Ithaca, NY, 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA.
- Kaitlyn M Gayvert
- Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, 10065, USA.
- Florentina Bunea
- Department of Statistical Science, Cornell University, Ithaca, NY, 14853, USA.
- Marten H Wegkamp
- Department of Statistical Science, Cornell University, Ithaca, NY, 14853, USA.
- Haiyuan Yu
- Department of Biological Statistics and Computational Biology, Cornell University, 335 Weill Hall, Ithaca, NY, 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA.
22
Das J, Gayvert KM, Yu H. Predicting cancer prognosis using functional genomics data sets. Cancer Inform 2014; 13:85-8. [PMID: 25392695 PMCID: PMC4218897 DOI: 10.4137/cin.s14064] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/29/2014] [Revised: 09/17/2014] [Accepted: 09/19/2014] [Indexed: 11/06/2022]
Abstract
Elucidating the molecular basis of human cancers is an extremely complex and challenging task. A wide variety of computational tools and experimental techniques have been used to address different aspects of this characterization. One major hurdle faced by both clinicians and researchers has been to pinpoint the mechanistic basis underlying a wide range of prognostic outcomes for the same type of cancer. Here, we provide an overview of various computational methods that have leveraged different functional genomics data sets to identify molecular signatures that can be used to predict prognostic outcome for various human cancers. Furthermore, we outline challenges that remain and future directions that may be explored to address them.
Affiliation(s)
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Kaitlyn M Gayvert
- Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA
- Haiyuan Yu
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
23
Gao S, Karakira I, Afra S, Naji G, Alhajj R, Zeng J, Demetrick D. Evaluating predictive performance of network biomarkers with network structures. J Bioinform Comput Biol 2014; 12:1450025. [DOI: 10.1142/s0219720014500255] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
A network is a powerful structure that reveals valuable characteristics of the underlying data. However, previous work on evaluating the predictive performance of network-based biomarkers does not take nodal connectedness into account. We argue that it is necessary to maximize the benefit from the network structure by employing appropriate techniques. To address this, we aim to learn a weight coefficient for each node in the network from a quantitative measure such as gene expression data. The weight coefficients are computed from an optimization problem which minimizes the total weighted difference between nodes in a network structure; this can be expressed in terms of the graph Laplacian. After obtaining the coefficient vector for the network markers, we can then compute the corresponding network predictor. We demonstrate the effectiveness of the proposed method by conducting experiments using published breast cancer biomarkers with three patient cohorts. Network markers are first grouped based on GO terms related to cancer hallmarks. We compare the predictive performance of each network marker group across gene expression datasets. We also evaluate the network predictor against the average method for feature aggregation. The reported results show that the predictive performance of network markers is generally not consistent across patient cohorts.
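The remark that a total weighted difference between nodes can be expressed through the graph Laplacian rests on a standard identity: for an undirected graph with edge weights w_ij and Laplacian L = D - W, the sum of w_ij (x_i - x_j)^2 over edges equals x^T L x. The sketch below checks that identity numerically. It is not the optimization problem from the paper, whose exact objective and constraints the abstract does not state; the three-node graph and node values are hypothetical.

```python
def laplacian(n, edges):
    """Build L = D - W for an undirected weighted graph given as (i, j, w) triples."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][i] += w   # degree terms on the diagonal
        L[j][j] += w
        L[i][j] -= w   # negative weights off the diagonal
        L[j][i] -= w
    return L

def quadratic_form(L, x):
    """Compute x^T L x."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

def weighted_difference(edges, x):
    """Sum over edges of w_ij * (x_i - x_j)^2."""
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)

# Hypothetical 3-node network with edge weights and per-node values.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 0.5)]
x = [1.0, 3.0, 2.0]

lhs = weighted_difference(edges, x)
rhs = quadratic_form(laplacian(3, edges), x)
print(lhs, rhs)  # the two values agree
```

Writing the penalty as a quadratic form is what makes such smoothness terms convenient to minimize with standard linear-algebra tools.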
Affiliation(s)
- Shang Gao
- College of Computer Science and Technology, Jilin University, Changchun, China
- Department of Computer Science, University of Calgary, 2500 University Drive N. W., Calgary, Alberta, Canada
- Ibrahim Karakira
- Department of Computer Science, University of Calgary, 2500 University Drive N. W., Calgary, Alberta, Canada
- Salim Afra
- Department of Computer Science, University of Calgary, 2500 University Drive N. W., Calgary, Alberta, Canada
- Ghada Naji
- Department of Biology, Lebanese University, Tripoli, Lebanon
- Reda Alhajj
- Department of Computer Science, University of Calgary, 2500 University Drive N. W., Calgary, Alberta, Canada
- Department of Computer Science, Global University, Beirut, Lebanon
- Institute of Informatics, Wroclaw University of Technology, Wroclaw, Poland
- Jia Zeng
- Institute for Personalized Cancer Therapy, MD Anderson Cancer Center, The University of Texas, 1515 Holcombe Blvd, Houston, Texas, USA
- Douglas Demetrick
- Department of Pathology, Oncology and Biochemistry and Molecular Biology, University of Calgary, 3330 Hospital Drive N. W., Calgary, Alberta, Canada
24
Ou-Yang L, Dai DQ, Li XL, Wu M, Zhang XF, Yang P. Detecting temporal protein complexes from dynamic protein-protein interaction networks. BMC Bioinformatics 2014; 15:335. [PMID: 25282536 PMCID: PMC4288635 DOI: 10.1186/1471-2105-15-335] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 05/10/2014] [Accepted: 09/23/2014] [Indexed: 12/13/2022]
Abstract
Background: Proteins dynamically interact with each other to perform their biological functions. The dynamic operations of protein-protein interaction (PPI) networks are also reflected in the dynamic formations of protein complexes. Existing protein complex detection algorithms usually overlook the inherent temporal nature of protein interactions within PPI networks. Systematically analyzing the temporal protein complexes can not only improve the accuracy of protein complex detection, but also strengthen our biological knowledge on the dynamic protein assembly processes for cellular organization. Results: In this study, we propose a novel computational method to predict temporal protein complexes. Particularly, we first construct a series of dynamic PPI networks by joint analysis of time-course gene expression data and protein interaction data. Then a Time Smooth Overlapping Complex Detection model (TS-OCD) is proposed to detect temporal protein complexes from these dynamic PPI networks. TS-OCD can naturally capture the smoothness of networks between consecutive time points and detect overlapping protein complexes at each time point. Finally, a nonnegative matrix factorization based algorithm is introduced to merge those very similar temporal complexes across different time points. Conclusions: Extensive experimental results demonstrate that the proposed method is more effective at detecting temporal protein complexes than the state-of-the-art complex detection techniques.
Affiliation(s)
- Dao-Qing Dai
- Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Guangzhou 510275, China.
25
Shaham G, Tuller T. Most associations between transcript features and gene expression are monotonic. Molecular BioSystems 2014; 10:1426-40. [PMID: 24675795 DOI: 10.1039/c3mb70617f] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022]
Abstract
Dozens of previous studies in the field have dealt with the relations between transcript features and their expression. Indeed, understanding the way gene expression is encoded in transcripts should contribute not only to disciplines such as functional genomics and molecular evolution, but also to biotechnology and human health. Previous studies in the field mainly aimed at predicting protein levels of genes based on their transcript features. Most of the models employed in this context assume that the effect of each transcript feature on gene expression is monotonic. In the current study we aim to understand, for the first time, whether the relations between transcript features (i.e., the UTRs and ORF) and measurements related to the different stages of gene expression are indeed monotonic. To this end, we analyze 5432 transcript features and perform gene expression measurements (mRNA levels, ribosomal densities, protein levels, etc.) of 4367 S. cerevisiae genes. We use the Maximal Information Coefficient (MIC) in order to identify potential relations that are not necessarily linear or monotonic. Our analyses demonstrate that the relation between most transcript features and the examined gene expression measurements is monotonic (only up to 1-5% of the variables, with significance levels of 0.001, are non-monotonic); in addition, in the cases of deviation from monotonicity the relation/deviation is very weak. These results should help in guiding the development of computational gene expression modeling and engineering, and improve the understanding of this process. Furthermore, the relatively simple relations between a transcript's nucleotide composition and its expression should contribute towards better understanding of transcript evolution at the molecular level.
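The difference between linear, monotonic, and non-monotonic relations that this abstract turns on can be seen in a small sketch. It uses Spearman rank correlation as a simpler stand-in, not the Maximal Information Coefficient used in the study, and the feature values are hypothetical. The rank helper assumes there are no tied values.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def ranks(values):
    """Ranks 1..n in ascending order; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Hypothetical transcript feature and two hypothetical expression responses.
feature = [1, 2, 3, 4, 5, 6]
monotone = [v ** 3 for v in feature]              # non-linear but monotonic
non_monotone = [(v - 3.2) ** 2 for v in feature]  # falls, then rises

print(pearson(feature, monotone), spearman(feature, monotone))
print(spearman(feature, non_monotone))
```

A monotonic but non-linear response keeps a perfect rank correlation while its Pearson correlation drops below one; a response that falls and then rises scores low on both, which is the kind of relation a measure such as MIC is meant to detect.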
Affiliation(s)
- Gilad Shaham
- Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Israel.
26
Tang D, Wang M, Zheng W, Wang H. RapidMic: Rapid Computation of the Maximal Information Coefficient. Evol Bioinform Online 2014; 10:11-6. [PMID: 24526831 PMCID: PMC3921152 DOI: 10.4137/ebo.s13121] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 09/03/2013] [Revised: 11/14/2013] [Accepted: 11/17/2013] [Indexed: 11/05/2022]
Abstract
To discover relationships and associations rapidly in large-scale datasets, we propose a cross-platform tool for the rapid computation of the maximal information coefficient based on parallel computing methods. Through parallel processing, the provided tool can effectively analyze large-scale biological datasets with a markedly reduced computing time. The experimental results show that the proposed tool is notably fast, and is able to perform an all-pairs analysis of a large biological dataset using a normal computer. The source code and guidelines can be downloaded from https://github.com/HelloWorldCN/RapidMic.
Affiliation(s)
- Dongming Tang
- Institute of Information Research, Southwest Jiaotong University, Chengdu, China
- Mingwen Wang
- School of Mathematics, Southwest Jiaotong University, Chengdu, China
- Weifan Zheng
- Institute of Information Research, Southwest Jiaotong University, Chengdu, China
- Hongjun Wang
- School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China
27
Simple topological features reflect dynamics and modularity in protein interaction networks. PLoS Comput Biol 2013; 9:e1003243. [PMID: 24130468 PMCID: PMC3794914 DOI: 10.1371/journal.pcbi.1003243] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 06/06/2013] [Accepted: 08/14/2013] [Indexed: 11/30/2022]
Abstract
The availability of large-scale protein-protein interaction networks for numerous organisms provides an opportunity to comprehensively analyze whether simple properties of proteins are predictive of the roles they play in the functional organization of the cell. We begin by re-examining an influential but controversial characterization of the dynamic modularity of the S. cerevisiae interactome that incorporated gene expression data into network analysis. We analyse the protein-protein interaction networks of five organisms, S. cerevisiae, H. sapiens, D. melanogaster, A. thaliana, and E. coli, and confirm significant and consistent functional and structural differences between hub proteins that are co-expressed with their interacting partners and those that are not, and support the view that the former tend to be intramodular whereas the latter tend to be intermodular. However, we also demonstrate that in each of these organisms, simple topological measures are significantly correlated with the average co-expression of a hub with its partners, independent of any classification, and therefore also reflect protein intra- and inter- modularity. Further, cross-interactomic analysis demonstrates that these simple topological characteristics of hub proteins tend to be conserved across organisms. Overall, we give evidence that purely topological features of static interaction networks reflect aspects of the dynamics and modularity of interactomes as well as previous measures incorporating expression data, and are a powerful means for understanding the dynamic roles of hubs in interactomes.
A better understanding of protein interaction networks would be a great aid in furthering our knowledge of the molecular biology of the cell. Towards this end, large-scale protein-protein physical interaction data have been determined for organisms across the evolutionary spectrum. However, the resulting networks give a static view of interactomes, and our knowledge about protein interactions is rarely time or context specific. A previous prominent but controversial attempt to characterize the dynamic modularity of the interactome was based on integrating physical interaction data with gene activity measurements from transcript expression data. This analysis distinguished between proteins that are co-expressed with their interacting partners and those that are not, and argued that the former are intramodular and the latter are intermodular. By analyzing the interactomes of five organisms, we largely confirm the biological significance of this characterization through a variety of statistical tests and computational experiments. Surprisingly, however, we find that similar results can be obtained using just network information without additionally integrating expression data, suggesting that purely topological characteristics of interaction networks strongly reflect certain aspects of the dynamics and modularity of interactomes.
28
Yates CM, Sternberg MJE. The effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein-protein interactions. J Mol Biol 2013; 425:3949-63. [PMID: 23867278 DOI: 10.1016/j.jmb.2013.07.012] [Citation(s) in RCA: 170] [Impact Index Per Article: 14.2] [Received: 04/28/2013] [Revised: 07/02/2013] [Accepted: 07/09/2013] [Indexed: 12/23/2022]
Abstract
Non-synonymous single nucleotide polymorphisms (nsSNPs) are single base changes leading to a change to the amino acid sequence of the encoded protein. Many of these variants are associated with disease, so nsSNPs have been well studied, with studies looking at the effects of nsSNPs on individual proteins, for example, on stability and enzyme active sites. In recent years, the impact of nsSNPs upon protein-protein interactions has also been investigated, giving a greater insight into the mechanisms by which nsSNPs can lead to disease. In this review, we summarize these studies, looking at the various mechanisms by which nsSNPs can affect protein-protein interactions. We focus on structural changes that can impair interaction, changes to disorder, gain of interaction, and post-translational modifications before looking at some examples of nsSNPs at human-pathogen protein-protein interfaces and the analysis of nsSNPs from a network perspective.
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Sir Ernst Chain Building, Imperial College London, South Kensington, SW7 2AZ, UK.
29
Gao S, Wang X. Identification of highly synchronized subnetworks from gene expression data. BMC Bioinformatics 2013; 14 Suppl 9:S5. [PMID: 23901792 PMCID: PMC3698028 DOI: 10.1186/1471-2105-14-s9-s5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023]
Abstract
Background: There has been a growing interest in identifying context-specific active protein-protein interaction (PPI) subnetworks through integration of PPI and time course gene expression data. However, the interaction dynamics during the biological process under study has not been sufficiently considered previously. Methods: Here we propose a topology-phase locking (TopoPL) based scoring metric for identifying active PPI subnetworks from time series expression data. First, the temporal coordination in gene expression changes is evaluated through phase locking analysis. The results are subsequently integrated with PPI to define an activity score for each PPI subnetwork, based on individual member expression as well as topological characteristics of the PPI network and of the expression temporal coordination network. Lastly, the subnetworks with the top scores in the whole PPI network are identified through simulated annealing search. Results: Application of TopoPL to simulated data and to the yeast cell cycle data showed that it can more sensitively identify biologically meaningful subnetworks than the method that only utilizes the static PPI topology, or the additive scoring method. Using TopoPL we identified a core subnetwork with 49 genes important to yeast cell cycle. Interestingly, this core contains a protein complex known to be related to arrangement of ribosome subunits that exhibit extremely high gene expression synchronization. Conclusions: Inclusion of interaction dynamics is important to the identification of relevant gene networks.
Affiliation(s)
- Shouguo Gao
- Department of Physics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
|
30
|
Das J, Vo TV, Wei X, Mellor JC, Tong V, Degatano AG, Wang X, Wang L, Cordero NA, Kruer-Zerhusen N, Matsuyama A, Pleiss JA, Lipkin SM, Yoshida M, Roth FP, Yu H. Cross-species protein interactome mapping reveals species-specific wiring of stress response pathways. Sci Signal 2013; 6:ra38. [PMID: 23695164 DOI: 10.1126/scisignal.2003350] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 12/13/2022]
Abstract
The fission yeast Schizosaccharomyces pombe has more metazoan-like features than the budding yeast Saccharomyces cerevisiae, yet it has similarly facile genetics. We present a large-scale verified binary protein-protein interactome network, "StressNet," based on high-throughput yeast two-hybrid screens of interacting proteins classified as part of stress response and signal transduction pathways in S. pombe. We performed systematic, cross-species interactome mapping using StressNet and a protein interactome network of orthologous proteins in S. cerevisiae. With cross-species comparative network studies, we detected a previously unidentified component (Snr1) of the S. pombe mitogen-activated protein kinase Sty1 pathway. Coimmunoprecipitation experiments showed that Snr1 interacted with Sty1 and that deletion of snr1 increased the sensitivity of S. pombe cells to stress. Comparison of StressNet with the interactome network of orthologous proteins in S. cerevisiae showed that most of the interactions among these stress response and signaling proteins are not conserved between species but are "rewired"; orthologous proteins have different binding partners in both species. In particular, transient interactions connecting proteins in different functional modules were more likely to be rewired than conserved. By directly testing interactions between proteins in one yeast species and their corresponding binding partners in the other yeast species with yeast two-hybrid assays, we found that about half of the interactions that are traditionally considered "conserved" form modified interaction interfaces that may potentially accommodate novel functions.
Affiliation(s)
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Tommy V Vo
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Xiaomu Wei
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Department of Medicine, Weill Cornell College of Medicine, New York, NY 10021, USA
- Joseph C Mellor
- Donnelly Centre, University of Toronto, Toronto, ON M5S-3E1, Canada
- Virginia Tong
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Andrew G Degatano
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Xiujuan Wang
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Lihua Wang
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Nicolas A Cordero
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Nathan Kruer-Zerhusen
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Akihisa Matsuyama
- Chemical Genetics Laboratory, RIKEN Advanced Science Institute, Wako, Saitama 351-0198, Japan; CREST Research Project, JST, Kawaguchi, Saitama 332-0012, Japan
- Jeffrey A Pleiss
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Steven M Lipkin
- Department of Medicine, Weill Cornell College of Medicine, New York, NY 10021, USA
- Minoru Yoshida
- Chemical Genetics Laboratory, RIKEN Advanced Science Institute, Wako, Saitama 351-0198, Japan; CREST Research Project, JST, Kawaguchi, Saitama 332-0012, Japan; Department of Biotechnology, Graduate School of Agriculture and Life Sciences, University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
- Frederick P Roth
- Donnelly Centre, University of Toronto, Toronto, ON M5S-3E1, Canada; Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON M5S-3E1, Canada; Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02115; Harvard Medical School, Boston, MA 02115; Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, Toronto, ON M5G-1X5, Canada; Genetic Networks Program, Canadian Institute for Advanced Research, Toronto, ON M5G-1Z8, Canada
- Haiyuan Yu
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
|
31
|
Affiliation(s)
- Aiden Corvin
- Department of Psychiatry & Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland.
|
32
|
Albanese D, Filosi M, Visintainer R, Riccadonna S, Jurman G, Furlanello C. Minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics 2012; 29:407-8. [PMID: 23242262 DOI: 10.1093/bioinformatics/bts707] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.9] [Indexed: 11/13/2022]
Abstract
We introduce a novel implementation in ANSI C of the MINE family of algorithms for computing maximal information-based measures of dependence between two variables in large datasets, with the aim of a low memory footprint and ease of integration within bioinformatics pipelines. We provide the libraries minerva (with the R interface) and minepy for Python, MATLAB, Octave and C++. The C solution reduces the large memory requirement of the original Java implementation, has good upscaling properties and offers a native parallelization for the R interface. Low memory requirements are demonstrated on the MINE benchmarks as well as on large (n = 1340) microarray and Illumina GAII RNA-seq transcriptomics datasets. AVAILABILITY AND IMPLEMENTATION Source code and binaries are freely available for download under GPL3 licence at http://minepy.sourceforge.net for minepy and through the CRAN repository http://cran.r-project.org for the R package minerva. All software is multiplatform (MS Windows, Linux and OSX).
Affiliation(s)
- Davide Albanese
- Fondazione Bruno Kessler, via Sommarive 18, I-38123 Povo (Trento), Italy
|
33
|
van Wijk SJL, Melquiond ASJ, de Vries SJ, Timmers HTM, Bonvin AMJJ. Dynamic control of selectivity in the ubiquitination pathway revealed by an ASP to GLU substitution in an intra-molecular salt-bridge network. PLoS Comput Biol 2012; 8:e1002754. [PMID: 23133359 PMCID: PMC3486841 DOI: 10.1371/journal.pcbi.1002754] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 05/16/2012] [Accepted: 09/09/2012] [Indexed: 01/01/2023] Open
Abstract
Ubiquitination relies on a subtle balance between selectivity and promiscuity achieved through specific interactions between ubiquitin-conjugating enzymes (E2s) and ubiquitin ligases (E3s). Here, we report how a single aspartic to glutamic acid substitution acts as a dynamic switch to tip the selectivity balance of human E2s for interaction toward E3 RING-finger domains. By combining molecular dynamics simulations, experimental yeast-two-hybrid screen of E2-E3 (RING) interactions and mutagenesis, we reveal how the dynamics of an internal salt-bridge network at the rim of the E2-E3 interaction surface controls the balance between an "open", binding competent, and a "closed", binding incompetent state. The molecular dynamics simulations shed light on the fine mechanism of this molecular switch and allowed us to identify its components, namely an aspartate/glutamate pair, a lysine acting as the central switch and a remote aspartate. Perturbations of single residues in this network, both inside and outside the interaction surface, are sufficient to switch the global E2 interaction selectivity as demonstrated experimentally. Taken together, our results indicate a new mechanism to control E2-E3 interaction selectivity at an atomic level, highlighting how minimal changes in amino acid side-chains affecting the dynamics of intramolecular salt-bridges can be crucial for protein-protein interactions. These findings indicate that the widely accepted sequence-structure-function paradigm should be extended to a sequence-structure-dynamics-function relationship and open new possibilities for control and fine-tuning of protein interaction selectivity.
Affiliation(s)
- Sjoerd J. L. van Wijk
- Department of Molecular Cancer Research, Division of Biomedical Genetics and Netherlands Proteomics Center, University Medical Center Utrecht, Utrecht, The Netherlands
- Adrien S. J. Melquiond
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Sjoerd J. de Vries
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- H. Th. Marc Timmers
- Department of Molecular Cancer Research, Division of Biomedical Genetics and Netherlands Proteomics Center, University Medical Center Utrecht, Utrecht, The Netherlands
- Alexandre M. J. J. Bonvin
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, Utrecht, The Netherlands
|
34
|
Das J, Yu H. HINT: High-quality protein interactomes and their applications in understanding human disease. BMC SYSTEMS BIOLOGY 2012; 6:92. [PMID: 22846459 PMCID: PMC3483187 DOI: 10.1186/1752-0509-6-92] [Citation(s) in RCA: 308] [Impact Index Per Article: 23.7] [Received: 07/22/2011] [Accepted: 06/30/2012] [Indexed: 12/22/2022]
Abstract
Background A global map of protein-protein interactions in cellular systems provides key insights into the workings of an organism. A repository of well-validated high-quality protein-protein interactions can be used in both large- and small-scale studies to generate and validate a wide range of functional hypotheses. Results We develop HINT (http://hint.yulab.org), a database of high-quality protein-protein interactomes for human, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Oryza sativa. These were collected from several databases and filtered both systematically and manually to remove low-quality/erroneous interactions. The resulting datasets are classified by type (binary physical interactions vs. co-complex associations) and data source (high-throughput systematic setups vs. literature-curated small-scale experiments). We find strong sociological sampling biases in literature-curated datasets of small-scale interactions. An interactome without such sampling biases was used to understand network properties of human disease-genes: hubs are unlikely to cause disease, but if they do, they usually cause multiple disorders. Conclusions HINT is of significant interest to researchers in all fields of biology as it addresses the ubiquitous need of having a repository of high-quality protein-protein interactions. These datasets can be utilized to generate specific hypotheses about specific proteins and/or pathways, as well as to analyze global properties of cellular networks. HINT will be regularly updated and all versions will be tracked.
Affiliation(s)
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.
|