1
Cai G, Sun M, Li X, Zhu J. Construction and characterization of rectal cancer-related lncRNA-mRNA ceRNA network reveals prognostic biomarkers in rectal cancer. IET Syst Biol 2021; 15:192-204. [PMID: 34613665 PMCID: PMC8675822 DOI: 10.1049/syb2.12035] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/05/2021] [Revised: 08/22/2021] [Accepted: 09/23/2021] [Indexed: 12/26/2022]
Abstract
Rectal cancer is an important cause of cancer‐related deaths worldwide. In this study, the differentially expressed (DE) lncRNAs/mRNAs were first identified and the correlation level between DE lncRNAs and mRNAs was calculated. The results showed that genes of highly correlated lncRNA‐mRNA pairs had strong prognostic effects, for example GPM6A, METTL24, SCN7A, HAND2‐AS1 and PDZRN4. Then, the rectal cancer‐related lncRNA‐mRNA network was constructed based on the ceRNA theory. Topological analysis revealed that the network was maintained by hub nodes, and a hub subnetwork was constructed that includes the hub lncRNAs MIR143HG and MBNL1‐AS1. Further analysis indicated that the hub subnetwork was highly related to cancer pathways, such as ‘Focal adhesion’ and ‘Wnt signalling pathway’. The hub subnetwork also had significant prognostic capability. A closed lncRNA‐mRNA module was identified by bilateral network clustering, and genes in this module also showed high prognostic effects. Finally, a core lncRNA‐TF (transcription factor) crosstalk network was identified by integrating ceRNA crosstalks and TF binding affinities, to uncover the crosstalk and regulatory mechanisms of lncRNAs and TFs. Some core genes, such as MEIS1, GLI3 and HAND2‐AS1, were considered key regulators in tumourigenesis. Based on the authors’ comprehensive analysis, these lncRNA‐mRNA crosstalks provide promising clues for the biological prognosis of rectal cancer.
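The abstract names the ceRNA step but does not specify it. A common way to score a candidate lncRNA-mRNA ceRNA pair combines (i) a hypergeometric test on the miRNAs the two transcripts share and (ii) a positive expression correlation. The sketch below assumes that approach; the function names and the thresholds `p_max` and `r_min` are hypothetical, not the authors' settings.

```python
from math import comb, sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def shared_mirna_pvalue(mirnas_a, mirnas_b, total_mirnas):
    """Hypergeometric upper tail P(X >= k), where k is the number of miRNAs
    that target both transcripts, out of `total_mirnas` miRNAs considered."""
    a, b = set(mirnas_a), set(mirnas_b)
    k = len(a & b)
    denom = comb(total_mirnas, len(b))
    tail = sum(
        comb(len(a), x) * comb(total_mirnas - len(a), len(b) - x)
        for x in range(k, min(len(a), len(b)) + 1)
    )
    return tail / denom

def is_cerna_pair(lnc_expr, mrna_expr, lnc_mirnas, mrna_mirnas,
                  total_mirnas, p_max=0.01, r_min=0.4):
    """Hypothetical filter: significant miRNA sharing AND positive co-expression."""
    p = shared_mirna_pvalue(lnc_mirnas, mrna_mirnas, total_mirnas)
    return p < p_max and pearson(lnc_expr, mrna_expr) > r_min
```

Pairs passing such a filter would become the edges of the lncRNA-mRNA network, after which hub nodes can be read off from node degree.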
Affiliation(s)
- Guoying Cai
- Department of Integrative Medicine & Medical Oncology, Shengzhou People's Hospital (the First Affiliated Hospital of Zhejiang University, Shengzhou Branch), Shengzhou, Zhejiang, China
- Meifei Sun
- Department of Integrative Medicine & Medical Oncology, Shengzhou People's Hospital (the First Affiliated Hospital of Zhejiang University, Shengzhou Branch), Shengzhou, Zhejiang, China
- Xinrong Li
- Department of Integrative Medicine & Medical Oncology, Shengzhou People's Hospital (the First Affiliated Hospital of Zhejiang University, Shengzhou Branch), Shengzhou, Zhejiang, China
- Junquan Zhu
- Department of Integrative Medicine & Medical Oncology, Shengzhou People's Hospital (the First Affiliated Hospital of Zhejiang University, Shengzhou Branch), Shengzhou, Zhejiang, China
2
Zhu L, Deng SP, You ZH, Huang DS. Identifying Spurious Interactions in the Protein-Protein Interaction Networks Using Local Similarity Preserving Embedding. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:345-352. [PMID: 28368812 DOI: 10.1109/tcbb.2015.2407393] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 06/07/2023]
Abstract
In recent years, a remarkable amount of protein-protein interaction (PPI) data has become available owing to advances in experimental high-throughput technologies. However, experimentally detected PPI data usually contain a large number of spurious links, which can contaminate the analysis of the biological significance of protein links and lead to incorrect biological discoveries, posing new challenges to both computational and biological scientists. In this paper, we develop a new embedding algorithm called local similarity preserving embedding (LSPE) to rank the interaction possibility of protein links. By going beyond the limitations of current geometric embedding methods for network denoising and emphasizing the local information of PPI networks, LSPE avoids the instability of previous methods. We present experimental results on benchmark PPI networks showing that LSPE was the overall leader, outperforming state-of-the-art methods on the problem of eliminating topologically false links.
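LSPE itself is not described here in enough detail to reproduce. To illustrate the underlying idea of ranking links by local topological support, so that weakly supported links surface as candidate spurious interactions, the sketch below uses a plain neighbourhood-overlap (Jaccard) score. This is a simple baseline swapped in for illustration, not an embedding method and not LSPE; the function name is hypothetical.

```python
from collections import defaultdict

def rank_edges_by_jaccard(edges):
    """Score each edge (u, v) by the Jaccard overlap of the endpoints' other
    neighbours; return (u, v, score) sorted from least to most supported."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    def score(u, v):
        # exclude the edge itself so it does not count as its own evidence
        a, b = nbrs[u] - {v}, nbrs[v] - {u}
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    return sorted(((u, v, score(u, v)) for u, v in edges), key=lambda e: e[2])
```

Edges at the head of the returned list (no shared partners) are the first candidates for removal during denoising.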
3
Abstract
Molecular profiling of proteins and phosphoproteins using a reverse phase protein array (RPPA) platform, with a panel of target-specific antibodies, enables the parallel, quantitative proteomic analysis of many biological samples in a microarray format. Hence, RPPA analysis can generate a high volume of multidimensional data that must be effectively interrogated and interpreted. A range of computational techniques for data mining can be applied to detect and explore data structure and to form functional predictions from large datasets. Here, two approaches for the computational analysis of RPPA data are detailed: the identification of similar patterns of protein expression by hierarchical cluster analysis and the modeling of protein interactions and signaling relationships by network analysis. The protocols use freely available, cross-platform software, are easy to implement, and do not require any programming expertise. Serving as data-driven starting points for further in-depth analysis, validation, and biological experimentation, these and related bioinformatic approaches can accelerate the functional interpretation of RPPA data.
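The protocols run hierarchical clustering in existing GUI software. As a hedged illustration of what that computation does, the sketch below performs average-linkage (UPGMA) agglomerative clustering on expression profiles using 1 minus the Pearson correlation as the distance. The metric, the linkage rule and the toy profile names are assumptions; the software described may use different defaults.

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) *
                  sqrt(sum((b - my) ** 2 for b in y)))

def average_linkage(profiles):
    """Agglomerative clustering with average linkage on 1 - Pearson r.
    `profiles` maps a name to a list of (non-constant) expression values.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average of all pairwise distances between the two clusters
                pairs = [dist[a, b] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merges[-1][0] + merges[-1][1])
    return merges
```

The merge history is exactly what a dendrogram visualises; cutting it at a chosen distance yields groups of samples or proteins with similar expression patterns.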
Affiliation(s)
- Adam Byron
- Cancer Research UK Edinburgh Centre, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XR, UK.
4
Lei C, Tamim S, Bishop AJ, Ruan J. Fully automated protein complex prediction based on topological similarity and community structure. Proteome Sci 2013; 11:S9. [PMID: 24564887 PMCID: PMC3908383 DOI: 10.1186/1477-5956-11-s1-s9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
Abstract
To understand the function of protein complexes and their association with biological processes, many studies have analyzed protein-protein interaction (PPI) networks. However, advances in high-throughput technology have produced a huge amount of data for analysis. Moreover, the high level of noise, sparseness, and skewed degree distribution of PPI networks limit the performance of many clustering algorithms and further analysis of their interactions. To address these problems, we present a novel random walk-based algorithm that converts the incomplete and binary PPI network into a protein-protein topological similarity matrix (PP-TS matrix). We believe that if two proteins share some high-order topological similarities, they are likely to interact with each other. Using the obtained PP-TS matrix, we constructed weighted networks to further study and analyze the interactions among proteins. Specifically, we applied a fully automated community structure finding algorithm (Auto-HQcut) to the obtained weighted network to cluster protein complexes. We then analyzed the protein complexes for significance in biological processes. To help visualize and analyze these protein complexes, we also developed an interface that displays the resulting complexes as well as the characteristics associated with each complex. Applying our approach to a yeast protein-protein interaction network, we found that the predicted protein-protein interaction pairs with high topological similarities have more significant biological relevance than the original protein-protein interaction pairs. When we compared our PPI network reconstruction algorithm with other existing algorithms using gene ontology and gene co-expression, our algorithm produced the highest similarity scores. Our predicted protein complexes also showed higher accuracy than the other protein complex predictions.
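Auto-HQcut is named but not specified in this abstract. Community-finding methods in this family commonly score candidate partitions with Newman's modularity Q; assuming that, the sketch below computes Q for a partition of a weighted, undirected network such as one built from a PP-TS matrix. It shows only the objective being optimised, not the search procedure, and the function name is hypothetical.

```python
from collections import defaultdict

def weighted_modularity(edges, community):
    """Newman modularity Q for an undirected weighted graph without self-loops.
    edges: iterable of (u, v, weight); community: dict node -> community label.
    Q = sum over communities c of [W_c / W - (S_c / 2W)^2]."""
    edges = list(edges)
    total = sum(w for _, _, w in edges)       # W: total edge weight
    internal = defaultdict(float)             # W_c: weight of edges inside c
    strength = defaultdict(float)             # S_c: summed weighted degree of c
    for u, v, w in edges:
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)
```

For two triangles joined by a single edge, splitting them into two communities gives Q = 6/7 - 1/2 (about 0.357), while putting every node in one community gives Q = 0.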
5
Optimization criteria and biological process enrichment in homologous multiprotein modules. Proc Natl Acad Sci U S A 2013; 110:10872-7. [PMID: 23757502 DOI: 10.1073/pnas.1308621110] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Biological process enrichment is a widely used metric for evaluating the quality of multiprotein modules. In this study, we examine possible optimization criteria for detecting homologous multiprotein modules and quantify their effects on biological process enrichment. We find that modularity, linear density, and module size are the most important criteria considered, complementary to each other, and that graph theoretical attributes account for 36% of the variance in biological process enrichment. Variations in protein interaction similarity within module pairs have only minor effects on biological process enrichment. As random modules increase in size, both biological process enrichment and modularity tend to improve, although modularity does not show this upward trend in modules with size at most 50 proteins. To adjust for these trends, we recommend a size correction based on random sampling of modules when using biological process enrichment or other attributes to evaluate module boundaries. Characteristics of homologous multiprotein modules optimized for each of the optimization criteria are examined.
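The recommended size correction compares a module's score with the scores of random modules of the same size. A minimal sketch, assuming a z-score against node sets sampled uniformly without replacement and using edge density as a stand-in score (the paper evaluates biological process enrichment and other attributes). `score_fn`, `n_samples`, the seed and the helper names are illustrative choices.

```python
import random
import statistics

def size_corrected_z(module, universe, score_fn, n_samples=1000, seed=0):
    """z-score of score_fn(module) against random modules of the same size
    drawn uniformly without replacement from `universe`."""
    rng = random.Random(seed)
    pool = list(universe)
    observed = score_fn(module)
    null = [score_fn(rng.sample(pool, len(module))) for _ in range(n_samples)]
    sd = statistics.pstdev(null)
    return (observed - statistics.mean(null)) / sd if sd > 0 else 0.0

def make_density(edges):
    """Return a score function: fraction of node pairs in a set that are edges."""
    edge_set = {frozenset(e) for e in edges}

    def density(nodes):
        nodes = list(nodes)
        pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
        return sum(frozenset(p) in edge_set for p in pairs) / len(pairs)

    return density
```

Because the null distribution is drawn at the module's own size, a large module is no longer favoured merely for being large.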
6
Abstract
Complex diseases are caused by a combination of genetic and environmental factors. Uncovering the molecular pathways through which genetic factors affect a phenotype is always difficult, but in the case of complex diseases this is further complicated because the genetic factors in affected individuals might differ. In recent years, systems biology approaches and, more specifically, network-based approaches have emerged as powerful tools for studying complex diseases. These approaches are often built on knowledge of physical or functional interactions between molecules, which are usually represented as an interaction network. An interaction network not only reports the binary relationships between individual nodes but also encodes a hidden, higher-level organization of cellular communication. Computational biologists were challenged with the task of uncovering this organization and using it to understand disease complexity, which prompted a rich and diverse set of algorithmic approaches. We start this chapter with a description of the general characteristics of complex diseases, followed by a brief introduction to physical and functional networks. Next, we show how these networks are used to leverage genotype, gene expression, and other types of data to identify dysregulated pathways, infer the relationships between genotype and phenotype, and explain disease heterogeneity. We group the methods by common underlying principles and first provide a high-level description of the principles, followed by more specific examples. We hope that this chapter will give readers an appreciation for the wealth of algorithmic techniques that have been developed for the purpose of studying complex diseases, as well as insight into their strengths and limitations.
Affiliation(s)
- Dong-Yeon Cho
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
- Yoo-Ah Kim
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
- Teresa M. Przytycka
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
7
Lei C, Ruan J. A novel link prediction algorithm for reconstructing protein-protein interaction networks by topological similarity. Bioinformatics 2012; 29:355-64. [PMID: 23235927 DOI: 10.1093/bioinformatics/bts688] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.3] [Indexed: 12/24/2022]
Abstract
MOTIVATION Recent advances in technology have dramatically increased the availability of protein-protein interaction (PPI) data and stimulated the development of many methods for improving the systems-level understanding of the cell. However, those efforts have been significantly hindered by the high level of noise, sparseness and highly skewed degree distribution of PPI networks. Here, we present a novel algorithm to reduce the noise present in PPI networks. The key idea of our algorithm is that two proteins sharing some higher-order topological similarities, measured by a novel random walk-based procedure, are likely to interact with each other and may belong to the same protein complex. RESULTS Applying our algorithm to a yeast PPI network, we found that the edges in the reconstructed network have higher biological relevance than those in the original network, as assessed by multiple types of information, including gene ontology, gene expression, essentiality, conservation between species and known protein complexes. Comparison with existing methods shows that the network reconstructed by our method has the highest quality. Using two independent graph clustering algorithms, we found that the reconstructed network significantly improved the prediction accuracy of protein complexes. Furthermore, our method is applicable to PPI networks obtained with different experimental systems, such as affinity purification, yeast two-hybrid (Y2H) and protein-fragment complementation assay (PCA), and evidence shows that the predicted edges are likely bona fide physical interactions. Finally, an application to a human PPI network increased the coverage of the network by at least 100%. AVAILABILITY www.cs.utsa.edu/~jruan/RWS/.
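The abstract gives the key idea (higher-order topological similarity from a random walk) but not the exact formula. A hedged sketch, assuming random walk with restart (RWR) profiles compared by cosine similarity; the restart probability, iteration count, function names and any threshold used to add or drop edges are hypothetical.

```python
from math import sqrt

def rwr_profile(adj, start, restart=0.5, iters=100):
    """Random walk with restart from `start` on an undirected graph given as
    dict node -> set of neighbours; returns visiting probabilities."""
    p = {n: 0.0 for n in adj}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for u, mass in p.items():
            if adj[u]:
                share = (1 - restart) * mass / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:  # dangling node: send its walking mass back to the start
                nxt[start] += (1 - restart) * mass
        nxt[start] += restart
        p = nxt
    return p

def topological_similarity(adj, u, v, **kw):
    """Cosine similarity between the RWR profiles of u and v."""
    pu, pv = rwr_profile(adj, u, **kw), rwr_profile(adj, v, **kw)
    dot = sum(pu[n] * pv[n] for n in adj)
    norm = sqrt(sum(x * x for x in pu.values())) * sqrt(sum(x * x for x in pv.values()))
    return dot / norm
```

Scoring every protein pair this way and keeping the highest-scoring pairs yields a reconstructed network; pairs in different connected components always score 0.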
Affiliation(s)
- Chengwei Lei
- Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA
8
MOfinder: a novel algorithm for detecting overlapping modules from protein-protein interaction network. J Biomed Biotechnol 2012; 2012:103702. [PMID: 22500072 PMCID: PMC3303734 DOI: 10.1155/2012/103702] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/07/2011] [Revised: 10/19/2011] [Accepted: 10/21/2011] [Indexed: 11/17/2022]
Abstract
Since organism development and many critical cell biology processes are organized in modular patterns, many algorithms have been proposed to detect modules. In this study, a new method, MOfinder, was developed to detect overlapping modules in a protein-protein interaction (PPI) network. We demonstrate that our method is more accurate than five other methods. We then applied MOfinder to yeast and human PPI networks and explored the overlapping information. Using the overlapping modules of the human PPI network, we constructed a module-module communication network. Functional annotation showed that immune-related and cancer-related proteins were consistently present together in the same modules, which offers some clues for cancer immunotherapy. Our study of overlapping modules suggests a new perspective on the analysis of PPI networks and improves our understanding of disease.
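The abstract does not say how MOfinder links modules in the module-module communication network. One natural construction, assumed here, connects two modules whenever they share proteins and weights the link by the number of shared proteins; the function name and module data are hypothetical.

```python
def module_network(modules):
    """Build a module-module network from overlapping modules.
    modules: dict module name -> set of proteins.
    Returns {(module_a, module_b): shared protein count} for overlapping pairs."""
    names = sorted(modules)
    links = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = modules[a] & modules[b]  # proteins belonging to both modules
            if shared:
                links[(a, b)] = len(shared)
    return links
```

For example, modules {A, B, C} and {C, D} share protein C and are linked with weight 1, while a module {E} stays unconnected.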
9
Cui G, Shrestha R, Han K. ModuleSearch: finding functional modules in a protein-protein interaction network. Comput Methods Biomech Biomed Engin 2011; 15:691-9. [PMID: 21827286 DOI: 10.1080/10255842.2011.555404] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/17/2022]
Abstract
Many biological processes are performed by a group of proteins rather than by individual proteins. Proteins involved in the same biological process often form a densely connected sub-graph in a protein-protein interaction network. Therefore, finding a dense sub-graph provides useful information to predict the function or protein complex of uncharacterised proteins in the sub-graph. We developed a heuristic algorithm that finds functional modules in a protein-protein interaction network and visualises the modules. The algorithm has been implemented in a platform-independent, standalone program called ModuleSearch. In an interaction network of yeast proteins, ModuleSearch found 366 overlapping modules. Of the modules, 71% have a function shared by more than half the proteins in the module and 58% have a function shared by all proteins in the module. Comparison of ModuleSearch with other programs shows that ModuleSearch finds more sub-graphs than most other programs, yet a higher proportion of the sub-graphs correspond to known functional modules. ModuleSearch and sample data are freely available to academics at http://bclab.inha.ac.kr/ModuleSearch.
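ModuleSearch's heuristic is not spelled out in the abstract. To illustrate the general idea of finding a dense sub-graph, the sketch below greedily grows a module from a seed protein, always adding the neighbour with the most links into the module, and stops when edge density would fall below a threshold. The greedy rule, the function names and `min_density` are assumptions, not ModuleSearch's actual procedure.

```python
def density(nodes, adj):
    """Fraction of possible pairs among `nodes` that interact."""
    n = len(nodes)
    if n < 2:
        return 1.0
    links = sum(1 for u in nodes for v in adj[u] if v in nodes) / 2
    return links / (n * (n - 1) / 2)

def grow_module(seed, adj, min_density=0.8):
    """Greedy seed expansion on a graph given as dict node -> set of neighbours."""
    module = {seed}
    while True:
        frontier = {v for u in module for v in adj[u]} - module
        if not frontier:
            return module
        # candidate with the most links into the module; ties broken by name
        best = max(frontier, key=lambda v: (len(adj[v] & module), str(v)))
        if density(module | {best}, adj) < min_density:
            return module
        module.add(best)
```

Running this from every protein and merging near-duplicate results would give a set of possibly overlapping modules.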
Affiliation(s)
- Guangyu Cui
- School of Computer Science and Engineering, Inha University, Incheon, 402-751, South Korea
10
Feng J, Jiang R, Jiang T. A max-flow-based approach to the identification of protein complexes using protein interaction and microarray data. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:621-634. [PMID: 20733237 DOI: 10.1109/tcbb.2010.78] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 05/29/2023]
Abstract
The emergence of high-throughput technologies has led to abundant protein-protein interaction (PPI) data and microarray gene expression profiles, and provides a great opportunity for the identification of novel protein complexes using computational methods. By combining these two types of data, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a protein-protein interaction network, and then breaks each such subgraph into fragments iteratively by weighting its nodes according to their corresponding log-fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Our tests on three widely used protein-protein interaction data sets, and comparisons with several recent methods for protein complex identification, demonstrate the strong performance of our method in predicting novel protein complexes in terms of specificity and efficiency. Given the high specificity (or precision) that our method has achieved, we conjecture that our prediction results imply more than 200 novel protein complexes.
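GFA starts from a classical max-flow method for the densest subgraph. A minimal sketch of that starting point, assuming it is Goldberg's (1984) unweighted construction: for a density guess g, connect the source to every node with capacity m, every node i to the sink with capacity m + 2g - deg(i), and each interaction in both directions with capacity 1; a non-trivial minimum cut exists exactly when some subgraph has |E(S)|/|S| > g, and binary search on g finds the densest one. GFA's log-fold-change node weighting and iterative fragmentation are omitted, and the function names are hypothetical.

```python
from collections import deque

def _min_cut_source_side(cap, s, t, eps=1e-9):
    """Edmonds-Karp max-flow on a residual dict-of-dicts `cap` (modified in place).
    Returns the set of nodes reachable from s once no augmenting path remains."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)
        path, v = [], t                       # recover the augmenting path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] = cap[v].get(u, 0.0) + push

def densest_subgraph(edges):
    """Node set S maximising |E(S)|/|S| in a simple undirected graph."""
    nodes = list({x for e in edges for x in e})
    n, m = len(nodes), len(edges)
    deg = dict.fromkeys(nodes, 0)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    lo, hi, best = 0.0, float(m), set(nodes)
    # distinct subgraph densities differ by at least 1/(n(n-1))
    while hi - lo >= 1.0 / (n * (n - 1)):
        g = (lo + hi) / 2
        s, t = object(), object()             # sentinel source and sink
        cap = {x: {} for x in nodes}
        cap[s], cap[t] = {}, {}
        for x in nodes:
            cap[s][x] = float(m)
            cap[x][t] = m + 2 * g - deg[x]
        for u, v in edges:
            cap[u][v] = 1.0
            cap[v][u] = 1.0
        side = _min_cut_source_side(cap, s, t) - {s}
        if side:                              # some subgraph is denser than g
            lo, best = g, side
        else:
            hi = g
    return best
```

On a 4-clique with a two-edge tail attached, the search returns the clique (density 6/4), not the whole graph (8/6).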
Affiliation(s)
- Jianxing Feng
- Department of Computer Science and Technology, Tsinghua University, 1207B Zijing Building 15#, Beijing 100084, China.
11
Wang J, Li M, Chen J, Pan Y. A fast hierarchical clustering algorithm for functional modules discovery in protein interaction networks. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:607-620. [PMID: 20733244 DOI: 10.1109/tcbb.2010.75] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.0] [Indexed: 05/29/2023]
Abstract
With advances in technologies for predicting protein interactions, huge data sets represented as networks have become available. Identification of functional modules from such networks is crucial for understanding the principles of cellular organization and function. However, protein interaction data produced by high-throughput experiments are generally associated with high false-positive rates, which makes it difficult to identify functional modules accurately. In this paper, we propose a fast hierarchical clustering algorithm, HC-PIN, based on the local metric of edge clustering value, which can be used in both unweighted and weighted networks. HC-PIN is applied to the yeast protein interaction network, and the identified modules are validated by all three types of Gene Ontology (GO) terms: Biological Process, Molecular Function, and Cellular Component. The experimental results show that HC-PIN is not only robust to false positives but can also discover functional modules with low density. The identified modules are statistically significant in terms of all three types of GO annotations. Moreover, as its parameter value varies, HC-PIN can uncover the hierarchical organization of functional modules, which approximately corresponds to the hierarchical structure of GO annotations. Compared to previous competing algorithms, HC-PIN is faster and more accurate.
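HC-PIN is built on a local "edge clustering value" whose exact definition is not given in this abstract. As a hedged stand-in, the sketch below computes the classic edge clustering coefficient of Radicchi et al., (z + 1) / min(k_u - 1, k_v - 1), where z is the number of triangles through edge (u, v) and k is node degree; edges with low values tend to lie between modules. HC-PIN's own measure may be defined differently, and the function name is hypothetical.

```python
def edge_clustering(adj):
    """Radicchi-style edge clustering coefficient for every edge of an
    undirected graph (dict node -> set of neighbours). Edges touching a
    degree-1 node get infinity because the denominator is zero."""
    scores = {}
    for u in adj:
        for v in adj[u]:
            if (v, u) in scores:  # each undirected edge is scored once
                continue
            z = len(adj[u] & adj[v])  # triangles through (u, v)
            denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
            scores[(u, v)] = (z + 1) / denom if denom > 0 else float("inf")
    return scores
```

For two triangles joined by a bridge edge, each within-triangle edge scores 2.0 while the bridge scores 0.5, so an agglomerative or divisive procedure driven by this value separates the two triangles first.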
Affiliation(s)
- Jianxin Wang
- Department of Computer Science, School of Information Science and Engineering, Central South University, Changsha 410083, China.
12
Abstract
The increasing availability of large-scale protein-protein interaction data has made it possible to understand the basic components and organization of cell machinery from the network level. The arising challenge is how to analyze such complex interacting data to reveal the principles of cellular organization, processes and functions. Many studies have shown that clustering protein interaction network is an effective approach for identifying protein complexes or functional modules, which has become a major research topic in systems biology. In this review, recent advances in clustering methods for protein interaction networks will be presented in detail. The predictions of protein functions and interactions based on modules will be covered. Finally, the performance of different clustering methods will be compared and the directions for future research will be discussed.
Affiliation(s)
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha 410083, China
- Department of Computer Science, Georgia State University, Atlanta, GA30303, USA
- Min Li
- School of Information Science and Engineering, Central South University, Changsha 410083, China
- Youping Deng
- Rush University Cancer Center, Rush University Medical Center, Chicago, IL 60612, USA
- Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA30303, USA
13
Song J, Singh M. How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics 2009; 25:3143-50. [PMID: 19770263 PMCID: PMC3167697 DOI: 10.1093/bioinformatics/btp551] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Indexed: 11/14/2022]
Abstract
Motivation: Clustering of protein–protein interaction networks is one of the most common approaches for predicting functional modules, protein complexes and protein functions. But how well does clustering perform at these tasks? Results: We develop a general framework to assess how well computationally derived clusters in physical interactomes overlap functional modules derived via the Gene Ontology (GO). Using this framework, we evaluate six diverse network clustering algorithms using Saccharomyces cerevisiae and show that (i) the performances of these algorithms can differ substantially when run on the same network and (ii) their relative performances change depending upon the topological characteristics of the network under consideration. For the specific task of function prediction in S. cerevisiae, we demonstrate that, surprisingly, a simple non-clustering guilt-by-association approach outperforms widely used clustering-based approaches that annotate a protein with the overrepresented biological process and cellular component terms in its cluster; this is true over the range of clustering algorithms considered. Further analysis parameterizes performance based on the number of annotated proteins, and suggests when clustering approaches should be used for interactome functional analyses. Overall, our results suggest a re-examination of when and how clustering approaches should be applied to physical interactomes, and establish guidelines by which novel clustering approaches for biological networks should be justified and evaluated with respect to functional analysis. Contact: msingh@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online.
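The simple non-clustering guilt-by-association baseline is not defined in the abstract. The classic form, assumed here, annotates a protein with the term that occurs most often among its direct interaction partners. The function name, protein names and GO identifiers below are placeholders.

```python
from collections import Counter

def predict_by_neighbours(adj, annotations, protein):
    """Majority-vote guilt-by-association: return the annotation term most
    common among `protein`'s neighbours (ties broken alphabetically), or None."""
    counts = Counter()
    for partner in adj.get(protein, ()):
        counts.update(annotations.get(partner, ()))
    if not counts:
        return None
    # max() keeps the first maximal item, so sorting makes ties deterministic
    return max(sorted(counts), key=lambda term: counts[term])
```

Unlike the clustering-based approach, this uses only a protein's immediate partners and needs no choice of clustering algorithm or cluster boundaries.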
Affiliation(s)
- Jimin Song
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics Princeton University, Princeton, NJ 08544, USA
14
Navlakha S, Schatz MC, Kingsford C. Revealing biological modules via graph summarization. J Comput Biol 2009; 16:253-64. [PMID: 19183002 DOI: 10.1089/cmb.2008.11tt] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 01/16/2023]
Abstract
The division of a protein interaction network into biologically meaningful modules can aid the automated detection of protein complexes and the prediction of biological processes, and can uncover the global organization of the cell. We propose the use of a graph summarization (GS) technique, based on graph compression, to cluster protein interaction graphs into biologically relevant modules. The method is motivated by defining a biological module as a set of proteins that have similar sets of interaction partners. We show that this definition, put into practice by a GS algorithm, reveals modules that are more biologically enriched than those found by other methods. We also apply GS to predict complex memberships, biological processes, and co-complexed pairs, and show that in most settings GS is preferable to existing methods of protein interaction graph clustering.
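The module definition above (proteins with similar sets of interaction partners) can be illustrated with its simplest, exact case: proteins whose partner sets are identical collapse into one supernode without losing any interaction. This is only a simplified starting point assumed for illustration; the published GS method is based on graph compression and is more general. The function name is hypothetical.

```python
from collections import defaultdict

def group_by_partners(adj):
    """Group proteins whose interaction-partner sets are identical.
    adj: dict protein -> set of partners. Returns groups of size >= 2."""
    groups = defaultdict(list)
    for protein, partners in adj.items():
        groups[frozenset(partners)].append(protein)  # same partners, same key
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Here proteins a and b, which both interact with exactly x and y, form one group, while c (partner x only) stays separate.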
Affiliation(s)
- Saket Navlakha
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA