1
Zhang GJ, Wang XQ, Ma LF, Wang LJ, Hu J, Zhou XG. Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:2119-2130. [PMID: 31107659 DOI: 10.1109/tcbb.2019.2917452] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
De novo protein structure prediction can be treated as a conformational space optimization problem guided by an energy function. However, designing an accurate energy function that ensures low-energy conformations are close to native structures remains a challenge. Fortunately, recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of an evolutionary algorithm. In TDFO, a similarity model is first designed using feature information extracted from distance profiles by the bisecting K-means algorithm. A similarity model-based selection strategy is then developed to guide the conformation search and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is proposed to strike a trade-off between exploration and exploitation of the search space. Experimental results on 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of the test proteins.
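Bisecting K-means, the clustering step named above, repeatedly splits the cluster with the largest within-cluster error into two using ordinary 2-means until K clusters exist. The pure-Python sketch below illustrates only that generic technique on a hypothetical list of candidate distances for a single residue pair; it is not the authors' TDFO code, and the 1-D input, the min/max centroid initialization, and the largest-SSE splitting rule are assumptions.

```python
def two_means(points, iters=50):
    """Split 1-D values into two groups with plain 2-means (Lloyd's algorithm).
    Assumes the values are not all identical."""
    centers = [min(points), max(points)]  # distinct starting centroids
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            # assign each value to its nearest centroid (ties go to the lower one)
            idx = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[idx].append(p)
        new_centers = [sum(g) / len(g) for g in groups]
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return groups


def sse(cluster):
    """Sum of squared errors of a cluster around its mean."""
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)


def bisecting_kmeans(points, k):
    """Bisect the cluster with the largest SSE until k clusters exist.
    Returns the clusters ordered by their mean value."""
    clusters = [list(points)]
    while len(clusters) < k:
        # only clusters with at least two distinct values can be split
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break
        worst = max(splittable, key=sse)
        clusters.remove(worst)
        clusters.extend(two_means(worst))
    return sorted(clusters, key=lambda c: sum(c) / len(c))
```

For example, `bisecting_kmeans([1.0, 1.2, 0.9, 5.0, 5.1, 9.8, 10.0], 3)` separates the three groups of values; the cluster means could then serve as hypothetical distance features.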
2
Abstract
Background: Biological entities such as genes, promoters, mRNA, metabolites or proteins do not act alone, but in concert in their network context. Modules, i.e., groups of nodes with similar topological properties in these networks, characterize important biological functions of the underlying biomolecular system. Edges in such molecular networks represent regulatory and physical interactions, and comparing them between conditions provides valuable information on differential molecular mechanisms. However, biological data are inherently noisy, and network reduction techniques can propagate errors, particularly to the level of edges. We aim to improve the analysis of networks of biological molecules by deriving modules together with edge relevance estimations that are based on global network characteristics. Methods: The key challenge we address here is investigating the capability of stochastic block models (SBMs) for representing and analyzing different types of biomolecular networks. Fitting the networks to SBMs both delivers their modules and enables the derivation of edge confidence scores; this approach has not yet been investigated for analyzing biomolecular networks. We apply SBM-based analysis independently to three correlation-based networks of breast cancer data originating from high-throughput measurements of different molecular layers: transcriptomics, proteomics, and metabolomics. The networks were reduced by thresholding for correlation significance or by requirements on scale-freeness. Results and discussion: We find that the networks are best represented by the hierarchical version of the SBM, and many of the predicted blocks have a biologically and phenotypically relevant functional annotation. The edge confidence scores are overall in concordance with the biological evidence given by the measurements. We conclude that biomolecular networks can be appropriately represented and analyzed by fitting SBMs.
As the SBM-derived edge confidence scores are based on global network connectivity characteristics and potential hierarchies within the biomolecular networks are considered, they could be used as additional, integrated features in network-based data comparisons.
Affiliation(s)
- Katharina Baum
- Bioinformatics and Modelling, Luxembourg Institute of Health, Strassen, Luxembourg
- Mathematical Modelling of Cellular Processes, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Jagath C. Rajapakse
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Francisco Azuaje
- Bioinformatics and Modelling, Luxembourg Institute of Health, Strassen, Luxembourg
3
Abstract
Background: Biological entities such as genes, promoters, mRNA, metabolites or proteins do not act alone, but in concert in their network context. Modules, i.e., groups of nodes with similar topological properties in these networks, characterize important biological functions of the underlying biomolecular system. Edges in such molecular networks represent regulatory and physical interactions, and comparing them between conditions provides valuable information on differential molecular mechanisms. However, biological data are inherently noisy, and network reduction techniques can propagate errors, particularly to the level of edges. We aim to improve the analysis of networks of biological molecules by deriving modules together with edge relevance estimations that are based on global network characteristics. Methods: We propose to fit the networks to stochastic block models (SBMs), a method that has not yet been investigated for the analysis of biomolecular networks. This procedure both delivers modules of the networks and enables the derivation of edge confidence scores. We apply it to correlation-based networks of breast cancer data originating from high-throughput measurements of diverse molecular layers such as transcriptomics, proteomics, and metabolomics. The networks were reduced by thresholding for correlation significance or by requirements on scale-freeness. Results and discussion: We find that the networks are best represented by the hierarchical version of the SBM, and many of the predicted blocks have a biological meaning according to functional annotation. The edge confidence scores are overall in concordance with the biological evidence given by the measurements. As they are based on global network connectivity characteristics and take potential hierarchies within the biomolecular networks into account, they could be used as additional, integrated features in network-based data comparisons.
Their tight relationship to edge existence probabilities can be exploited to predict missing or spurious edges in order to improve the network representation of the underlying biological system.
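The link between SBMs and edge existence probabilities can be illustrated in the simplest setting. Under a plain Bernoulli SBM with a fixed block assignment, the maximum-likelihood probability of an edge between blocks r and s is the number of observed edges between them divided by the number of possible node pairs (n_r n_s for r ≠ s, n_r(n_r - 1)/2 within a block). The sketch below assumes such a given partition; it does not infer the partition and is not the hierarchical, degree-aware SBM fitting the authors used. The node names and blocks are hypothetical.

```python
from collections import Counter
from itertools import combinations


def block_edge_probabilities(nodes, edges, block):
    """MLE edge probabilities of a Bernoulli SBM for a *given* partition.

    nodes: node ids; edges: undirected (u, v) pairs without duplicates or
    self-loops; block: dict mapping node id -> sortable block label.
    """
    sizes = Counter(block[n] for n in nodes)
    observed = Counter()
    for u, v in edges:
        observed[tuple(sorted((block[u], block[v])))] += 1
    probs = {}
    # between-block pairs: every node of r can pair with every node of s
    for r, s in combinations(sorted(sizes), 2):
        probs[(r, s)] = observed[(r, s)] / (sizes[r] * sizes[s])
    # within-block pairs: unordered pairs of distinct nodes inside r
    for r in sizes:
        pairs = sizes[r] * (sizes[r] - 1) // 2
        probs[(r, r)] = observed[(r, r)] / pairs if pairs else 0.0
    return probs


def edge_score(u, v, block, probs):
    """Score an observed or missing edge by its block-pair probability."""
    return probs[tuple(sorted((block[u], block[v])))]


nodes = ["a", "b", "c", "d", "e"]
block = {"a": "A", "b": "A", "c": "A", "d": "B", "e": "B"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("c", "d")]
probs = block_edge_probabilities(nodes, edges, block)
print(probs)  # dense blocks "A" and "B", sparse A-B connections
```

In this toy graph, an absent pair such as ("a", "e") receives the low between-block score of 1/6, while pairs inside the dense blocks score 1.0; low-scoring observed edges would be candidates for spurious links.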
Affiliation(s)
- Katharina Baum
- Bioinformatics and Modelling, Luxembourg Institute of Health, Strassen, Luxembourg
- Mathematical Modelling of Cellular Processes, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Jagath C. Rajapakse
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Francisco Azuaje
- Bioinformatics and Modelling, Luxembourg Institute of Health, Strassen, Luxembourg
4
Zhang GJ, Zhou XG, Yu XF, Hao XH, Yu L. Enhancing Protein Conformational Space Sampling Using Distance Profile-Guided Differential Evolution. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1288-1301. [PMID: 28113726 DOI: 10.1109/tcbb.2016.2566617] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 06/06/2023]
Abstract
De novo protein structure prediction aims to search for low-energy conformations, following the thermodynamic hypothesis that places native conformations at the global minimum of the protein energy surface. However, owing to inaccuracies in the energy model, the native conformation is not necessarily located in the lowest-energy regions. This study presents a differential evolution algorithm that uses a distance profile-based selection strategy to effectively sample conformations with reasonable structures. In the proposed algorithm, the residue-residue distance is considered as a measure of a conformation in addition to energy. When the trial conformation yields a larger energy value than the target conformation, the average distance errors of the decoys, i.e., the deviations between the distance of each residue pair and the corresponding distance in the distance profiles, are first calculated. Then, if the trial conformation has a lower average distance error than the target conformation, its distance acceptance probability is computed based on the distance profiles, and the trial conformation is accepted into the next generation according to this probability. By using the dual constraints of energy and distance to guide sampling, the algorithm can sample conformations with lower energies and more reasonable structures. Experimental results on 28 benchmark proteins show that the proposed algorithm can effectively predict near-native protein structures.
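The energy-then-distance selection rule described above can be sketched as a single differential evolution selection step. In this Python sketch, the dictionary layout of a conformation, the use of the mean absolute deviation as the "average distance error", and especially the acceptance probability (here the relative error reduction) are hypothetical stand-ins; the paper defines its own distance acceptance probability, which the abstract does not give.

```python
import random


def average_distance_error(distances, profile):
    """Mean absolute gap between a conformation's residue-pair distances and
    the reference distances of a distance profile.
    Both arguments map (i, j) residue pairs to distances in angstroms."""
    return sum(abs(distances[p] - profile[p]) for p in profile) / len(profile)


def select(target, trial, profile, rng=random):
    """Return the conformation kept for the next generation.

    target/trial: dicts with 'energy' and 'distances' keys (hypothetical layout).
    """
    if trial["energy"] <= target["energy"]:
        return trial  # ordinary greedy DE selection on energy
    err_trial = average_distance_error(trial["distances"], profile)
    err_target = average_distance_error(target["distances"], profile)
    if err_trial >= err_target:
        return target  # higher energy and no distance improvement
    # HYPOTHETICAL acceptance probability: relative reduction of the error.
    # The paper's own formula is not reproduced here.
    p_accept = (err_target - err_trial) / err_target
    return trial if rng.random() < p_accept else target
```

Passing `rng` explicitly makes the stochastic branch testable; for example, a trial with higher energy but a distance error of 1.0 versus the target's 4.0 would be accepted with probability 0.75 under this assumed rule.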
5
Zhu L, Deng SP, You ZH, Huang DS. Identifying Spurious Interactions in the Protein-Protein Interaction Networks Using Local Similarity Preserving Embedding. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:345-352. [PMID: 28368812 DOI: 10.1109/tcbb.2015.2407393] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 06/07/2023]
Abstract
In recent years, a remarkable amount of protein-protein interaction (PPI) data has become available owing to advances in experimental high-throughput technologies. However, experimentally detected PPI data usually contain a large number of spurious links, which can contaminate the analysis of the biological significance of protein links and lead to incorrect biological discoveries, thereby posing new challenges to both computational and biological scientists. In this paper, we develop a new embedding algorithm called local similarity preserving embedding (LSPE) to rank the interaction possibility of protein links. By going beyond the limitations of current geometric embedding methods for network denoising and emphasizing the local information of PPI networks, LSPE can avoid the instability of previous methods. We present experimental results on benchmark PPI networks showing that LSPE performed best overall, outperforming state-of-the-art methods in eliminating topologically false links.
6
Huang XT, Zhu Y, Chan LLH, Zhao Z, Yan H. An integrative C. elegans protein-protein interaction network with reliability assessment based on a probabilistic graphical model. Mol Biosyst 2016; 12:85-92. [PMID: 26555698 DOI: 10.1039/c5mb00417a] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022]
Abstract
In Caenorhabditis elegans, a large number of protein-protein interactions (PPIs) have been identified by different experiments. However, a comprehensive weighted PPI network, which is essential for signaling pathway inference, is not yet available in this model organism. Therefore, we first construct an integrative PPI network in C. elegans with 12,951 interactions involving 5039 proteins from seven molecular interaction databases. Then, a reliability score based on a probabilistic graphical model (RSPGM) is proposed to assess PPIs. It assumes that the random number of interactions between two proteins follows a Bernoulli distribution, to avoid multi-links. The main parameter of the RSPGM score contains a few latent variables, which can be considered as common properties shared by two proteins. Validations on high-confidence yeast datasets show that RSPGM provides more accurate evaluation than other approaches, and the PPIs in the reconstructed PPI network have higher biological relevance than those in the original network in terms of gene ontology, gene expression, essentiality and the prediction of known protein complexes. Furthermore, this weighted integrative PPI network in C. elegans is also employed to infer the interaction path of the canonical Wnt/β-catenin pathway. Most genes on the inferred interaction path have been validated as Wnt pathway components. Therefore, RSPGM is essential and effective for evaluating PPIs and inferring interaction paths. Finally, the PPI network with RSPGM scores can be queried and visualized on a user interactive website, which is freely available at .
Affiliation(s)
- Xiao-Tai Huang
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
- Yuan Zhu
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China; School of Automation, China University of Geosciences, Wuhan, China
- Leanne Lai Hang Chan
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
- Zhongying Zhao
- Department of Biology, Faculty of Science, Hong Kong Baptist University, Hong Kong, China
- Hong Yan
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
7
Fang Y, Sun M, Dai G, Ramain K. The Intrinsic Geometric Structure of Protein-Protein Interaction Networks for Protein Interaction Prediction. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:76-85. [PMID: 26886733 DOI: 10.1109/tcbb.2015.2456876] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
Recent developments in high-throughput technologies for measuring protein-protein interactions (PPIs) have profoundly advanced our ability to systematically infer protein function and regulation. However, the inherently high false positive and false negative rates of these measurements pose great challenges to computational approaches for PPI prediction. A good PPI predictor should be 1) resistant to high rates of missing and spurious PPIs, and 2) robust against the incompleteness of observed PPI networks. To predict PPIs in a network, we developed an intrinsic geometry structure (IGS) for networks, which exploits the intrinsic and hidden relationships among proteins in a network through a heat diffusion process. In this process, all explicit PPIs participate simultaneously to glue local, infinitesimal and noisy experimental interaction data into a global, macroscopic description of the relationships among proteins. The revealed implicit relationship can be interpreted as the probability of two proteins interacting with each other. This relationship is intrinsic and robust against individual, local and explicit protein interactions in the original network. We apply our approach to publicly available PPI network data to evaluate PPI prediction performance. Experimental results indicate that, under different levels of missing and spurious PPIs, IGS robustly exploits the intrinsic and hidden relationships for PPI prediction, with higher sensitivity and specificity than recently proposed methods.
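A standard way to model heat diffusion on a network is the graph heat kernel H = exp(-tL), where L = D - A is the graph Laplacian and t is the diffusion time. The pure-Python sketch below computes H for a small toy graph with a truncated Taylor series. It illustrates the generic heat kernel only, not the authors' IGS construction, and the diffusion time and number of series terms are assumed values suitable only for small graphs. For the exact kernel, each row of H is nonnegative and sums to 1, so H[i][j] can be read as the share of heat starting at protein i that has reached protein j.

```python
def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph on nodes 0..n-1.
    Assumes no self-loops or duplicate edges."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def matmul(A, B):
    """Dense square matrix product on nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_kernel(n, edges, t=1.0, terms=40):
    """Approximate H = exp(-t L) with a truncated Taylor series.
    Adequate only for small graphs and moderate t."""
    L = laplacian(n, edges)
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term: identity
    term = [row[:] for row in H]
    for k in range(1, terms):
        term = matmul(term, L)
        term = [[-t * x / k for x in row] for row in term]  # term = (-tL)^k / k!
        H = [[H[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return H


# hypothetical 4-protein path: 0 - 1 - 2 - 3
H = heat_kernel(4, [(0, 1), (1, 2), (2, 3)], t=0.5)
```

Pairs that are not directly linked (such as 0 and 3 here) still receive a positive score through indirect paths, which is the property that lets diffusion-based scores suggest missing interactions.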
8
Zhang XF, Ou-Yang L, Hu X, Dai DQ. Identifying binary protein-protein interactions from affinity purification mass spectrometry data. BMC Genomics 2015; 16:745. [PMID: 26438428 PMCID: PMC4595009 DOI: 10.1186/s12864-015-1944-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/06/2014] [Accepted: 09/22/2015] [Indexed: 02/04/2023]
Abstract
Background The identification of protein-protein interactions contributes greatly to the understanding of functional organization within cells. With the development of affinity purification-mass spectrometry (AP-MS) techniques, several computational scoring methods have been proposed to detect protein interactions from AP-MS data. However, most current methods focus on the detection of co-complex interactions and do not discriminate between direct physical interactions and indirect interactions. Consequently, less is known about the precise physical wiring diagram within cells. Results In this paper, we develop a Binary Interaction Network Model (BINM) to computationally identify direct physical interactions from co-complex interactions, which can be inferred from purification data using previous scoring methods. This model provides a mathematical framework for capturing topological relationships between direct physical interactions and observed co-complex interactions. It reassigns a confidence score to each observed interaction to indicate its propensity to be a direct physical interaction. Observed interactions with high confidence scores are then predicted as direct physical interactions. We run our model on two yeast co-complex interaction networks constructed by two different scoring methods on the same combined AP-MS dataset. The direct physical interactions identified by various methods are comprehensively benchmarked against different reference sets that provide both direct and indirect evidence for physical contacts. Experimental results show that our model performs competitively against state-of-the-art methods. Conclusions According to the results obtained in this study, BINM is a powerful scoring method that can use network topology alone to predict direct physical interactions from AP-MS data. This study provides an alternative approach to exploring the information inherent in AP-MS data.
The software can be downloaded from https://github.com/Zhangxf-ccnu/BINM. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1944-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiao-Fei Zhang
- School of Mathematics and Statistics, Central China Normal University, Luoyu Road, Wuhan, 430079, China.
- Le Ou-Yang
- Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Xingang West Road, Guangzhou, 510275, China.
- Xiaohua Hu
- School of Computer, Central China Normal University, 774 Luoyu Road, Wuhan, 430079, China; College of Information Science and Technology, Drexel University, Chestnut Street, Philadelphia, 19104, USA.
- Dao-Qing Dai
- Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Xingang West Road, Guangzhou, 510275, China.
9
Alanis-Lobato G. Mining protein interactomes to improve their reliability and support the advancement of network medicine. Front Genet 2015; 6:296. [PMID: 26442112 PMCID: PMC4585290 DOI: 10.3389/fgene.2015.00296] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/29/2015] [Accepted: 09/07/2015] [Indexed: 12/12/2022]
Abstract
High-throughput detection of protein interactions has had a major impact on our understanding of the intricate molecular machinery underlying the living cell, and has permitted the construction of very large protein interactomes. The protein networks that are currently available are incomplete, and a significant percentage of their interactions are false positives. Fortunately, the structural properties observed in good-quality social or technological networks are also present in biological systems. This has encouraged the development of tools to improve the reliability of protein networks and predict new interactions based solely on the topological characteristics of their components. Since diseases are rarely caused by the malfunction of a single protein, having a more complete and reliable interactome is crucial in order to identify groups of inter-related proteins involved in disease etiology. These system components can then be targeted with minimal collateral damage. In this article, a substantial number of network mining tools are reviewed, together with resources from which reliable protein interactomes can be constructed. In addition, a few representative examples of how molecular and clinical data can be integrated to deepen our understanding of pathogenesis are discussed.
Affiliation(s)
- Gregorio Alanis-Lobato
- Faculty of Biology, Institute of Molecular Biology, Johannes Gutenberg University of Mainz, Mainz, Germany; Integrative Systems Biology Lab, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
10
Abstract
Protein-protein interactions (PPIs) play essential roles in determining the outcomes of most cellular functions. Although experimentally detected high-throughput PPI data promise new opportunities for the study of many biological mechanisms, including cellular metabolism and protein functions, experimentally detected PPIs have high false positive rates. Therefore, it is of high practical value to develop novel computational tools for pruning low-confidence PPIs. In this paper, we propose a new geometric approach called Leave-One-Out Logistic Metric Embedding (LOO-LME) for assessing the reliability of interactions. Unlike previous approaches, which mainly seek to preserve the noisy topological information of PPI networks in the embedding space, LOO-LME first transforms the learning task into an equivalent discriminant form and then directly deals with the uncertainty in PPI networks using a leave-one-out-style approach. The experimental results show that LOO-LME substantially outperforms previous methods on PPI assessment problems. LOO-LME could thus facilitate further graph-based studies of PPIs and may help infer their hidden underlying biological knowledge.
11
Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q. A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 2015; 5:7702. [PMID: 25572661 PMCID: PMC4287731 DOI: 10.1038/srep07702] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 09/24/2014] [Accepted: 12/08/2014] [Indexed: 12/17/2022]
Abstract
The comprehensive mapping of protein-protein interactions (PPIs) is highly desirable for gaining deep insights into both fundamental cell biology processes and the pathology of diseases. Carefully designed small-scale experiments, despite their high accuracy, are not only very expensive but also inefficient for identifying numerous interactomes. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to extract further useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches have proven highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to personalized recommendation on large, sparse rating matrices, this work aims to implement a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for the problem. Under this framework, we model the given data as an interactome weight matrix, from which the feature vectors of the involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among the involved proteins for carrying out the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach significantly outperforms several sophisticated topology-based approaches.
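The abstract does not define the rescaled cosine coefficient, so the sketch below substitutes the ordinary cosine coefficient to show how a neighborhood-based CF predictor can score a protein pair from the rows of an interactome weight matrix. The matrix `W`, the neighborhood size `k`, and the similarity-weighted average are assumptions for illustration, not the authors' method.

```python
from math import sqrt


def cosine(a, b):
    """Ordinary cosine similarity between two interaction-weight rows."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_weight(W, i, j, k=2):
    """Neighborhood CF estimate of the weight between proteins i and j:
    a similarity-weighted average of j's weights with i's k most similar proteins."""
    sims = [(cosine(W[i], W[u]), u) for u in range(len(W)) if u not in (i, j)]
    top = sorted(sims, reverse=True)[:k]
    num = sum(s * W[u][j] for s, u in top)
    den = sum(abs(s) for s, u in top)
    return num / den if den else 0.0


# hypothetical symmetric weight matrix for 4 proteins (1 = putative interaction)
W = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]
print(predict_weight(W, 0, 3))  # proteins 0 and 3 share both neighbors
```

Here proteins 0 and 3 are not linked in `W`, but both of protein 0's most similar proteins interact with protein 3, so the unobserved pair receives a high score; on sparse matrices, a similarity measure that compensates for few shared entries is where a variant such as the paper's rescaled coefficient would differ.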
Affiliation(s)
- Xin Luo
- College of Computer Science, Chongqing University, Chongqing, 400044, China
- Department of Computing, Hong Kong Polytechnic University, Hong Kong, HK 999077, China
- Zhuhong You
- Department of Computing, Hong Kong Polytechnic University, Hong Kong, HK 999077, China
- Mengchu Zhou
- Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Shuai Li
- Department of Computing, Hong Kong Polytechnic University, Hong Kong, HK 999077, China
- Hareton Leung
- Department of Computing, Hong Kong Polytechnic University, Hong Kong, HK 999077, China
- Yunni Xia
- College of Computer Science, Chongqing University, Chongqing, 400044, China
- Qingsheng Zhu
- College of Computer Science, Chongqing University, Chongqing, 400044, China
12
Klapa MI, Tsafou K, Theodoridis E, Tsakalidis A, Moschonas NK. Reconstruction of the experimentally supported human protein interactome: what can we learn? BMC Syst Biol 2013; 7:96. [PMID: 24088582 PMCID: PMC4015887 DOI: 10.1186/1752-0509-7-96] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 02/26/2013] [Accepted: 09/25/2013] [Indexed: 02/02/2023]
Abstract
BACKGROUND Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will significantly contribute to biomedical research; therefore, its systematic reconstruction is required. Several meta-databases integrate source PPI datasets, but the protein node sets of their networks vary depending on the PPI data combined. Due to this inherent heterogeneity, the way in which the human PPI network expands via multiple dataset integration has not been comprehensively analyzed. We aim at assembling the human interactome in a global structured way and exploring it to gain biologically relevant insights. RESULTS First, we defined the UniProtKB manually reviewed human "complete" proteome as the reference protein-node set, and then we mined five major source PPI datasets for direct PPIs exclusively between the reference proteins. We updated the protein and publication identifiers and normalized all PPIs to the UniProt identifier level. The reconstructed interactome covers approximately 60% of the human proteome and has a scale-free structure. No apparent differentiating gene functional classification characteristics were identified for the unrepresented proteins. The source dataset integration augments the network mainly in PPIs. Polyubiquitin emerged as the highest-degree node, but the inclusion of most of its identified PPIs may be reconsidered. The high number (>300) of connections of the subsequent fifteen proteins correlates well with their essential biological role. According to the power-law network structure, the unrepresented proteins should mainly have up to four connections with equally poorly connected interactors. CONCLUSIONS Reconstructing the human interactome based on the a priori definition of the protein nodes enabled us to identify the currently included part of the human "complete" proteome and to discuss the role of the proteins within the network topology with respect to their function.
As the network expansion has to comply with the scale-free theory, we suggest that the core of the human interactome has essentially emerged. Thus, it could be employed in systems biology and biomedical research, despite the considerable number of currently unrepresented proteins. The latter are probably involved in specialized physiological conditions, justifying the scarcity of related PPI information, and their identification can assist in designing relevant functional experiments and targeted text mining algorithms.
Affiliation(s)
- Maria I Klapa
- Department of General Biology, School of Medicine, University of Patras, Rio, Patras, Greece.