1
Lawson S, Donovan D, Lefevre J. An application of node and edge nonlinear hypergraph centrality to a protein complex hypernetwork. PLoS One 2024; 19:e0311433. [PMID: 39361678 PMCID: PMC11449304 DOI: 10.1371/journal.pone.0311433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 09/12/2024] [Indexed: 10/05/2024]
Abstract
Applying graph centrality measures to biological networks, such as protein interaction networks, underpins much research into identifying key players within biological processes. This approach, however, is restricted to dyadic interactions, and it is well known that in many instances interactions are polyadic. In this study we illustrate the merit of using hypergraph centrality applied to a hypernetwork as an alternative. Specifically, we review and propose an extension to a recently introduced node and edge nonlinear hypergraph centrality model which provides mutually dependent node and edge centralities. A Saccharomyces cerevisiae protein complex hypernetwork is used as an example application, with nodes representing proteins and hyperedges representing protein complexes. The resulting rankings of the nodes and edges are examined to see whether they provide insight into the essentiality of the proteins and complexes. We find that certain variations of the model predict essentiality more accurately and that the degree-based variation illustrates that the centrality-lethality rule extends to a hypergraph setting. In particular, by exploiting the model's flexibility, we identify small sets of proteins densely populated with essential proteins. One of the key advantages of applying this model to a protein complex hypernetwork is that it also provides a classification method for protein complexes, unlike previous approaches, which are only concerned with classifying proteins.
Affiliation(s)
- Sarah Lawson
- ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
- Diane Donovan
- ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
- James Lefevre
- ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
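As a rough illustration of how mutually dependent node and edge centralities can be computed, the sketch below runs a damped power-type iteration on a hypergraph incidence matrix, loosely following the general nonlinear node-edge scheme the abstract refers to. It is not the authors' model or their proposed extension: the function name `node_edge_centrality`, the unit node and edge weights, the default identity functions `f`, `g`, `phi`, `psi`, and the toy hypernetwork are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Func = Callable[[float], float]


def identity(t: float) -> float:
    return t


def node_edge_centrality(
    incidence: Sequence[Sequence[int]],
    f: Func = identity,
    g: Func = identity,
    phi: Func = identity,
    psi: Func = identity,
    max_iter: int = 500,
    tol: float = 1e-10,
) -> Tuple[List[float], List[float]]:
    """Hypothetical sketch: node scores x and hyperedge scores y that depend on each other.

    incidence[i][j] == 1 when node i (e.g. a protein) belongs to hyperedge j
    (e.g. a complex). All node and edge weights are assumed to be 1.
    """
    n, m = len(incidence), len(incidence[0])
    x = [1.0 / n] * n
    y = [1.0 / m] * m
    for _ in range(max_iter):
        # A node is central if it lies in central hyperedges.
        node_in = [g(sum(incidence[i][j] * f(y[j]) for j in range(m))) for i in range(n)]
        # A hyperedge is central if it contains central nodes.
        edge_in = [psi(sum(incidence[i][j] * phi(x[i]) for i in range(n))) for j in range(m)]
        # Geometric-mean damping with the previous iterate, then 1-norm normalisation.
        u = [math.sqrt(a * b) for a, b in zip(x, node_in)]
        v = [math.sqrt(a * b) for a, b in zip(y, edge_in)]
        su, sv = sum(u), sum(v)
        u = [t / su for t in u]
        v = [t / sv for t in v]
        change = max(abs(a - b) for a, b in zip(u + v, x + y))
        x, y = u, v
        if change < tol:
            break
    return x, y


# Toy hypernetwork: 4 proteins, 3 complexes {0,1,2}, {0,3}, {0,1}.
B = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
x, y = node_edge_centrality(B)
print("node ranking:", sorted(range(len(x)), key=lambda i: -x[i]))
print("edge ranking:", sorted(range(len(y)), key=lambda j: -y[j]))
```

With the identity defaults, a fixed point satisfies x ∝ By and y ∝ B^T x, i.e. the leading singular vectors of the incidence matrix; passing other functions for `f`, `g`, `phi` and `psi` gives nonlinear variants of the same coupled iteration.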
2
Lu P, Tian J. ACDMBI: A deep learning model based on community division and multi-source biological information fusion predicts essential proteins. Comput Biol Chem 2024; 112:108115. [PMID: 38865861 DOI: 10.1016/j.compbiolchem.2024.108115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 05/15/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]
Abstract
Accurately identifying essential proteins is vital for drug research and disease diagnosis. Traditional centrality methods and machine learning approaches, which rely primarily on information derived from protein-protein interaction (PPI) networks, often struggle to discern essential proteins accurately. Although some researchers have attempted to integrate biological data with PPI networks to predict essential proteins, designing effective integration methods remains a challenge. To address these challenges, this paper presents the ACDMBI model. ACDMBI comprises two key modules: feature extraction and classification. To capture relevant information, we draw on three distinct data sources. First, structural features of proteins are extracted from the PPI network through community division; these features are then further optimized using Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Next, protein features are extracted from gene expression data using Bidirectional Long Short-Term Memory networks (BiLSTM) and a multi-head self-attention mechanism. Finally, protein features are derived by mapping subcellular localization data to a one-dimensional vector and processing it through fully connected layers. In the classification phase, we integrate the features extracted from the three data sources and build a multi-layer deep neural network (DNN) for protein classification. Experimental results on brewer's yeast (Saccharomyces cerevisiae) data show the ACDMBI model's superior performance, with an AUC of 0.9533 and an AUPR of 0.9153. Ablation experiments further reveal that effectively integrating features from diverse biological information significantly boosts the model's performance.
Affiliation(s)
- Pengli Lu
- School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China.
- Jialong Tian
- School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China.
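The AUC and AUPR figures quoted above summarise how well a predictor's scores separate essential from non-essential proteins. A minimal, library-free sketch of both metrics follows; it computes ROC AUC through the rank-sum (Mann-Whitney) formulation with ties counted as one half, and approximates AUPR by average precision, which is one common estimator among several. The function names and the toy scores are illustrative assumptions, not ACDMBI's evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive (an AUPR estimate)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


labels = [1, 0, 1, 0]          # 1 = essential (toy data)
scores = [0.9, 0.8, 0.7, 0.6]  # predicted essentiality scores
print(roc_auc(labels, scores))            # 0.75
print(average_precision(labels, scores))  # (1/1 + 2/3) / 2, about 0.8333
```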
3
Li Z, Zhang Y, Zhou P. Temporal Protein Complex Identification Based on Dynamic Heterogeneous Protein Information Network Representation Learning. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1154-1164. [PMID: 38190662 DOI: 10.1109/tcbb.2024.3351078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2024]
Abstract
Protein complexes, as the fundamental units of cellular function and regulation, play a crucial role in understanding the normal physiological functions of cells. Existing methods for protein complex identification attempt to introduce other biological information on top of the protein-protein interaction (PPI) network to help evaluate the degree of association between proteins. However, these methods usually treat protein interaction networks as flat, homogeneous, static networks. They can neither distinguish the roles and importance of different types of biological information nor reflect the dynamic changes of protein complexes. In recent years, heterogeneous network representation learning has achieved great success in processing complex heterogeneous information and mining deep semantics. We therefore propose DHPRL, a temporal protein complex identification method based on Dynamic Heterogeneous Protein information network Representation Learning. DHPRL naturally integrates multiple types of heterogeneous biological information in the cellular temporal dimension. It simultaneously models the temporal dynamic properties of proteins and the heterogeneity of biological information to improve the understanding of protein interactions and the accuracy of complex prediction. First, we construct a Dynamic Heterogeneous Protein Information Network (DHPIN) by integrating temporal gene expression information and GO attribute information. Then we design a dual-view collaborative contrast mechanism: protein representations are learned from two views of DHPIN (a 1-hop relation view and a meta-path view) to model the consistency and specificity between nearest-neighbour biological information and deeper biological semantics. The dynamic PPI network is then re-weighted based on the learned protein representations. Finally, we perform protein complex identification on the re-weighted dynamic PPI network.
Extensive experimental results demonstrate that DHPRL can effectively model complicated biological information and achieve state-of-the-art performance in most cases.
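The abstract says the dynamic PPI network is re-weighted using the learned protein representations but does not state the weighting rule. Purely as an illustration, the sketch below assumes cosine similarity between embedding vectors, clipped at zero, as the new edge weight; the function names, the clipping choice and the toy embeddings are hypothetical and are not taken from DHPRL.

```python
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def reweight_edges(edges: List[Edge], emb: Dict[str, Sequence[float]]) -> Dict[Edge, float]:
    """Assumed rule: weight(u, v) = max(0, cosine(emb[u], emb[v]))."""
    return {(u, v): max(0.0, cosine(emb[u], emb[v])) for u, v in edges}


# Toy embeddings for one time point (hypothetical values).
emb = {"P1": [1.0, 0.0], "P2": [1.0, 1.0], "P3": [-1.0, 0.0]}
weights = reweight_edges([("P1", "P2"), ("P1", "P3")], emb)
print(weights)  # {('P1', 'P2'): 0.7071..., ('P1', 'P3'): 0.0}
```

For a temporal network, the same function would simply be applied to each time-point snapshot with that snapshot's embeddings.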
4
Pan L, Wang H, Yang B, Li W. A protein network refinement method based on module discovery and biological information. BMC Bioinformatics 2024; 25:157. [PMID: 38643108 PMCID: PMC11031909 DOI: 10.1186/s12859-024-05772-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 04/10/2024] [Indexed: 04/22/2024]
Abstract
BACKGROUND The identification of essential proteins can help in understanding the minimum requirements for cell survival and development, in discovering drug targets, and in preventing disease. Node ranking methods are currently a common way to identify essential proteins, but the poor quality of the underlying protein interaction network (PIN) data has hindered the accuracy with which these methods identify essential proteins. Researchers have therefore constructed refined networks by considering certain biological properties of interacting protein pairs, to improve the performance of node ranking methods in the PIN. Studies show that proteins in a complex are more likely to be essential than proteins not present in any complex. However, modularity is usually ignored by PIN refinement methods. METHODS On this basis, we propose a network refinement method based on module discovery and biological information. First, the maximal connected subgraph of the PIN is extracted and divided into modules using the Fast-unfolding (Louvain) algorithm; then, critical modules are detected according to the orthologous, subcellular localization and topological information within each module; finally, a more refined network (CM-PIN) is constructed from the identified critical modules. RESULTS To evaluate the effectiveness of the proposed method, we used 12 typical node ranking methods (LAC, DC, DMNC, NC, TP, LID, CC, BC, PR, LR, PeC, WDC) to compare the overall performance on the CM-PIN with that on the S-PIN, D-PIN and RD-PIN. The experimental results showed that the CM-PIN was optimal in terms of the number of essential proteins identified, the precision-recall curve, the jackknife method and other criteria, and can help to identify essential proteins more accurately.
Affiliation(s)
- Li Pan
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Haoyue Wang
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Bo Yang
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Wenbin Li
- Hunan Institute of Science and Technology, Yueyang, 414006, China.
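The first two steps in the METHODS paragraph above, keeping the maximal connected subgraph and scoring a division into modules, can be sketched with the standard library alone. Fast-unfolding searches for a partition with high Newman modularity; the sketch below does not implement that search. It only extracts the largest connected component and evaluates the modularity of a given partition, so the partition, the function names and the toy graph are assumptions for illustration.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def largest_component(edges: Iterable[Edge]) -> Tuple[Set[int], List[Edge]]:
    """Return the nodes and edges of the largest connected component (BFS)."""
    edges = list(edges)
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen: Set[int] = set()
    best: Set[int] = set()
    for start in list(adj):
        if start in seen:
            continue
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) > len(best):
            best = comp
    # Both endpoints of an edge lie in the same component, so checking u suffices.
    return best, [(u, v) for u, v in edges if u in best]


def modularity(edges: List[Edge], community: Dict[int, int]) -> float:
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for an unweighted graph."""
    m = len(edges)
    internal: Dict[int, int] = defaultdict(int)    # L_c: edges inside community c
    degree_sum: Dict[int, int] = defaultdict(int)  # d_c: total degree of community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by one edge, plus a separate pair (6, 7).
ppi = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (6, 7)]
nodes, lcc_edges = largest_component(ppi)
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(sorted(nodes), modularity(lcc_edges, partition))  # [0, 1, 2, 3, 4, 5] 0.357...
```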
5
Zhao H, Liu G, Cao X. A seed expansion-based method to identify essential proteins by integrating protein-protein interaction sub-networks and multiple biological characteristics. BMC Bioinformatics 2023; 24:452. [PMID: 38036960 PMCID: PMC10688502 DOI: 10.1186/s12859-023-05583-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Accepted: 11/24/2023] [Indexed: 12/02/2023]
Abstract
BACKGROUND The identification of essential proteins is of great significance in biology and pathology. However, protein-protein interaction (PPI) data obtained through high-throughput technology include a high number of false positives. To overcome this limitation, numerous computational algorithms based on biological characteristics and topological features have been proposed to identify essential proteins. RESULTS In this paper, we propose a novel method named SESN for identifying essential proteins. It is a seed expansion method based on PPI sub-networks and multiple biological characteristics. First, SESN uses gene expression data to construct PPI sub-networks. Second, seed expansion is performed simultaneously in each sub-network, guided by the topological features of predicted essential proteins. Third, an error-correction mechanism based on multiple biological characteristics and the entire PPI network is applied. Finally, SESN analyzes the impact of each biological characteristic, including protein complexes, gene expression data, GO annotations, and subcellular localization, and adopts the biological data that give the best experimental results. The output of SESN is a set of predicted essential proteins. CONCLUSIONS The analysis of each component of SESN indicates that all components are effective. We conduct comparison experiments using three datasets from two species, and the experimental results demonstrate that SESN achieves superior performance compared to other methods.
Affiliation(s)
- He Zhao
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Guixia Liu
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Xintian Cao
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
6
Han Y, Liu M, Wang Z. Key protein identification by integrating protein complex information and multi-biological features. Math Biosci Eng 2023; 20:18191-18206. [PMID: 38052554 DOI: 10.3934/mbe.2023808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Identifying key proteins based on protein-protein interaction networks has emerged as a prominent area of research in bioinformatics. However, current methods have certain limitations, such as omitting subcellular localization information and disregarding the impact of topological noise on the reliability of key protein identification. Moreover, the influence on complex participation of proteins that lie outside a complex but interact with proteins inside it tends to be overlooked. To address these shortcomings, this paper presents a novel method for key protein identification that integrates protein complex information with multiple biological features. The approach offers a comprehensive evaluation of protein importance by considering subcellular localization centrality, topological centrality weighted by gene ontology (GO) similarity, and complex participation centrality. Experimental results, including traditional statistical metrics, the jackknife metric, and the overlap and differences among identified key proteins, demonstrate that the proposed method not only identifies key proteins more accurately than nine classical methods but also is robust across diverse protein-protein interaction networks.
Affiliation(s)
- Yongyin Han
- School of Computer Science and Technology, China University of Mining and Technology, China
- Xuzhou College of Industrial Technology, China
- Maolin Liu
- School of Computer Science and Technology, China University of Mining and Technology, China
- Zhixiao Wang
- School of Computer Science and Technology, China University of Mining and Technology, China
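A common way to evaluate the ranking methods discussed in these abstracts is to label the top-ranked proteins as essential, report confusion-matrix statistics, and plot a jackknife curve (the cumulative number of true essential proteins found moving down the ranking). The sketch below shows that evaluation for a top-k cut-off; the choice of k, the function names and the toy data are assumptions, and each paper's exact protocol may differ.

```python
from typing import Dict, List, Sequence, Set


def topk_metrics(ranking: Sequence[str], essential: Set[str], k: int) -> Dict[str, float]:
    """Treat the top-k ranked proteins as predicted essential and score the prediction."""
    predicted = set(ranking[:k])
    universe = set(ranking)
    tp = len(predicted & essential)
    fp = len(predicted - essential)
    fn = len((universe - predicted) & essential)
    tn = len(universe - predicted - essential)
    sn = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity
    ppv = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value (precision)
    npv = tn / (tn + fn) if tn + fn else 0.0  # negative predictive value
    f = 2 * sn * ppv / (sn + ppv) if sn + ppv else 0.0  # F-measure
    acc = (tp + tn) / len(universe)
    return {"SN": sn, "SP": sp, "PPV": ppv, "NPV": npv, "F": f, "ACC": acc}


def jackknife_curve(ranking: Sequence[str], essential: Set[str]) -> List[int]:
    """Cumulative count of true essential proteins among the top 1, 2, ... proteins."""
    curve, hits = [], 0
    for protein in ranking:
        hits += int(protein in essential)
        curve.append(hits)
    return curve


ranking = ["A", "B", "C", "D", "E", "F"]  # sorted by a centrality score, highest first
essential = {"A", "C", "F"}
print(topk_metrics(ranking, essential, k=2))
print(jackknife_curve(ranking, essential))  # [1, 1, 2, 2, 2, 3]
```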
7
Sun J, Pan L, Li B, Wang H, Yang B, Li W. A Construction Method of Dynamic Protein Interaction Networks by Using Relevant Features of Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2790-2801. [PMID: 37030714 DOI: 10.1109/tcbb.2023.3264241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Essential proteins play an important role in various life activities and are considered a vital part of the organism. Gene expression data are an important data source for constructing dynamic protein-protein interaction networks (DPINs). Existing methods for constructing DPINs generally use all features (or the features in a cycle) of the gene expression data. However, features observed at successive time points tend to be highly correlated, so the gene expression data contain redundant and irrelevant features, which affect the quality of the constructed network and the predictive performance for essential proteins. To address this problem, we propose a method that constructs DPINs from selected relevant features rather than continuous and periodic features. We adopt an improved unsupervised feature selection method based on the Laplacian algorithm to remove irrelevant and redundant features from the gene expression data, then integrate the chosen relevant features into the static protein-protein interaction network (SPIN) to construct a more concise and effective DPIN (FS-DPIN). To evaluate the effectiveness of the FS-DPIN, we apply 15 network-based centrality methods to the FS-DPIN and compare the results with those on the SPIN and existing DPINs. The predictive performance of the 15 centrality methods is then validated in terms of sensitivity, specificity, positive predictive value, negative predictive value, F-measure, accuracy, Jackknife and AUPRC. The experimental results show that the FS-DPIN is superior to existing DPINs in the accuracy of essential protein identification.
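The feature-selection step above relies on an improved Laplacian-based criterion whose details the abstract does not give. For orientation, the sketch below computes the classic Laplacian score of He, Cai and Niyogi (2005), treating proteins as samples and expression time points as features; lower scores mark features that vary smoothly over the similarity graph and are therefore preferred. The precomputed protein similarity matrix `S`, the function name and the toy data are assumptions, and this is not the paper's improved variant.

```python
from typing import List, Sequence


def laplacian_scores(X: Sequence[Sequence[float]], S: Sequence[Sequence[float]]) -> List[float]:
    """Classic Laplacian score for each column (feature) of X.

    X[i][r]: value of feature r (e.g. an expression time point) for sample i (a protein).
    S[i][j]: symmetric, non-negative similarity between samples i and j.
    Score_r = (f~^T L f~) / (f~^T D f~), with L = D - S and f~ the D-weighted centred feature.
    """
    n, p = len(X), len(X[0])
    d = [sum(row) for row in S]  # degrees, the diagonal of D
    total = sum(d)
    scores = []
    for r in range(p):
        f = [X[i][r] for i in range(n)]
        mean = sum(fi * di for fi, di in zip(f, d)) / total
        ft = [fi - mean for fi in f]
        # f~^T L f~ equals the sum over pairs i < j of S_ij * (f~_i - f~_j)^2.
        num = sum(S[i][j] * (ft[i] - ft[j]) ** 2 for i in range(n) for j in range(i + 1, n))
        den = sum(di * fi * fi for di, fi in zip(d, ft))
        scores.append(num / den if den > 0 else float("inf"))  # constant feature: worst
    return scores


# Two pairs of similar proteins: {0, 1} and {2, 3} (hypothetical similarities).
S = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
# Feature 0 agrees with the pairs; feature 1 cuts across them.
X = [[1, 1], [1, 5], [5, 1], [5, 5]]
print(laplacian_scores(X, S))  # [0.0, 2.0]
```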
8
Chen S, Huang C, Wang L, Zhou S. A disease-related essential protein prediction model based on the transfer neural network. Front Genet 2023; 13:1087294. [PMID: 36685976 PMCID: PMC9845409 DOI: 10.3389/fgene.2022.1087294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 12/14/2022] [Indexed: 01/06/2023]
Abstract
Essential proteins play important roles in the development and survival of organisms, and their mutations have been shown to drive common internal diseases with higher prevalence rates. Because traditional biological experiments are costly, this paper first designs an improved Transfer Neural Network (TNN) to extract raw features from multiple sources of biological information about proteins and then, based on the newly constructed TNN, designs a novel computational model called TNNM to infer essential proteins. Unlike a traditional Markov chain, the TNN uses gradient descent to obtain the transition probability matrix automatically, which greatly improves the prediction accuracy of TNNM. Moreover, an additional antecedent memory coefficient and bias term are introduced in the TNN, further enhancing both the robustness and the nonlinear expressive ability of TNNM. Finally, to evaluate the identification performance of TNNM, extensive experiments were carried out on two well-known public databases separately, and the results show that TNNM achieves better performance than representative state-of-the-art prediction models in terms of both predictive accuracy and the decline rate of accuracy. TNNM may therefore play an important role in future key protein prediction.
Affiliation(s)
- Sisi Chen
- The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China
- Chiguo Huang
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
- Lei Wang
- The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
- Shunxian Zhou
- The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
- College of Information Science and Engineering, Hunan Women’s University, Changsha, Hunan, China
Correspondence: Chiguo Huang, Lei Wang, Shunxian Zhou
9
Brucely Y, Paulraj G, Thilak M, Karthikeyan MG. Minimization of defects in glove manufacturing using total failure mode effects analysis flower pollination optimization. AIP Conf Proc 2023; 2949:020030. [DOI: 10.1063/5.0157418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2025]
10
Xue X, Zhang W, Fan A. Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins. PLoS One 2023; 18:e0284274. [PMID: 37083829 PMCID: PMC10121005 DOI: 10.1371/journal.pone.0284274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 03/28/2023] [Indexed: 04/22/2023]
Abstract
Identifying key proteins from protein-protein interaction (PPI) networks is one of the most fundamental and important tasks for computational biologists. However, protein interactions obtained by high-throughput technology have a high false positive rate, which severely limits the prediction accuracy of current computational methods. In this paper, we propose a novel strategy to identify key proteins by constructing reliable PPI networks. Five Gene Ontology (GO)-based semantic similarity measurements (Jiang, Lin, Rel, Resnik, and Wang) are used to calculate confidence scores for protein pairs under three annotation terms (Molecular function (MF), Biological process (BP), and Cellular component (CC)). Protein pairs with low similarity values are assumed to be low-confidence links, and refined PPI networks are constructed by filtering out these low-confidence links. Six topology-based centrality methods (BC, DC, EC, NC, SC, and aveNC) are applied to test the performance of the measurements on the original and refined networks. We systematically compare the performance of the five semantic similarity metrics with the three GO annotation terms on four benchmark datasets, and the simulation results show that these centrality methods perform relatively better on the refined PPI networks than on the original networks. Resnik with the BP annotation term performs best among all combinations of the five metrics and three annotation terms. These findings suggest the importance of semantic similarity metrics in measuring the reliability of links between proteins and highlight the Resnik metric with the BP annotation term as a favourable choice.
Affiliation(s)
- Xiaoli Xue
- School of Science, East China Jiaotong University, Nanchang, China
- Wei Zhang
- School of Science, East China Jiaotong University, Nanchang, China
- Anjing Fan
- School of Computer and Information Engineering, Anyang Normal University, Anyang, China
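The refinement strategy described above scores each interacting pair by GO semantic similarity and drops low-scoring links. A toy version with the Resnik measure is sketched below: the information content of a term is IC(t) = -ln p(t), the similarity of two terms is the IC of their most informative common ancestor, and two proteins are compared by their best-matching pair of terms. The tiny ontology, the term probabilities, the max-based protein-level combination, the 0.5 threshold and all names are hypothetical; the paper's annotation sources and cut-offs are not reproduced.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical mini-ontology (child -> parents) and term annotation probabilities.
PARENTS: Dict[str, List[str]] = {"root": [], "A": ["root"], "B": ["root"],
                                 "A1": ["A"], "A2": ["A"]}
PROB: Dict[str, float] = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}


def ancestors(term: str) -> Set[str]:
    """The term itself plus every term reachable through parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def resnik(t1: str, t2: str) -> float:
    """IC of the most informative common ancestor (the root is always shared)."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(PROB[t]) for t in common)


def protein_similarity(terms1: Set[str], terms2: Set[str]) -> float:
    """Assumed combination rule: best-matching term pair."""
    return max(resnik(a, b) for a in terms1 for b in terms2)


def filter_edges(edges: List[Tuple[str, str]], annot: Dict[str, Set[str]],
                 threshold: float) -> List[Tuple[str, str]]:
    """Keep only interactions whose similarity reaches the threshold."""
    return [(u, v) for u, v in edges
            if protein_similarity(annot[u], annot[v]) >= threshold]


annot = {"P1": {"A1"}, "P2": {"A2"}, "P3": {"B"}}
edges = [("P1", "P2"), ("P1", "P3")]
print(filter_edges(edges, annot, threshold=0.5))  # [('P1', 'P2')]
```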
11
Rule-Based Pruning and In Silico Identification of Essential Proteins in Yeast PPIN. Cells 2022; 11:cells11172648. [PMID: 36078056 PMCID: PMC9454873 DOI: 10.3390/cells11172648] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/28/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 11/25/2022]
Abstract
Proteins are vital for the significant cellular activities of living organisms. However, not all of them are essential. Identifying essential proteins through biological experiments is more laborious and time-consuming than the computational approaches used in recent times. However, practical implementation of conventional methods sometimes becomes challenging because of poor performance in specific scenarios. More developed and efficient computational prediction models are therefore required for essential protein identification. This research proposes an effective methodology capable of predicting essential proteins in a refined yeast protein–protein interaction network (PPIN). The rule-based refinement uses protein complex and local interaction density information derived from the neighborhood properties of proteins in the network. Identifying and pruning non-essential proteins is equally crucial here. In the initial phase, node and edge weights are applied in a careful assessment to identify and discard non-essential proteins from the interaction network. Three cut-off levels are considered for each node and edge weight when pruning non-essential proteins. Once the PPIN has been filtered, the second phase applies two centrality-based approaches, (1) local interaction density (LID) and (2) local interaction density with protein complex (LIDC), successively to identify the essential proteins in the yeast PPIN. Our proposed methodology achieves better performance than existing state-of-the-art techniques.
12
Schapke J, Tavares A, Recamonde-Mendoza M. EPGAT: Gene Essentiality Prediction With Graph Attention Networks. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1615-1626. [PMID: 33497339 DOI: 10.1109/tcbb.2021.3054738] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Identifying essential genes and proteins is a critical step towards a better understanding of human biology and pathology. Computational approaches have helped to mitigate experimental constraints by exploring machine learning (ML) methods and the correlation of essentiality with biological information, especially protein-protein interaction (PPI) networks, to predict essential genes. Nonetheless, their performance is still limited, as network-based centralities are not exclusive proxies of essentiality, and traditional ML methods are unable to learn from non-Euclidean domains such as graphs. Given these limitations, we propose EPGAT, an approach for Essentiality Prediction based on Graph Attention Networks (GATs), which are attention-based Graph Neural Networks (GNNs) operating on graph-structured data. Our model learns gene essentiality patterns directly from PPI networks, integrating additional evidence from multiomics data encoded as node attributes. We benchmarked EPGAT on four organisms, including humans, and it predicted gene essentiality accurately, with ROC AUC scores ranging from 0.78 to 0.97. Our model significantly outperformed network-based and shallow ML-based methods and achieved very competitive performance against the state-of-the-art node2vec embedding method. Notably, EPGAT was the most robust approach in scenarios with limited and imbalanced training data. Thus, the proposed approach offers a powerful and effective way to identify essential genes and proteins.
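EPGAT builds on graph attention layers. The core computation of a single GAT attention head (Veličković et al., 2018) is sketched below without any deep-learning library: each node's features are projected by a weight matrix `W`, a score e_ij = LeakyReLU(a^T [W h_i || W h_j]) is computed for every neighbour j (including i itself), the scores are softmax-normalised, and the node's new representation is the attention-weighted sum of projected neighbour features. The weights are fixed toy numbers rather than trained parameters, and the function names are illustrative; EPGAT's architecture, multi-head setup and multiomics attributes are not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def matvec(W: Sequence[Sequence[float]], h: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(z: float, slope: float = 0.2) -> float:
    return z if z > 0 else slope * z


def gat_layer(h: Dict[str, Vector], neighbours: Dict[str, List[str]],
              W: Sequence[Sequence[float]], a: Sequence[float]
              ) -> Tuple[Dict[str, Vector], Dict[str, Dict[str, float]]]:
    """One attention head; returns (new features, attention weights) per node.

    The output non-linearity used in practice (e.g. ELU) is omitted for clarity.
    """
    z = {v: matvec(W, hv) for v, hv in h.items()}  # projected features W h_v
    out, attn = {}, {}
    for i, nbrs in neighbours.items():
        cand = [i] + [j for j in nbrs if j != i]  # self-loop plus neighbours
        # z[i] + z[j] concatenates the two projected vectors.
        e = [leaky_relu(sum(ak * xk for ak, xk in zip(a, z[i] + z[j]))) for j in cand]
        mx = max(e)  # subtract the max for a numerically stable softmax
        w = [math.exp(x - mx) for x in e]
        s = sum(w)
        alpha = {j: wj / s for j, wj in zip(cand, w)}
        out[i] = [sum(alpha[j] * z[j][k] for j in cand) for k in range(len(z[i]))]
        attn[i] = alpha
    return out, attn


# Toy PPI graph with 2-dimensional node attributes (hypothetical values).
h = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [1.0, 1.0]}
nbrs = {"P1": ["P2", "P3"], "P2": ["P1"], "P3": ["P1"]}
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection for readability
a = [0.5, -0.5, 1.0, 1.0]     # attention vector over [W h_i || W h_j]
new_h, attn = gat_layer(h, nbrs, W, a)
print(attn["P1"])  # attention of P1 over itself, P2 and P3; the values sum to 1
```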
13
Zhu X, Zhu Y, Tan Y, Chen Z, Wang L. An Iterative Method for Predicting Essential Proteins Based on Multifeature Fusion and Linear Neighborhood Similarity. Front Aging Neurosci 2022; 13:799500. [PMID: 35140599 PMCID: PMC8819145 DOI: 10.3389/fnagi.2021.799500] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/21/2021] [Accepted: 12/02/2021] [Indexed: 11/13/2022]
Abstract
Growing evidence has demonstrated that many biological processes are inseparable from the participation of key proteins. In this paper, a novel iterative method called linear neighborhood similarity-based protein multifeatures fusion (LNSPF) is proposed to identify potential key proteins based on multifeature fusion. In LNSPF, an original protein-protein interaction (PPI) network is first constructed from known protein-protein interaction data downloaded from benchmark databases, and topological features are extracted from it. Next, gene expression data of proteins are used to transform the original PPI network into a weighted PPI network based on linear neighborhood similarity. After that, subcellular localization and homologous information of proteins are integrated to extract functional features for proteins. Then, based on both the functional and topological features obtained above, an iterative method is designed and carried out to predict potential key proteins. Finally, to evaluate the predictive performance of LNSPF, extensive experiments were performed, and comparisons between LNSPF and 15 state-of-the-art competitive methods demonstrate that LNSPF achieves satisfactory recognition accuracy, markedly better than that of each competing method.
Affiliation(s)
- Xianyou Zhu
- College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Yaocan Zhu
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Yihong Tan
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Zhiping Chen
- College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
14
Zhang Y, Liang S, Feng Y, Wang Q, Sun F, Chen S, Yang Y, He X, Zhu H, Pan H. Automation of literature screening using machine learning in medical evidence synthesis: a diagnostic test accuracy systematic review protocol. Syst Rev 2022; 11:11. [PMID: 35031074 PMCID: PMC8760775 DOI: 10.1186/s13643-021-01881-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/20/2020] [Accepted: 12/27/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND Systematic review is an indispensable tool for optimal evidence collection and evaluation in evidence-based medicine. However, the explosive growth of the primary literature makes critical appraisal and regular updating difficult. Artificial intelligence (AI) algorithms have been applied to automate the literature screening procedure in medical systematic reviews. These studies used different algorithms and reported widely varying results. It is therefore imperative to systematically review and analyse the automatic literature screening methods developed so far and the effectiveness reported in current studies. METHODS An electronic search for automatic methods for literature screening in systematic reviews will be conducted in the PubMed, Embase, ACM Digital Library, and IEEE Xplore Digital Library databases, supplemented by literature found through a search of Google Scholar. Two reviewers will independently conduct the primary screening of the articles and the data extraction, and disagreements will be resolved by discussion with a methodologist. Data will be extracted from eligible studies, including the basic characteristics of each study, information on the training and validation sets, and the function and performance of the AI algorithms, and summarised in a table. The risk of bias and applicability of the eligible studies will be assessed independently by the two reviewers using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Quantitative analyses will also be performed if appropriate. DISCUSSION Automating the systematic review process greatly helps to reduce workload in evidence-based practice. Results from this systematic review will provide an essential summary of the current development of AI algorithms for automatic literature screening in medical evidence synthesis and help to inspire further studies in this field.
SYSTEMATIC REVIEW REGISTRATION PROSPERO CRD42020170815 (28 April 2020).
Affiliation(s)
- Yuelun Zhang
- Medical Research Center, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Siyu Liang
- Department of Endocrinology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 1 Shuaifuyuan, Dongcheng District, Beijing, China
- Yunying Feng
- Eight-year Program of Clinical Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Qing Wang
- Research Institute of Information and Technology, Tsinghua University, Beijing, China
- Feng Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing, China
- Shi Chen
- Department of Endocrinology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 1 Shuaifuyuan, Dongcheng District, Beijing, China
- Yiying Yang
- Eight-year Program of Clinical Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xin He
- Eight-year Program of Clinical Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Huijuan Zhu
- Department of Endocrinology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 1 Shuaifuyuan, Dongcheng District, Beijing, China
- Hui Pan
- Department of Endocrinology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 1 Shuaifuyuan, Dongcheng District, Beijing, China
15
Liu Y, Liang H, Zou Q, He Z. Significance-Based Essential Protein Discovery. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:633-642. [PMID: 32750873 DOI: 10.1109/tcbb.2020.3004364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The identification of essential proteins is an important problem in bioinformatics. During the past decades, many centrality measures and algorithms have been proposed to address this issue. However, existing methods still suffer from the following drawbacks: (1) the lack of a context-free and readily interpretable quantification of their centrality values; (2) the difficulty of specifying a proper threshold for their centrality values; (3) the inability to control the quality of reported essential proteins in a statistically sound manner. To overcome these limitations, we tackle the essential protein discovery problem from a significance testing perspective. More precisely, the problem is formulated as a multiple hypothesis testing problem, where the null hypothesis for each protein is that it is not essential. To quantify the statistical significance of each protein, we present a p-value calculation method in which both the degree and the local clustering coefficient are used as the test statistic and the Erdős-Rényi model is employed as the random graph model. After calculating the p-value for each protein, the false discovery rate is used as the error rate in the multiple testing correction procedure. Our significance-based essential protein discovery method, named SigEP, is tested on both simulated networks and real PPI networks. The experimental results show that our method achieves better performance than the competing algorithms.
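The SigEP abstract combines a degree-based p-value under an Erdős-Rényi null with false discovery rate control. As a minimal sketch of that idea only, the code below flags proteins whose degree is improbably high under G(n, p). It assumes a degree-only test statistic (SigEP also uses the local clustering coefficient), an edge probability estimated from the observed edge density, and the Benjamini-Hochberg step-up procedure as the FDR correction (the abstract does not name one). The function names are hypothetical, not SigEP's code.

```python
import math


def degree_pvalue(degree: int, n: int, p: float) -> float:
    """Upper-tail probability P(X >= degree) for X ~ Binomial(n - 1, p),
    i.e. the degree distribution of one node in an Erdős-Rényi G(n, p) graph."""
    trials = n - 1
    return sum(
        math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(degree, trials + 1)
    )


def benjamini_hochberg(pvalues: dict[str, float], alpha: float) -> set[str]:
    """Keys rejected by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, pv) in enumerate(ranked, start=1):
        if pv <= rank / m * alpha:
            cutoff = rank  # largest rank whose p-value passes its threshold
    return {name for name, _ in ranked[:cutoff]}


def significant_proteins(adjacency: dict[str, set[str]], alpha: float = 0.05) -> set[str]:
    """Flag proteins whose degree is improbably high under G(n, p_hat)."""
    n = len(adjacency)
    num_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    p_hat = num_edges / (n * (n - 1) / 2)  # edge density as the G(n, p) estimate
    pvals = {v: degree_pvalue(len(nbrs), n, p_hat) for v, nbrs in adjacency.items()}
    return benjamini_hochberg(pvals, alpha)
```

On a star with one hub and 20 leaves, only the hub survives the correction: its degree-20 p-value is astronomically small, while each leaf's degree of 1 is entirely typical for the estimated density.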
16
Liu Y, Chen W, He Z. Essential Protein Recognition via Community Significance. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2788-2794. [PMID: 34347602 DOI: 10.1109/tcbb.2021.3102018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Essential proteins play a vital role in understanding cellular life. With advances in high-throughput technologies, a number of protein-protein interaction (PPI) networks have been constructed, so that essential proteins can be identified from a systems biology perspective. Although a series of network-based essential protein discovery methods have been proposed, these methods still have some drawbacks. Recently, it has been shown that the significance-based method SigEP is promising for overcoming the defects inherent in currently available essential protein identification methods. However, SigEP is developed under the unrealistic Erdős-Rényi (E-R) model and its time complexity is very high. Hence, we propose a new significance-based essential protein recognition method named EPCS, in which the essential protein discovery problem is formulated as a community significance testing problem. Experimental results on four PPI networks show that EPCS performs better than nine state-of-the-art essential protein identification methods and than SigEP, the only existing significance-based essential protein identification method.
17
Campos TL, Korhonen PK, Hofmann A, Gasser RB, Young ND. Harnessing model organism genomics to underpin the machine learning-based prediction of essential genes in eukaryotes - Biotechnological implications. Biotechnol Adv 2021; 54:107822. [PMID: 34461202 DOI: 10.1016/j.biotechadv.2021.107822] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/09/2021] [Revised: 08/17/2021] [Accepted: 08/24/2021] [Indexed: 12/17/2022]
Abstract
The availability of high-quality genomes and advances in functional genomics have enabled large-scale studies of essential genes in model eukaryotes, including the 'elegant worm' (Caenorhabditis elegans; Nematoda) and the 'vinegar fly' (Drosophila melanogaster; Arthropoda). However, this is not the case for other, much less-studied organisms, such as socioeconomically important parasites, for which functional genomic platforms usually do not exist. Thus, there is a need to develop innovative techniques or approaches for the prediction, identification and investigation of essential genes. A key approach that could enable the prediction of such genes is machine learning (ML). Here, we undertake a historical review of experimental and computational approaches employed for the characterisation of essential genes in eukaryotes, with a particular focus on model ecdysozoans (C. elegans and D. melanogaster), and discuss the possible applicability of ML approaches to organisms such as socioeconomically important parasites. We highlight some recent results showing that high-performance ML, combined with feature engineering, allows reliable prediction of essential genes from extensive, publicly available 'omic data sets, with major potential to prioritise such genes (with statistical confidence) for subsequent functional genomic validation. These findings could 'open the door' to fundamental and applied research areas. Evidence of some commonality in the essential gene complement between these two organisms indicates that an ML-engineering approach could find broader applicability to ecdysozoans such as parasitic nematodes or arthropods, provided that suitably large and informative data sets are or become available for proper feature engineering and for the robust training and validation of algorithms. 
This area warrants detailed exploration to, for example, facilitate the identification and characterisation of essential molecules as novel targets for drugs and vaccines against parasitic diseases. This focus is particularly important, given the substantial impact that such diseases have worldwide, and the current challenges associated with their prevention and control and with drug resistance in parasite populations.
Affiliation(s)
- Tulio L Campos
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia; Bioinformatics Core Facility, Instituto Aggeu Magalhães, Fundação Oswaldo Cruz (IAM-Fiocruz), Recife, Pernambuco, Brazil
- Pasi K Korhonen
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Andreas Hofmann
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Neil D Young
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
18
Peng J, Kuang L, Zhang Z, Tan Y, Chen Z, Wang L. A Novel Model for Identifying Essential Proteins Based on Key Target Convergence Sets. Front Genet 2021; 12:721486. [PMID: 34394201 PMCID: PMC8358660 DOI: 10.3389/fgene.2021.721486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Accepted: 06/30/2021] [Indexed: 11/20/2022]
Abstract
In recent years, many computational models have been designed to detect essential proteins based on protein-protein interaction (PPI) networks. However, owing to the incompleteness of PPI networks, the prediction accuracy of these models is still not satisfactory. In this manuscript, a novel key target convergence sets based prediction model (KTCSPM) is proposed to identify essential proteins. In KTCSPM, a weighted PPI network and a weighted domain-domain interaction network are first constructed based on known PPIs and PDIs downloaded from benchmark databases. Then, by integrating these two kinds of networks, a novel weighted PDI network is built. Next, by assigning a unique key target convergence set (KTCS) to each node in the weighted PDI network, an improved method based on random walk with restart is designed to identify essential proteins. Finally, to evaluate its predictive performance, KTCSPM is compared with 12 competitive state-of-the-art models, and the experimental results show that KTCSPM achieves better prediction accuracy. Given this satisfactory predictive performance, KTCSPM may be a useful supplement to future research on the prediction of essential proteins.
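KTCSPM builds its scoring on random walk with restart (RWR). The sketch below shows only a generic RWR iteration on a weighted, undirected network, assuming a restart probability `alpha` of 0.3 and an arbitrary restart vector; it does not construct the weighted PDI network or the key target convergence sets, and all names are hypothetical.

```python
def random_walk_with_restart(
    graph: dict[str, dict[str, float]],
    restart: dict[str, float],
    alpha: float = 0.3,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> dict[str, float]:
    """Iterate r <- (1 - alpha) * W r + alpha * e on a weighted, undirected graph.

    graph[u][v] is the weight of edge u-v (stored in both directions);
    W moves probability from u to each neighbour v in proportion to graph[u][v];
    e is the restart vector, normalised to sum to 1.
    """
    nodes = list(graph)
    total = sum(restart.values())
    e = {v: restart.get(v, 0.0) / total for v in nodes}
    r = dict(e)
    for _ in range(max_iter):
        nxt = {v: alpha * e[v] for v in nodes}
        for u in nodes:
            out_weight = sum(graph[u].values())
            if out_weight == 0:
                continue  # isolated node: its walk mass is simply dropped
            for v, w in graph[u].items():
                nxt[v] += (1 - alpha) * r[u] * w / out_weight
        diff = sum(abs(nxt[v] - r[v]) for v in nodes)  # L1 change this step
        r = nxt
        if diff < tol:
            break
    return r
```

With a uniform restart vector on the path a-b-c, the middle node b ends with score 8/17 and the two ends share the rest equally. In a real application, the restart vector is where prior biological evidence would typically enter.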
Affiliation(s)
- Jiaxin Peng
- College of Computer, Xiangtan University, Xiangtan, China; College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Linai Kuang
- College of Computer, Xiangtan University, Xiangtan, China
- Zhen Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Yihong Tan
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Zhiping Chen
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
- College of Computer, Xiangtan University, Xiangtan, China; College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
19
Chakrapani HB, Chourasia S, Gupta S, Kumar D T, Doss C GP, Haldar R. Effective utilisation of influence maximization technique for the identification of significant nodes in breast cancer gene networks. Comput Biol Med 2021; 133:104378. [PMID: 33971587 DOI: 10.1016/j.compbiomed.2021.104378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2020] [Revised: 03/28/2021] [Accepted: 04/02/2021] [Indexed: 10/21/2022]
Abstract
BACKGROUND Identifying the most important genes in a cancer gene network is a crucial step in understanding the disease's functional characteristics and finding an effective drug. METHOD In this study, a popular influence maximization technique was applied to a large breast cancer gene network to computationally identify the most influential genes. The novel approach involved incorporating gene expression data and a protein-protein interaction network to create a customized pruned and weighted gene network, which was then provided as input to the influence maximization procedure. To benchmark the proposed method, the weighted gene network was also processed through a widely accepted framework for identifying essential proteins. RESULTS The proposed method's results matched the majority of the output from the benchmark framework. The key takeaway from the experiment was that the influential genes identified by the proposed method that did not match the widely accepted framework had been found to be very important in previous in-vivo studies on breast cancer. INTERPRETATION & CONCLUSION These findings suggest that influence maximization adds a more diversified approach to defining and identifying important genes and could be combined with other popular computational techniques to obtain more relevant results.
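The abstract does not name the influence maximization technique it uses. As an illustrative assumption, the sketch below uses the well-known greedy hill-climbing strategy under the independent cascade model (Kempe, Kleinberg and Tardos). It uses a single uniform activation probability `prob` in place of the study's expression-derived edge weights, and the function names are hypothetical.

```python
import random


def simulate_cascade(graph: dict[str, list[str]], seeds: list[str],
                     prob: float, rng: random.Random) -> int:
    """One independent-cascade run: each newly active node gets a single chance
    to activate each inactive out-neighbour with probability `prob`."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < prob:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)  # size of the final activated set


def greedy_seed_selection(graph: dict[str, list[str]], k: int, prob: float = 0.1,
                          runs: int = 200, seed: int = 0) -> list[str]:
    """Greedily add the node with the largest Monte Carlo estimate of expected spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen: list[str] = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for candidate in sorted(nodes - set(chosen)):
            spread = sum(
                simulate_cascade(graph, chosen + [candidate], prob, rng)
                for _ in range(runs)
            ) / runs
            if spread > best_spread:
                best, best_spread = candidate, spread
        chosen.append(best)
    return chosen
```

Greedy selection is popular because expected spread under independent cascade is submodular, which gives the greedy seed set a (1 - 1/e) approximation guarantee. The cost is many Monte Carlo simulations per candidate.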
Affiliation(s)
- Smruti Chourasia
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- Sibasish Gupta
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- Thirumal Kumar D
- Meenakshi Academy of Higher Education and Research, Chennai, India
- George Priya Doss C
- School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, India
- Rishin Haldar
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
20
CEGSO: Boosting Essential Proteins Prediction by Integrating Protein Complex, Gene Expression, Gene Ontology, Subcellular Localization and Orthology Information. Interdiscip Sci 2021; 13:349-361. [PMID: 33772722 DOI: 10.1007/s12539-021-00426-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/02/2020] [Revised: 02/04/2021] [Accepted: 03/05/2021] [Indexed: 01/13/2023]
Abstract
Essential proteins are assumed to be indispensable for sustaining normal physiological function and crucial to drug design and disease diagnosis. The discovery of essential proteins is of great importance in revealing molecular mechanisms and biological processes. Because biological experiments are tedious, many numerical methods have been developed to discover key proteins by mining the features of high-throughput data. Appropriately integrating different types of biological information based on the protein-protein interaction (PPI) network has proven useful in predicting essential proteins. The main aim of this research is to provide a comprehensive study and review of identifying essential proteins by integrating multi-source data, and to provide guidance for researchers. Current essential protein prediction algorithms are analysed and compared in detail and tested on benchmark PPI networks. In addition, building on the previous method TEGS (short for network Topology, gene Expression, Gene ontology, and Subcellular localization), we improve essential protein prediction by incorporating known protein complex information, gene expression profiles, Gene Ontology (GO) terms, subcellular localization information, and protein orthology data into the PPI network; the resulting method is named CEGSO. The simulation results show that CEGSO achieves more accurate and robust results than the other compared methods on different test datasets under various evaluation measures.
21
Chen X, Xu M, An Y. Identifying the essential nodes in network pharmacology based on multilayer network combined with random walk algorithm. J Biomed Inform 2020; 114:103666. [PMID: 33352331 DOI: 10.1016/j.jbi.2020.103666] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/20/2020] [Revised: 12/11/2020] [Accepted: 12/12/2020] [Indexed: 11/15/2022]
Abstract
Compared with a general complex network, a multilayer network is better suited to describing real-world systems. It can be used as a network pharmacology tool to analyze the mechanism of drug action from an overall perspective. Combined with a random walk algorithm, it measures the importance of nodes across the entire network rather than within a single layer. Here, a four-layer network consisting of ingredients, target proteins, metabolic pathways and diseases was constructed from data on the action process of prescriptions. The random walk algorithm was used to calculate the betweenness centrality of the protein-layer nodes to rank their importance. Using this method, we screened out the top 10% of proteins that play a key role in treatment. The prescription Xiaochaihu Decoction was taken as an example to validate the method. The selected proteins were compared with those already validated as associated with the treated diseases. The results showed that the accuracy was no lower than that of the topology-based method on a single-layer network. The applicability of the method was confirmed with another prescription, Yupingfeng Decoction. Our study demonstrates that a multilayer network combined with a random walk algorithm is an effective method for pre-screening vital target proteins related to prescriptions.
Affiliation(s)
- Xianlai Chen
- Big Data Institute, Central South University, Changsha, Hunan, China.
- Mingyue Xu
- Big Data Institute, Central South University, Changsha, Hunan, China.
- Ying An
- Big Data Institute, Central South University, Changsha, Hunan, China.
22
Zhang W, Xu J, Zou X. Predicting Essential Proteins by Integrating Network Topology, Subcellular Localization Information, Gene Expression Profile and GO Annotation Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2053-2061. [PMID: 31095490 DOI: 10.1109/tcbb.2019.2916038] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/09/2023]
Abstract
Essential proteins are indispensable for maintaining normal cellular functions. Identification of essential proteins from protein-protein interaction (PPI) networks has become a hot topic in recent years. Traditional experimental approaches are time-consuming and expensive, and although many computational methods have been developed in recent years, their prediction accuracy is still unsatisfactory. In this research, by introducing protein subcellular localization information, we define a new measure characterizing the subcellular localization essentiality of a protein and develop a new data fusion based method for identifying essential proteins, named TEGS, which integrates network topology, gene expression profiles, GO annotation information, and protein subcellular localization information. To demonstrate the efficiency of TEGS, we evaluate its performance on two Saccharomyces cerevisiae datasets and compare it with seven other state-of-the-art methods (DC, BC, NC, PeC, WDC, SON, and TEO) in terms of true predicted number, jackknife curve, and precision-recall curve. Simulation results show that TEGS outperforms the other compared methods in identifying essential proteins. The source code of TEGS is freely available at https://github.com/wzhangwhu/TEGS.
23
Athira K, Gopakumar G. An integrated method for identifying essential proteins from multiplex network model of protein-protein interactions. J Bioinform Comput Biol 2020; 18:2050020. [PMID: 32795133 DOI: 10.1142/s0219720020500201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
Cell survival requires the presence of essential proteins. Detecting essential proteins is relevant not only because of the critical biological functions they perform but also because of their role as drug targets against pathogens. Several computational techniques identify essential proteins based on the protein-protein interaction (PPI) network. Essential protein detection using only physical interaction data is challenging because of the inherent uncertainty of such data. Hence, in this work, we propose a multiplex network-based framework that incorporates multiple types of protein interaction data derived from physical interactions, coexpression and phylogenetic profiles. An extended version of eigenvector centrality, termed multiplex eigenvector centrality (MEC), is used to identify essential proteins from this network. The methodology integrates the score obtained from the multiplex analysis with subcellular localization and Gene Ontology information and is implemented using Saccharomyces cerevisiae datasets. The proposed method outperformed many recent essential protein prediction techniques in the literature.
Affiliation(s)
- K Athira
- Department of Computer Science and Engineering, National Institute of Technology Calicut, Kozhikkode, Kerala 673601, India
- G Gopakumar
- Department of Computer Science and Engineering, National Institute of Technology Calicut, Kozhikkode, Kerala 673601, India
24
Li G, Li M, Wang J, Li Y, Pan Y. United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1451-1458. [PMID: 30596582 DOI: 10.1109/tcbb.2018.2889978] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 05/06/2023]
Abstract
Identifying essential proteins plays an important role in disease study, drug design, and understanding the minimal requirements for cellular life. Computational methods for essential protein discovery overcome the disadvantages of biological experimental methods, which are often time-consuming, expensive, and inefficient. The topological features of protein-protein interaction (PPI) networks are often used to design computational prediction methods, such as Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), and Neighborhood Centrality (NC). However, the prediction accuracy of these individual methods still leaves room for improvement. Studies show that additional information, such as orthologous relations, helps discover essential proteins, and many researchers have combined multiple information sources to improve prediction accuracy. In this study, we find that essential proteins appear in triangular structures in the PPI network significantly more often than nonessential ones. Based on this phenomenon, we propose a novel pure centrality measure called Neighborhood Closeness Centrality (NCC). We also develop a new combination model, the Extended Pareto Optimality Consensus model (EPOC), to fuse NCC and orthology information, yielding a novel essential protein identification method, NCCO. Compared with seven existing classic centrality methods (DC, BC, IC, CC, SC, EC, and NC) and three consensus methods (PeC, ION, and CSC), our results on S. cerevisiae and E. coli datasets show that NCCO has clear advantages. As a consensus method, EPOC also yields better performance than the random walk model.
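The NCC formula itself is not given in the abstract, but the observation motivating it, that essential proteins sit in triangles more often than nonessential ones, is straightforward to check on a network. A minimal sketch, assuming an undirected PPI network stored as a dict of neighbour sets; the helper names are hypothetical and this is not the NCC measure.

```python
from itertools import combinations


def triangles_per_node(adjacency: dict[str, set[str]]) -> dict[str, int]:
    """Number of triangles each node belongs to: pairs of its neighbours
    that are themselves connected."""
    return {
        v: sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adjacency[a])
        for v, nbrs in adjacency.items()
    }


def mean_triangles(counts: dict[str, int], group: set[str]) -> float:
    """Average triangle count over a group of proteins (e.g. known essentials)."""
    return sum(counts[v] for v in group) / len(group)
```

Comparing `mean_triangles` for a set of known essential proteins against the remaining proteins is one simple way to look for the enrichment the authors report.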
25
Zhao B, Hu S, Liu X, Xiong H, Han X, Zhang Z, Li X, Wang L. A Novel Computational Approach for Identifying Essential Proteins From Multiplex Biological Networks. Front Genet 2020; 11:343. [PMID: 32373163 PMCID: PMC7186452 DOI: 10.3389/fgene.2020.00343] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/12/2020] [Accepted: 03/23/2020] [Indexed: 11/13/2022]
Abstract
The identification of essential proteins can help in understanding the minimum requirements for cell survival and development. Ever-increasing amounts of high-throughput data provide opportunities to detect essential proteins from protein interaction networks (PINs). Existing network-based approaches are limited by the poor quality of the underlying PIN data, which exhibit high rates of false positives and false negatives. To overcome this problem, researchers have focused on predicting essential proteins by combining PINs with other biological data, which has given rise to various types of interactions between proteins. It remains challenging, however, to use aggregated multiplex interactions within a single analysis framework to identify essential proteins. In this study, we created a multiplex biological network (MON) by first integrating PINs, protein domains, and gene expression profiles. Next, we proposed a new approach to discover essential proteins by extending the random walk with restart algorithm to the tensor that represents the MON. In contrast to existing approaches, the proposed MON approach considers both the importance of nodes and the different types of interactions between proteins during the iteration. MON was applied to identify essential proteins within two yeast PINs. Our comprehensive experimental results demonstrated that MON outperformed 11 other state-of-the-art approaches in terms of precision-recall curve, jackknife curve, and other criteria.
Affiliation(s)
- Bihai Zhao
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China; Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China; Hunan Provincial Key Laboratory of Nutrition and Quality Control of Aquatic Animals, Changsha University, Changsha, China
- Sai Hu
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Xiner Liu
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Huijun Xiong
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Xiao Han
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Zhihong Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China; Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
- Xueyong Li
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China; Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
26
Li M, Meng X, Zheng R, Wu FX, Li Y, Pan Y, Wang J. Identification of Protein Complexes by Using a Spatial and Temporal Active Protein Interaction Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:817-827. [PMID: 28885159 DOI: 10.1109/tcbb.2017.2749571] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 06/07/2023]
Abstract
The rapid development of proteomics and high-throughput technologies has produced a large amount of protein-protein interaction (PPI) data, which makes it possible to consider the dynamic rather than the static properties of protein interaction networks (PINs). Identifying protein complexes from dynamic PINs has become a vital scientific problem for understanding cellular life in the post-genome era. To date, many models and methods have been proposed for constructing dynamic PINs to identify protein complexes. However, most of the constructed dynamic PINs focus only on temporal dynamic information and thus overlook the spatial dynamic information of complex biological systems. To address this limitation of existing dynamic PIN analysis approaches, in this paper we propose a new model-based scheme for constructing the Spatial and Temporal Active Protein Interaction Network (ST-APIN) by integrating time-course gene expression data and subcellular location information. To evaluate the efficiency of ST-APIN, the commonly used classical clustering algorithm MCL is adopted to identify protein complexes from ST-APIN and from three other dynamic PINs: NF-APIN, DPIN, and TC-PIN. The experimental results show that MCL performs better on ST-APIN than on the other three dynamic PINs in terms of matching with known complexes, sensitivity, specificity, and F-measure. Furthermore, we evaluate the identified protein complexes by Gene Ontology (GO) function enrichment analysis, which shows that the protein complexes identified from ST-APIN are more biologically significant. This study provides a general paradigm for constructing ST-APINs, which is essential for further understanding of molecular systems and the biomedical mechanisms of complex diseases.
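ST-APIN is evaluated by running MCL (the Markov Cluster algorithm) on each dynamic network. The following is a plain, dense-matrix MCL sketch on an unweighted static graph. It assumes common defaults (self-loops added, expansion by squaring, inflation exponent 2), is meant only for small examples rather than genome-scale networks, and does not build ST-APIN; the function names are hypothetical.

```python
def _normalize_columns(m: list[list[float]]) -> list[list[float]]:
    """Scale each column to sum to 1 (in place) so the matrix is column-stochastic."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def _matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(nodes: list[str], edges: list[tuple[str, str]],
                   inflation: float = 2.0, max_iter: int = 100,
                   tol: float = 1e-9) -> set[frozenset[str]]:
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    # Adjacency matrix with self-loops, a common MCL preprocessing step.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        m[index[u]][index[v]] = m[index[v]][index[u]] = 1.0
    m = _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: two-step random walk
        # Inflation: raise entries to a power, then renormalise the columns.
        inflated = _normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row with non-negligible mass is an attractor; its non-zero columns form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters
```

Expansion spreads flow along longer paths while inflation strengthens already strong flows, so, at least in typical use, repeated rounds tend to confine flow within densely connected groups, which MCL reports as candidate complexes.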
27
Zhang Z, Luo Y, Hu S, Li X, Wang L, Zhao B. A novel method to predict essential proteins based on tensor and HITS algorithm. Hum Genomics 2020; 14:14. [PMID: 32252824 PMCID: PMC7137323 DOI: 10.1186/s40246-020-00263-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/14/2019] [Accepted: 03/05/2020] [Indexed: 11/13/2022]
Abstract
BACKGROUND Essential proteins are an important part of the cell and are closely related to its life activities. To date, protein-protein interaction (PPI) networks have been adopted by many computational methods to predict essential proteins. Most current approaches focus mainly on the topological structure of PPI networks. However, methods relying solely on the PPI network have low detection accuracy for essential proteins. Therefore, it is necessary to integrate the PPI network with other biological information to identify essential proteins. RESULTS In this paper, we propose a novel random walk method for identifying essential proteins, called HEPT. A three-dimensional tensor is first constructed by combining the PPI network of Saccharomyces cerevisiae with multiple biological data such as gene ontology annotations and protein domains. Then, based on the newly constructed tensor, we extend the Hyperlink-Induced Topic Search (HITS) algorithm from a two-dimensional matrix model to a three-dimensional tensor model that can be used to infer essential proteins. Unlike existing state-of-the-art methods, both the importance of proteins and the types of interactions contribute to the essential protein prediction. To evaluate the performance of HEPT, proteins are ranked in descending order of the ranking scores computed by our method and by other competitive methods. A certain number of the top-ranked proteins are then selected as candidate essential proteins, and the number of true essential proteins among them, according to the list of known essential proteins, is used to judge the performance of each method. Experimental results show that our method achieves better prediction performance than nine other state-of-the-art methods in identifying essential proteins. 
CONCLUSIONS The analysis and experimental results show that HEPT effectively improves the prediction accuracy of essential proteins by using the HITS algorithm and combining network topology with gene ontology annotations and protein domains, which provides a new insight into multi-source data fusion.
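The evaluation protocol described above ranks proteins by score, takes the top k and counts how many are known essential proteins. Several other methods in this list report their results the same way. Below is a minimal Python sketch. It assumes scores are held in a dict keyed by protein ID. `top_k_hits` and the toy proteins are hypothetical illustrations, not HEPT's code.

```python
from typing import Dict, Iterable, Set


def top_k_hits(scores: Dict[str, float],
               essential: Set[str],
               cutoffs: Iterable[int]) -> Dict[int, int]:
    """Count known essential proteins among the top-k ranked proteins for each cutoff k."""
    # Rank proteins by descending score. Python's sort is stable (also with
    # reverse=True), so tied proteins keep their insertion order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {k: sum(1 for protein in ranked[:k] if protein in essential)
            for k in cutoffs}


# Toy example with made-up scores and essentiality labels.
scores = {"P1": 0.9, "P2": 0.8, "P3": 0.6, "P4": 0.3, "P5": 0.1}
essential = {"P1", "P3", "P5"}
print(top_k_hits(scores, essential, [1, 3, 5]))  # {1: 1, 3: 2, 5: 3}
```

Published comparisons typically use cutoffs such as the top 100, 200 or more proteins. Tie-breaking is a design choice that can shift counts slightly at a cutoff boundary.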
Affiliation(s)
- Zhihong Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Yingchun Luo
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Department of Ultrasound, Hunan Province Women and Children’s Hospital, Changsha, 410008 China
- Sai Hu
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Xueyong Li
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Lei Wang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Bihai Zhao
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Hunan Provincial Key Laboratory of Nutrition and Quality Control of Aquatic Animals, Department of Biological and Environmental Engineering, Changsha University, Changsha, 410022 China
28
Li HQ, Xu JY, Gao YY, Jin L, Chen JM, Chen FZ. Supramolecular structure, in vivo biological activities and molecular-docking-based potential cardiotoxic exploration of aconine hydrochloride monohydrate as a novel salt form. Acta Crystallogr B Struct Sci Cryst Eng Mater 2020; 76:208-224. [PMID: 32831223 DOI: 10.1107/s2052520620001250] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/17/2019] [Accepted: 01/29/2020] [Indexed: 06/11/2023]
Abstract
Despite the high profile of aconine in WuTou injection, there has been no preparative technology for, or structural study of, its salt as a pharmaceutical product. The lack of any halide salt forms is surprising, as aconine contains a tertiary nitrogen atom. In this work, aconine was prepared from the degradation of aconitine in Aconiti kusnezoffii radix (CaoWu). A green chemistry technique was applied to enrich the poorly lipophilic aconine. Reaction of aconine with hydrochloric acid protonated the nitrogen atom and gave a novel salt form (C25H42NO9+·Cl-·H2O; aconine hydrochloride monohydrate, AHM). Its cation structure in the crystal was elucidated by extensive spectroscopic and X-ray crystallographic analyses. The AHM crystal has a Z' = 3 structure with three independent cation-anion pairs, and the aconine cations show profound conformational differences. The central framework of each aconine cation was compared with that of previously reported aconitine, showing that protonation of the nitrogen atom induced the structural rearrangement. In the crystal of AHM, aconine cations, chloride anions and water molecules interact through inter-species O-H...Cl and O-H...O hydrogen bonds, and this complex hydrogen-bonding network stabilizes the supramolecular structure. The severely disordered solvent molecules were treated using the PLATON SQUEEZE procedure [Spek (2015). Acta Cryst. C71, 9-18], and their atoms were therefore omitted from the refinement. Bioactivity studies indicated that AHM promoted the in vitro proliferation of RAW264.7 cells. Molecular docking suggested that AHM could target a cardiotoxic protein through hydrogen-bonding interactions. The structural confirmation of AHM offers a rational approach to improving the pharmaceutical technology of WuTou injection.
Affiliation(s)
- Han Qing Li
- State Clinical Trial Institution of New Drugs, International Mongolian Hospital of Inner Mongolia, No. 83, Da Xue East Road, Sai Han District, Hohhot, Inner Mongolia 010065, People's Republic of China
- Jia Yin Xu
- Mongolian Pharmaceutical Preparation Center, International Mongolian Hospital of Inner Mongolia, Hohhot, Inner Mongolia 010065, People's Republic of China
- Yuan Yuan Gao
- State Clinical Trial Institution of New Drugs, International Mongolian Hospital of Inner Mongolia, No. 83, Da Xue East Road, Sai Han District, Hohhot, Inner Mongolia 010065, People's Republic of China
- Liang Jin
- State Clinical Trial Institution of New Drugs, International Mongolian Hospital of Inner Mongolia, No. 83, Da Xue East Road, Sai Han District, Hohhot, Inner Mongolia 010065, People's Republic of China
- Jian Ming Chen
- Department of Chemistry, Greenpure Biopharma Co., Ltd, Chengdu, Sichuan 614041, People's Republic of China
- Feng Zheng Chen
- Department of Chemistry, Leshan Normal University, Leshan, Sichuan 614004, People's Republic of China
29
Li X, Li W, Zeng M, Zheng R, Li M. Network-based methods for predicting essential genes or proteins: a survey. Brief Bioinform 2020; 21:566-583. [PMID: 30776072 DOI: 10.1093/bib/bbz017] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 10/31/2018] [Revised: 01/21/2019] [Accepted: 01/22/2019] [Indexed: 01/03/2025]
Abstract
Genes that are thought to be critical for the survival of organisms or cells are called essential genes. Predicting essential genes and their products (essential proteins) is of great value for exploring the mechanisms of complex diseases, for studying the minimal genome required by living cells and for developing new drug targets. Laboratory methods are often complicated, costly and time-consuming. Meanwhile, understanding of network biology has deepened and biotechnologies have developed rapidly. As a result, a great many computational methods have been proposed to identify essential genes/proteins at the network level. These network-based methods analyze the topological characteristics of essential genes/proteins in protein-protein interaction networks (PINs), integrate biological information and consider the dynamic features of PINs, and they have proved effective in identifying essential genes/proteins. In this paper, we survey advanced methods for the network-based prediction of essential genes/proteins and present challenges and directions for future research.
Affiliation(s)
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Wenkai Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
30
Lei X, Yang X, Wu FX. Artificial Fish Swarm Optimization Based Method to Identify Essential Proteins. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:495-505. [PMID: 30113899 DOI: 10.1109/tcbb.2018.2865567] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/08/2023]
Abstract
It is well known that essential proteins play an extremely important role in controlling cellular activities in living organisms. Identifying essential proteins from protein-protein interaction (PPI) networks is conducive to understanding cellular functions and molecular mechanisms. Many essential protein detection methods have been proposed. Nevertheless, these existing methods are not satisfactory because of low efficiency and low sensitivity to noisy data. This paper presents a novel computational approach based on artificial fish swarm optimization for predicting essential proteins in PPI networks (called AFSO_EP). In AFSO_EP, a subset of known essential proteins is first randomly chosen as artificial fish carrying prior knowledge. Essential proteins are then detected by imitating four principal behaviors of artificial fish searching for food or companions: foraging, following, swarming and random behavior. This process uses network topology, gene expression, gene ontology (GO) annotation and subcellular localization information. To evaluate the performance of AFSO_EP, we conducted experiments on two species (Saccharomyces cerevisiae and Drosophila melanogaster). The experimental results show that AFSO_EP identifies essential proteins better than several other well-known identification methods, which confirms its effectiveness.
31
Liu C, Ma Y, Zhao J, Nussinov R, Zhang YC, Cheng F, Zhang ZK. Computational network biology: Data, models, and applications. Phys Rep 2020; 846:1-66. [DOI: 10.1016/j.physrep.2019.12.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Indexed: 01/03/2025]
32
Identification of important invasion and proliferation related genes in adrenocortical carcinoma. Med Oncol 2019; 36:73. [PMID: 31321566 DOI: 10.1007/s12032-019-1296-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 05/07/2019] [Accepted: 07/01/2019] [Indexed: 12/17/2022]
33
Li M, Ni P, Chen X, Wang J, Wu FX, Pan Y. Construction of Refined Protein Interaction Network for Predicting Essential Proteins. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1386-1397. [PMID: 28186903 DOI: 10.1109/tcbb.2017.2665482] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 06/06/2023]
Abstract
Identifying essential proteins based on the protein interaction network (PIN) is a very important and active topic in the post-genome era. A number of network-based essential protein discovery methods have been proposed. Generally, a static protein interaction network is constructed from protein-protein interactions obtained from different experiments or databases. Unfortunately, most network-based essential protein discovery methods are sensitive to the reliability of the constructed PIN. In this paper, we propose a new method for constructing a refined PIN using gene expression profiles and subcellular location information. The basic idea behind the refinement is that two proteins are more likely to interact physically if they appear at the same subcellular location and are active together at one or more time points in the cell cycle. The original static PIN is denoted S-PIN, and the PIN refined by our method is denoted TS-PIN. To evaluate whether TS-PIN is more suitable for identifying essential proteins, 10 network-based essential protein discovery methods (DC, EC, SC, BC, CC, IC, LAC, NC, BN and DMNC) are applied to it. Using these ten methods, TS-PIN is compared on essential protein prediction with two other networks: S-PIN and NF-APIN (a noise-filtered active PIN constructed from gene expression data and S-PIN). The comparison shows that all 10 network-based methods achieve better results when applied to TS-PIN than when applied to S-PIN or NF-APIN.
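The refinement rule above amounts to an edge filter. Below is a minimal sketch. It assumes that activity is already summarized per protein as a set of active time points. The abstract does not say how activity is derived from the expression profiles. The fixed-threshold rule in the hypothetical `active_time_points`, the threshold 0.5 and the toy data are therefore assumptions.

```python
from typing import Dict, List, Sequence, Set, Tuple


def active_time_points(profile: Sequence[float], threshold: float) -> Set[int]:
    """Time points at which a protein counts as active (hypothetical fixed-threshold rule)."""
    return {t for t, level in enumerate(profile) if level >= threshold}


def refine_network(edges: List[Tuple[str, str]],
                   locations: Dict[str, Set[str]],
                   active: Dict[str, Set[int]]) -> List[Tuple[str, str]]:
    """Keep an interaction only if both proteins share a subcellular location
    and are active together at one or more time points."""
    refined = []
    for u, v in edges:
        same_place = bool(locations.get(u, set()) & locations.get(v, set()))
        same_time = bool(active.get(u, set()) & active.get(v, set()))
        if same_place and same_time:
            refined.append((u, v))
    return refined


# Toy data: expression levels at three time points, made up for illustration.
expression = {"A": [0.9, 0.8, 0.1], "B": [0.2, 0.7, 0.9],
              "C": [0.9, 0.1, 0.1], "D": [0.1, 0.1, 0.2]}
active = {p: active_time_points(levels, threshold=0.5) for p, levels in expression.items()}
locations = {"A": {"nucleus"}, "B": {"nucleus", "cytoplasm"},
             "C": {"mitochondrion"}, "D": {"cytoplasm"}}
edges = [("A", "B"), ("A", "C"), ("B", "D")]
print(refine_network(edges, locations, active))  # [('A', 'B')]
```

In this example, A-C is dropped because the two proteins share no location. B-D is dropped because D is never active.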
34
Zhao B, Zhao Y, Zhang X, Zhang Z, Zhang F, Wang L. An iteration method for identifying yeast essential proteins from heterogeneous network. BMC Bioinformatics 2019; 20:355. [PMID: 31234779 PMCID: PMC6591974 DOI: 10.1186/s12859-019-2930-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 02/19/2019] [Accepted: 06/04/2019] [Indexed: 02/02/2023]
Abstract
BACKGROUND Essential proteins are distinctly important for an organism's survival and development, and they are also crucial to disease analysis and drug design. Large-scale protein-protein interaction (PPI) data sets exist for Saccharomyces cerevisiae, which provides a valuable opportunity to identify essential proteins from PPI networks. Many network topology-based computational methods have been designed to detect essential proteins. However, these methods are limited by the incompleteness of available PPI data. To overcome these limitations, some computational methods integrate PPI networks with multi-source biological data. Despite progress in multiple data fusion, improving the prediction accuracy of computational methods remains challenging. RESULTS In this paper, we design a novel iterative model for essential protein prediction, named Randomly Walking in the Heterogeneous Network (RWHN). In RWHN, a weighted protein-protein interaction network and a domain-domain association network are first constructed from the original PPI network and the known protein-domain association network. We then establish a new heterogeneous matrix by combining the two constructed networks with the protein-domain association network. A transition probability matrix is obtained from the heterogeneous matrix by normalization. Finally, an improved PageRank algorithm is applied to the heterogeneous network to predict essential proteins. To eliminate the influence of false negatives, the initial protein score vector is built from information on orthologous proteins and on the subcellular localization of proteins. In this way, RWHN takes into account the topological, conservation and functional features of essential proteins. 
The experimental results show that RWHN clearly outperforms ten other competing methods in predicting essential proteins. CONCLUSIONS We demonstrate that integrating multi-source data into a heterogeneous network can preserve the complex relationships among multiple biological data and improve the prediction accuracy of essential proteins. RWHN, our proposed method, is effective for predicting essential proteins.
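RWHN's core step is a random walk over a normalized transition matrix, with starting scores taken from orthology and localization evidence. This resembles a random walk with restart (personalized PageRank). The sketch below implements that generic technique, not RWHN's "improved PageRank". The assumptions are as follows. The prior doubles as the restart vector. The toy network, weights and prior are made up. The damping factor 0.85 is a conventional default rather than a value from the paper.

```python
from typing import Dict


def random_walk_with_restart(adj: Dict[str, Dict[str, float]],
                             prior: Dict[str, float],
                             damping: float = 0.85,
                             tol: float = 1e-12,
                             max_iter: int = 1000) -> Dict[str, float]:
    """Personalized PageRank on a weighted, undirected network.

    adj[u][v] is the weight of edge u-v, stored in both directions.
    prior holds non-negative initial scores with a positive total. It is
    normalized into the restart distribution.
    """
    nodes = list(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    total = sum(prior.get(u, 0.0) for u in nodes)
    restart = {u: prior.get(u, 0.0) / total for u in nodes}
    score = dict(restart)
    for _ in range(max_iter):
        new = {u: (1.0 - damping) * restart[u] for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Dangling node: redistribute its mass according to the restart vector.
                for v in nodes:
                    new[v] += damping * score[u] * restart[v]
                continue
            for v, w in adj[u].items():
                # Transition probability u -> v = edge weight / total weight leaving u.
                new[v] += damping * score[u] * w / out_weight[u]
        change = sum(abs(new[u] - score[u]) for u in nodes)
        score = new
        if change < tol:
            break
    return score


# Toy heterogeneous network: three proteins and one domain node D1.
adj = {
    "P1": {"P2": 1.0, "D1": 1.0},
    "P2": {"P1": 1.0, "P3": 1.0},
    "P3": {"P2": 1.0, "D1": 1.0},
    "D1": {"P1": 1.0, "P3": 1.0},
}
prior = {"P1": 1.0}  # e.g. only P1 has orthology/localization support
scores = random_walk_with_restart(adj, prior)
print({n: round(s, 4) for n, s in scores.items()})  # P1 highest, P2 and D1 tied, P3 lowest
```

Each update preserves total probability mass, so the scores stay a distribution. Restarting at the prior biases the ranking toward proteins near well-supported ones.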
Affiliation(s)
- Bihai Zhao
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Hunan Provincial Key Laboratory of Nutrition and Quality Control of Aquatic Animals, Department of Biological and Environmental Engineering, Changsha University, Changsha, Hunan 410022 China
- Yulin Zhao
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Xiaoxia Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Zhihong Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Fan Zhang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Lei Wang
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- College of Information Engineering, Xiangtan University, Xiangtan, 411105 Hunan China
35
Rasti S, Vogiatzis C. A survey of computational methods in protein–protein interaction networks. Ann Oper Res 2019; 276:35-87. [DOI: 10.1007/s10479-018-2956-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/03/2025]
36
Lei X, Yang X, Fujita H. Random walk based method to identify essential proteins by integrating network topology and biological characteristics. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.01.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 12/12/2022]
37
Lei X, Wang S, Wu F. Identification of Essential Proteins Based on Improved HITS Algorithm. Genes (Basel) 2019; 10:E177. [PMID: 30823614 PMCID: PMC6409685 DOI: 10.3390/genes10020177] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/27/2018] [Revised: 02/09/2019] [Accepted: 02/19/2019] [Indexed: 11/16/2022]
Abstract
Essential proteins are critical to the development and survival of cells. Identifying and analyzing essential proteins is vital for understanding the molecular mechanisms of living cells and for designing new drugs. With the development of high-throughput technologies, a large amount of protein-protein interaction (PPI) data has become available, which facilitates the study of essential proteins at the network level. Although various computational methods have been proposed, their prediction precision still needs to be improved. In this paper, we propose HSEP, a novel method that applies Hyperlink-Induced Topic Search (HITS) to weighted PPI networks to detect essential proteins. First, the original undirected PPI network is transformed into a bidirectional PPI network. Then, both biological information and network topological characteristics are used to weight the PPI network. The biological information includes gene expression data, Gene Ontology (GO) annotation and subcellular localization. The edge clustering coefficient serves as the topological characteristic, measuring the closeness of two connected nodes. We conducted experiments on two species, namely Saccharomyces cerevisiae and Drosophila melanogaster, and the results show that HSEP outperforms several state-of-the-art essential protein detection techniques.
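Three of HSEP's steps can be sketched together: the bidirectional network, the edge clustering coefficient (ECC) used as a topological weight, and HITS. This sketch assumes the ECC definition z_uv / min(d_u - 1, d_v - 1) common in the essential-protein literature, where z_uv counts triangles through edge u-v. Some variants add 1 to the numerator. ECC alone serves as the edge weight here. HSEP's combination with expression, GO and localization data is not reproduced, and the function names are hypothetical. On a symmetric network, hub and authority scores coincide, so either can serve as the ranking.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def edge_clustering_coefficients(adj: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """ECC(u, v) = z_uv / min(d_u - 1, d_v - 1), with z_uv the triangles through u-v.

    Each undirected edge yields two directed entries, (u, v) and (v, u), which
    gives the bidirectional network. Edges with a zero denominator get weight 0.
    """
    ecc = {}
    for u, neighbors in adj.items():
        for v in neighbors:
            triangles = len(adj[u] & adj[v])
            denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
            ecc[(u, v)] = triangles / denom if denom > 0 else 0.0
    return ecc


def weighted_hits(weights: Dict[Tuple[str, str], float], nodes: Iterable[str],
                  iterations: int = 200):
    """Standard HITS power iteration on a weighted directed graph, L2-normalized."""
    nodes = list(nodes)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_auth = {n: 0.0 for n in nodes}
        for (u, v), w in weights.items():
            new_auth[v] += w * hub[u]  # good authorities are pointed to by good hubs
        norm = math.sqrt(sum(x * x for x in new_auth.values())) or 1.0
        auth = {n: x / norm for n, x in new_auth.items()}
        new_hub = {n: 0.0 for n in nodes}
        for (u, v), w in weights.items():
            new_hub[u] += w * auth[v]  # good hubs point to good authorities
        norm = math.sqrt(sum(x * x for x in new_hub.values())) or 1.0
        hub = {n: x / norm for n, x in new_hub.items()}
    return hub, auth


# Two triangles sharing edge B-C, plus a pendant protein E attached to D.
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"},
       "D": {"B", "C", "E"}, "E": {"D"}}
weights = edge_clustering_coefficients(adj)
hub, auth = weighted_hits(weights, adj)
print({n: round(a, 3) for n, a in auth.items()})
```

The pendant protein E has no triangles, so its single edge gets weight 0 and its score falls to zero. This shows how ECC weighting down-ranks proteins that sit outside dense regions.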
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an 710119, China.
- Siguo Wang
- School of Computer Science, Shaanxi Normal University, Xi'an 710119, China.
- Fangxiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada.
38
Ijaq J, Malik G, Kumar A, Das PS, Meena N, Bethi N, Sundararajan VS, Suravajhala P. A model to predict the function of hypothetical proteins through a nine-point classification scoring schema. BMC Bioinformatics 2019; 20:14. [PMID: 30621574 PMCID: PMC6325861 DOI: 10.1186/s12859-018-2554-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/05/2018] [Accepted: 11/30/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND Hypothetical proteins (HPs) are proteins that are predicted to be expressed in an organism but for which no evidence of existence is known. In the recent past, annotation and curation efforts have helped overcome the challenge of understanding their diverse functions. Researchers have developed techniques to decipher sequence-structure-function relationships, especially for the functional modelling of HPs, but using these features as classifiers for HPs has not been attempted. Along with the rise in the number of annotation strategies, next-generation sequencing methods have further improved understanding of HP functions. RESULTS In our previous work, we developed a six-point classification scoring schema. Its annotation covers protein family scores, orthology, protein interaction/association studies, bidirectional best BLAST hits, sorting signals, and known databases and visualizers, which were used to validate protein interactions. In this study, we introduce three more classifiers into our annotation system: pseudogenes linked to HPs, homology modelling and non-coding RNAs associated with HPs. We discuss the challenges and performance of these classifiers using machine learning heuristics. Accuracy improved for Perceptron (81.08 to 97.67), Naive Bayes (54.05 to 96.67), Decision tree J48 (67.57 to 97.00) and SMO_npolyk (59.46 to 96.67). CONCLUSION With the three new classification features, the nine-point classification scoring schema functionally annotates HPs with improved accuracy.
Affiliation(s)
- Johny Ijaq
- Department of Biotechnology, Osmania University, Hyderabad, 500007 India
- Bioclues.org, Kukatpally, Hyderabad, 500072 India
- Girik Malik
- Department of Pediatrics, The Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, The Ohio State University, Columbus, OH USA
- Bioclues.org, Kukatpally, Hyderabad, 500072 India
- Labrynthe, New Delhi, India
- Anuj Kumar
- Bioclues.org, Kukatpally, Hyderabad, 500072 India
- Advanced Center for Computational and Applied Biotechnology, Uttarakhand Council for Biotechnology, Dehradun, 248007 India
- Partha Sarathi Das
- Bioclues.org, Kukatpally, Hyderabad, 500072 India
- Department of Microbiology, Bioinformatics Infrastructure Facility, Vidyasagar University, Midnapore, India
- Narendra Meena
- Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Statue Circle, RJ 302001 India
- Neeraja Bethi
- Department of Biotechnology, Osmania University, Hyderabad, 500007 India
- Prashanth Suravajhala
- Bioclues.org, Kukatpally, Hyderabad, 500072 India
- Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Statue Circle, RJ 302001 India
39
Elahi A, Babamir SM. Identification of essential proteins based on a new combination of topological and biological features in weighted protein-protein interaction networks. IET Syst Biol 2018; 12:247-257. [PMID: 30472688 PMCID: PMC8687241 DOI: 10.1049/iet-syb.2018.5024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/29/2018] [Revised: 04/23/2018] [Accepted: 04/30/2018] [Indexed: 02/01/2023]
Abstract
The identification of essential proteins in protein-protein interaction (PPI) networks is not only important for understanding cellular life processes but also useful in diagnosis and drug design. Centrality measures based on network topology are sensitive to network noise. Moreover, these measures cannot detect low-connectivity essential proteins. The authors propose a new method that combines topological centrality measures with biological features, based on statistical analyses of essential proteins and protein complexes. Incomplete PPI networks pose the challenge of false-positive interactions. To remove these interactions, the PPI networks are weighted by gene ontology. Furthermore, the authors use a combination of classifiers, including the newly proposed measures and traditional weighted centrality measures, to improve identification precision. This combination is evaluated with a logistic regression model in terms of significance levels. The proposed method has been implemented and compared with both earlier and more recent efficient computational methods using six statistical standards. The results show that the proposed method identifies essential proteins more precisely than the previous methods. This level of precision was obtained on four different data sets: YHQ-W, YMBD-W, YDIP-W and YMIPS-W.
Affiliation(s)
- Abdolkarim Elahi
- Department of Software Engineering, University of Kashan, Kashan, Iran
40
Wang MY, Liang JW, Olounfeh KM, Sun Q, Zhao N, Meng FH. A Comprehensive In Silico Method to Study the QSTR of the Aconitine Alkaloids for Designing Novel Drugs. Molecules 2018; 23:E2385. [PMID: 30231506 PMCID: PMC6225272 DOI: 10.3390/molecules23092385] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 08/24/2018] [Revised: 09/11/2018] [Accepted: 09/12/2018] [Indexed: 12/22/2022]
Abstract
A combined in silico method was developed to predict potential protein targets involved in the cardiotoxicity induced by aconitine alkaloids and to study the quantitative structure-toxicity relationship (QSTR) of these compounds. For target prediction, a protein-protein interaction (PPI) network was built from information on protein interactions connected with aconitine cardiotoxicity. This information was extracted from nearly a decade of literature and from the STRING database. The software Cytoscape and the PharmMapper server were used to screen for essential proteins in the constructed network. Calcium/calmodulin-dependent protein kinase II alpha (CAMK2A) and gamma (CAMK2G) were identified as potential targets. To gain deeper insight into the relationship between the toxicity and the structure of aconitine alkaloids, the present study used QSAR models built in Sybyl software that show internal robustness and high external predictivity. The molecular dynamics simulations carried out here demonstrate that aconitine alkaloids bind stably to the receptor CAMK2G. In conclusion, this comprehensive method can serve as a tool for guiding subsequent structural modification of aconitine alkaloids. It can also give better insight into the cardiotoxicity induced by compounds with structures similar to those of its derivatives.
Affiliation(s)
- Ming-Yang Wang
- School of Pharmacy, China Medical University, Shenyang 110122, Liaoning, China.
- Jing-Wei Liang
- School of Pharmacy, China Medical University, Shenyang 110122, Liaoning, China.
- Qi Sun
- School of Pharmacy, China Medical University, Shenyang 110122, Liaoning, China.
- Nan Zhao
- School of Pharmacy, China Medical University, Shenyang 110122, Liaoning, China.
- Fan-Hao Meng
- School of Pharmacy, China Medical University, Shenyang 110122, Liaoning, China.
41
A systematic survey of centrality measures for protein-protein interaction networks. BMC Syst Biol 2018; 12:80. [PMID: 30064421 PMCID: PMC6069823 DOI: 10.1186/s12918-018-0598-2] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 11/23/2017] [Accepted: 06/22/2018] [Indexed: 12/12/2022]
Abstract
Background Numerous centrality measures have been introduced to identify “central” nodes in large networks. With a wide range of measures available for ranking influential nodes, the user must decide which measure best suits the analysis of a given network. The choice is further complicated because network topology affects how centrality measures rank influential nodes. To approach this problem systematically, we examined the centrality profiles of nodes in yeast protein-protein interaction networks (PPINs) to detect which centrality measures succeed in predicting influential proteins. We studied how different topological network features are reflected in a large set of commonly used centrality measures. Results We used yeast PPINs to compare 27 common centrality measures, which characterize and assort the influential nodes of the networks. We applied principal component analysis (PCA) and hierarchical clustering and found that the most informative measures depend on the network’s topology. Interestingly, some measures contributed highly relative to others in all PPINs, namely Latora closeness, Decay, Lin, Freeman closeness, Diffusion, Residual closeness and Average distance centralities. Conclusions The choice of a suitable set of centrality measures is crucial for inferring important functional properties of a network. We concluded that data reduction with unsupervised machine learning methods helps to choose appropriate variables (centrality measures). Hence, we proposed identifying the contribution proportions of the centrality measures with PCA as a prerequisite step of network analysis before inferring functional consequences, e.g., the essentiality of a node.
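Latora closeness and Freeman closeness, two of the measures singled out above, differ mainly in how they aggregate distances. This matters on fragmented PPI networks. Below is a minimal sketch that assumes unweighted hop distances. Normalization conventions vary between tools. The Freeman version here restricts the sum to reachable nodes, and the Latora version is the unnormalized sum of inverse distances.

```python
from collections import deque
from typing import Dict, Set


def bfs_distances(adj: Dict[str, Set[str]], source: str) -> Dict[str, int]:
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def freeman_closeness(adj: Dict[str, Set[str]], node: str) -> float:
    """(reachable nodes - 1) / sum of distances, one common convention."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def latora_closeness(adj: Dict[str, Set[str]], node: str) -> float:
    """Sum of inverse distances (harmonic form). Unreachable nodes contribute 0."""
    dist = bfs_distances(adj, node)
    return sum(1.0 / d for v, d in dist.items() if v != node)


# A 3-protein path A-B-C and a separate 2-protein component D-E.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
for n in sorted(adj):
    print(n, round(freeman_closeness(adj, n), 3), latora_closeness(adj, n))
```

Freeman closeness rates D as central as B because it only sees D's two-node component. The harmonic (Latora) form separates them. This small case shows how topology can change what a measure rewards.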
42
Feature Selection via Swarm Intelligence for Determining Protein Essentiality. Molecules 2018; 23:molecules23071569. [PMID: 29958434 PMCID: PMC6100311 DOI: 10.3390/molecules23071569] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/25/2018] [Revised: 06/22/2018] [Accepted: 06/25/2018] [Indexed: 01/24/2023]
Abstract
Protein essentiality is fundamental to comprehending the function and evolution of genes. Predicting protein essentiality is pivotal in identifying disease genes and potential drug targets. Because experimental methods require considerable investment of time and funds, predicting protein essentiality accurately with computational methods is of great value. In this study, we present a novel feature selection method, named Elite Search mechanism-based Flower Pollination Algorithm (ESFPA), to determine protein essentiality. Unlike other protein essentiality prediction methods, ESFPA uses an improved swarm intelligence-based algorithm to select optimal features for protein essentiality prediction. The first step is to collect numerous features that are highly predictive of essentiality. The second step is to develop a feature selection strategy based on a swarm intelligence algorithm to obtain the optimal feature subset. Furthermore, an elite search mechanism is adopted to further improve the quality of the feature subset. Subsequently, a hybrid classifier is applied to evaluate the essentiality of each protein. Finally, the experimental results show that our method is competitive with some well-known feature selection methods. The proposed method aims to provide a new perspective on protein essentiality determination.
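ESFPA builds on the flower pollination algorithm (FPA). Below is a sketch of the generic continuous FPA in its commonly described form. With switch probability p, a flower makes a global Lévy-flight move toward the current best. Otherwise it makes a local move along the difference of two randomly chosen flowers. ESFPA's elite search, its encoding of feature subsets and its hybrid classifier are not shown. The following are assumptions: the 0.01 step scale, p = 0.8 and the Lévy index 1.5 (common choices), plus the sphere test function, which stands in for a feature-subset quality score.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def levy_step(beta: float, rng: random.Random) -> float:
    """One Lévy-distributed step drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def flower_pollination(objective: Callable[[Sequence[float]], float],
                       dim: int, lower: float, upper: float,
                       n_flowers: int = 20, iterations: int = 500,
                       p_switch: float = 0.8, beta: float = 1.5,
                       seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective` over the box [lower, upper]^dim with the basic FPA."""
    rng = random.Random(seed)

    def clip(x: float) -> float:
        return min(max(x, lower), upper)

    flowers = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_flowers)]
    fitness = [objective(f) for f in flowers]
    best_i = min(range(n_flowers), key=fitness.__getitem__)
    best, best_fit = list(flowers[best_i]), fitness[best_i]

    for _ in range(iterations):
        for i in range(n_flowers):
            x = flowers[i]
            if rng.random() < p_switch:
                # Global (biotic) pollination: Lévy flight toward the best flower.
                cand = [clip(xd + 0.01 * levy_step(beta, rng) * (gd - xd))
                        for xd, gd in zip(x, best)]
            else:
                # Local (abiotic) pollination: step along the difference of two random flowers.
                j, k = rng.sample(range(n_flowers), 2)
                eps = rng.random()
                cand = [clip(xd + eps * (a - b))
                        for xd, a, b in zip(x, flowers[j], flowers[k])]
            cand_fit = objective(cand)
            if cand_fit < fitness[i]:  # greedy replacement
                flowers[i], fitness[i] = cand, cand_fit
                if cand_fit < best_fit:
                    best, best_fit = list(cand), cand_fit
    return best, best_fit


def sphere(xs: Sequence[float]) -> float:
    return sum(x * x for x in xs)


best, value = flower_pollination(sphere, dim=2, lower=-5.0, upper=5.0)
print(best, value)
```

Feature-selection variants usually map each continuous position to a binary include/exclude mask and score the mask with a classifier. That mapping is left out here.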
43
Lei X, Yang X. A new method for predicting essential proteins based on participation degree in protein complex and subgraph density. PLoS One 2018; 13:e0198998. [PMID: 29894517 PMCID: PMC5997351 DOI: 10.1371/journal.pone.0198998] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/24/2018] [Accepted: 05/30/2018] [Indexed: 12/11/2022]
Abstract
Essential proteins are crucial to living cells. Identifying essential proteins from protein-protein interaction (PPI) networks supports pathway analysis and function prediction, and can also contribute to disease diagnosis and drug design. Several experimental and computational methods have been designed to identify essential proteins; however, their prediction precision still needs to be improved. In this paper, we propose a new method for identifying essential proteins based on the Participation degree of a protein in protein Complexes and Subgraph Density, named PCSD. To test the performance of PCSD, we conduct experiments on four PPI datasets (DIP, Krogan, MIPS and Gavin). The experimental results demonstrate that PCSD predicts essential proteins better than competing methods including DC, SC, EC, IC, LAC, NC, WDC, PeC and UDoNC. Compared with the most recent method, LBCC, PCSD correctly predicts more essential proteins among certain numbers of top-ranked proteins on the DIP dataset, which indicates that PCSD is very effective at discovering essential proteins in most cases.
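The abstract names PCSD's two ingredients but not how they are combined. The following is a minimal sketch under stated assumptions:

- Participation degree is the number of known complexes that contain a protein.
- Subgraph density is the edge density of the subgraph induced by the protein and its neighbours.
- The two are multiplied (a hypothetical combination), with a `+1` so that proteins outside any complex can still be ranked.

The function names and toy data are illustrative only.

```python
from itertools import combinations
from typing import Dict, List, Set

def participation_degree(protein: str, complexes: List[Set[str]]) -> int:
    """Number of known complexes that contain the protein (assumed definition)."""
    return sum(1 for c in complexes if protein in c)

def subgraph_density(protein: str, adj: Dict[str, Set[str]]) -> float:
    """Edge density of the subgraph induced by the protein and its neighbours."""
    nodes = adj[protein] | {protein}
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return edges / (n * (n - 1) / 2)

def pcsd_like_score(protein: str, adj: Dict[str, Set[str]],
                    complexes: List[Set[str]]) -> float:
    # Hypothetical combination; the "1 +" keeps proteins outside any complex rankable
    return (1 + participation_degree(protein, complexes)) * subgraph_density(protein, adj)

# Toy PPI network, stored as symmetric adjacency sets
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
complexes = [{"A", "B", "C"}, {"A", "C"}]
scores = {p: pcsd_like_score(p, adj, complexes) for p in adj}
print(scores)
```

In this toy network, "A" ranks first: it sits in two complexes and its neighbourhood is fully connected. "D" ranks last: it has a dense but tiny neighbourhood and no complex membership.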
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Xiaoqin Yang
- School of Computer Science, Shaanxi Normal University, Xi’an, China
44
Lei X, Fang M, Wu FX, Chen L. Improved flower pollination algorithm for identifying essential proteins. BMC Syst Biol 2018; 12:46. [PMID: 29745838 PMCID: PMC5998882 DOI: 10.1186/s12918-018-0573-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 01/30/2023]
Abstract
Background Essential proteins are necessary for the survival and development of cells. Identifying essential proteins can help us understand the minimal requirements for cellular life, and it also plays an important role in disease-gene studies and drug design. With the development of high-throughput techniques, large amounts of protein-protein interaction data are available for predicting essential proteins at the network level. Although a number of essential protein discovery methods have been proposed, their prediction precision still needs to be improved.
Methods In this paper, we propose a new algorithm named FPE, an improved Flower Pollination Algorithm (FPA) for identifying Essential proteins. Unlike other existing essential protein discovery methods, we apply FPA, a recent intelligent algorithm that imitates the pollination behavior of flowering plants in nature, to identify essential proteins. Just as flower pollination seeks optimal reproduction from the perspective of biological evolution, identifying essential proteins seeks a candidate essential protein set. We establish this correspondence between the FPA algorithm and essential protein prediction by redefining the positions of flowers and the specific pollination process. Moreover, integrating biological and topological properties has been shown to improve the precision of essential protein identification. Consequently, we develop a measure, GSC, to judge the essentiality of proteins. It takes into account not only Gene expression data, Subcellular localization and protein Complex information, but also the network topology.
Results The experimental results show that FPE outperforms state-of-the-art methods (DC, SC, IC, EC, LAC, NC, PeC, WDC, UDoNC and SON) in prediction precision, precision-recall curves and jackknife curves for identifying essential proteins, and that it is also highly stable.
Conclusions We confirm that FPE can effectively identify essential proteins by using the nature-inspired algorithm FPA and combining network topology with gene expression data, subcellular localization and protein complex information. The experimental results demonstrate the superiority of FPE for predicting essential proteins.
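The abstract says GSC combines gene expression, subcellular localization, complex membership and topology, but it does not define the measure. The following is one hypothetical way such an integration could look:

- Weight each interaction by the Pearson correlation of the two proteins' expression profiles, clipped at zero.
- Keep a weight only if the two proteins share a compartment.
- Sum these weights over a protein's neighbours and add its complex count.

Every weighting choice and name here is an assumption for illustration, not the FPE authors' formula.

```python
import math
from typing import Dict, List, Set

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def integrated_score(protein: str, adj: Dict[str, Set[str]],
                     expr: Dict[str, List[float]], loc: Dict[str, Set[str]],
                     complexes: List[Set[str]]) -> float:
    """Hypothetical GSC-style score: co-expressed, co-localized neighbours plus complex count."""
    total = 0.0
    for nb in adj[protein]:
        co_expr = max(0.0, pearson(expr[protein], expr[nb]))  # ignore anti-correlation
        same_loc = 1.0 if loc[protein] & loc[nb] else 0.0      # shared compartment?
        total += co_expr * same_loc
    n_complexes = sum(1 for c in complexes if protein in c)
    return total + n_complexes

adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
expr = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [3.0, 2.0, 1.0]}
loc = {"A": {"nucleus"}, "B": {"nucleus"}, "C": {"cytoplasm"}}
complexes = [{"A", "B"}]
scores = {p: integrated_score(p, adj, expr, loc, complexes) for p in adj}
print(scores)
```

In FPE, a measure of this kind would be the quantity the pollination search tries to maximise when choosing candidate essential proteins.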
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
- Ming Fang
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Canada
- Luonan Chen
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
45
A Novel Entropy-Based Centrality Approach for Identifying Vital Nodes in Weighted Networks. Entropy (Basel) 2018; 20:e20040261. [PMID: 33265352 PMCID: PMC7512776 DOI: 10.3390/e20040261] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 03/20/2018] [Revised: 03/30/2018] [Accepted: 04/07/2018] [Indexed: 12/25/2022]
Abstract
Measuring centrality has recently attracted increasing attention, with algorithms ranging from those that simply count immediate neighbors or shortest paths to complicated iterative refinement processes and objective dynamical approaches. Identifying vital nodes allows us to understand the roles that different nodes play in the structure of a network. However, quantifying centrality in complex networks with various topological structures is not an easy task. In this paper, we introduce a novel definition of entropy-based centrality that is applicable to weighted directed networks. By design, the total power of a node is divided into two parts: its local power and its indirect power. The local power is obtained by integrating the structural entropy, which reveals the communication activity and popularity of each node, and the interaction frequency entropy, which indicates its accessibility. In addition, the process of influence propagation is captured by two-hop subnetworks, yielding the indirect power. To evaluate the performance of the entropy-based centrality, we use four weighted real-world networks with various instance sizes, degree distributions and densities: adolescent health, Bible, United States (US) airports and Hep-th. Extensive analytical results demonstrate that the entropy-based centrality outperforms degree centrality, betweenness centrality, closeness centrality and eigenvector centrality.
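The abstract describes the local power only in words. The following is a minimal sketch of the two local entropies on a weighted directed graph. It rests on three assumptions of ours:

- Structural entropy is the Shannon entropy of the (in + out) degrees of a node's out-neighbours.
- Interaction frequency entropy is the Shannon entropy of the node's normalised outgoing edge weights.
- Local power is their plain sum.

The two-hop indirect power is omitted, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (source, target, interaction weight)

def shannon(values: Iterable[float]) -> float:
    """Shannon entropy (natural log) of positive values after normalising them to sum to 1."""
    vals = [v for v in values if v > 0]
    total = sum(vals)
    return -sum((v / total) * math.log(v / total) for v in vals) if total > 0 else 0.0

def local_power(edges: List[Edge]) -> Dict[str, float]:
    out_w: Dict[str, Dict[str, float]] = defaultdict(dict)  # out_w[u][v]: weight of u -> v
    degree: Dict[str, int] = defaultdict(int)                # in-degree + out-degree
    for u, v, w in edges:
        out_w[u][v] = out_w[u].get(v, 0.0) + w
        degree[u] += 1
        degree[v] += 1
    power = {}
    for node in list(degree):
        nbrs = out_w.get(node, {})
        structural = shannon(degree[v] for v in nbrs)  # spread of neighbours' degrees
        frequency = shannon(nbrs.values())              # spread of interaction weights
        power[node] = structural + frequency
    return power

power = local_power([("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 2.0)])
print(power)
```

Node "a" spreads its interactions evenly over two equally connected neighbours, so both entropies equal ln 2. Nodes with one or zero out-neighbours score 0.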
46
Li M, Li W, Wu FX, Pan Y, Wang J. Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information. J Theor Biol 2018; 447:65-73. [PMID: 29571709 DOI: 10.1016/j.jtbi.2018.03.029] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 08/12/2017] [Revised: 03/19/2018] [Accepted: 03/20/2018] [Indexed: 01/07/2023]
Abstract
Essential proteins are important participants in various life activities and play a vital role in the survival and reproduction of living organisms. Identifying essential proteins from protein-protein interaction (PPI) networks is of great significance for studying complex human diseases, designing drugs, and advancing bioinformatics and computational science. Studies have shown that highly connected proteins in a PPI network tend to be essential. A series of computational methods have been proposed to identify essential proteins by analyzing the topological structures of PPI networks. However, the high noise in PPI data can degrade the accuracy of essential protein prediction. Moreover, proteins must be in the appropriate subcellular location to perform their functions, and two proteins can interact only when they are located in the same subcellular compartment. In this paper, we propose a new network-based essential protein discovery method, named SPP, based on sub-network partition and prioritization that integrates subcellular localization information. SPP was tested on two different yeast PPI networks obtained from the DIP and BioGRID databases. The experimental results show that SPP effectively reduces the effect of false positives in PPI networks and predicts essential proteins more accurately than other existing computational methods (DC, BC, CC, SC, EC, IC and NC).
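The localization constraint in the abstract can be illustrated with a simple pre-filter: drop every PPI edge whose endpoints share no annotated compartment, then rank proteins on what remains. This sketches that one idea only, ranked here by plain degree centrality (DC). It is not SPP's sub-network partition and prioritization, and the data structures and protein names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]

def filter_by_localization(edges: List[Edge], loc: Dict[str, Set[str]]) -> List[Edge]:
    """Keep only interactions whose two proteins share at least one compartment."""
    return [(u, v) for u, v in edges if loc.get(u, set()) & loc.get(v, set())]

def degree_ranking(edges: List[Edge]) -> List[str]:
    """Proteins sorted by degree (descending), ties broken by name."""
    deg: Dict[str, int] = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sorted(deg, key=lambda p: (-deg[p], p))

edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5"), ("P4", "P6")]
loc = {"P1": {"nucleus"}, "P2": {"nucleus"}, "P3": {"nucleus", "cytoplasm"},
       "P4": {"mitochondrion"}, "P5": {"mitochondrion"}, "P6": {"nucleus"}}
kept = filter_by_localization(edges, loc)
print(degree_ranking(edges))  # ['P1', 'P4', 'P2', 'P3', 'P5', 'P6']
print(degree_ranking(kept))   # ['P1', 'P2', 'P3', 'P4', 'P5']
```

On the raw network, "P4" ties for the top spot. It falls once its two cross-compartment interactions are treated as likely false positives.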
Affiliation(s)
- Min Li
- School of Information Science and Engineering, Central South University, Changsha 410083, China.
- Wenkai Li
- School of Information Science and Engineering, Central South University, Changsha 410083, China.
- Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada.
- Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA 30302-4110, USA.
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha 410083, China.
47
Qin C, Sun Y, Dong Y. A new computational strategy for identifying essential proteins based on network topological properties and biological information. PLoS One 2017; 12:e0182031. [PMID: 28753682 PMCID: PMC5533339 DOI: 10.1371/journal.pone.0182031] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/15/2017] [Accepted: 07/11/2017] [Indexed: 12/26/2022]
Abstract
Essential proteins are indispensable to the survival and development of an organism; deleting a single essential protein causes lethality or infertility. Identifying and analysing essential proteins is key to understanding the molecular mechanisms of living cells. There are two types of methods for predicting essential proteins: experimental methods, which require considerable time and resources, and computational methods, which overcome these shortcomings. However, the prediction accuracy of computational methods for essential proteins requires further improvement. In this paper, we propose a new computational strategy named CoTB for identifying essential proteins based on a combination of topological properties, subcellular localization information and orthologous protein information. First, we introduce several topological properties of the protein-protein interaction (PPI) network. Second, we propose new methods for measuring orthologous information and subcellular localization, together with a new computational strategy that uses a random forest prediction model to obtain a probability score for each protein being essential. Finally, we conduct experiments on four different Saccharomyces cerevisiae datasets. The experimental results demonstrate that our strategy for identifying essential proteins outperforms traditional computational methods and the most recently developed method, SON. In particular, our strategy improves the prediction accuracy to 89, 78, 79 and 85 percent on the YDIP, YMIPS, YMBD and YHQ datasets, respectively, at the top 100 level.
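The "top 100 level" accuracies quoted above are top-k precisions: among the k highest-scoring proteins, the fraction that are known to be essential. The following is a minimal sketch of that evaluation. It assumes a score dictionary (for CoTB, the random forest probabilities) and a set of known essential proteins, both hypothetical inputs here. Breaking ties by name is our arbitrary choice.

```python
from typing import Dict, Set

def top_k_precision(scores: Dict[str, float], essential: Set[str], k: int = 100) -> float:
    """Fraction of the k highest-scoring proteins that are known to be essential."""
    ranked = sorted(scores, key=lambda p: (-scores[p], p))[:k]  # ties broken by name
    if not ranked:
        return 0.0
    return sum(p in essential for p in ranked) / len(ranked)

scores = {"p1": 0.9, "p2": 0.8, "p3": 0.7, "p4": 0.2, "p5": 0.1}
essential = {"p1", "p3", "p5"}
print(top_k_precision(scores, essential, k=3))  # 2 of the top 3 are essential
print(top_k_precision(scores, essential, k=5))  # 0.6
```

Under this reading, an accuracy of 89 percent on YDIP at the top 100 level means 89 of the 100 highest-ranked proteins are essential.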
Affiliation(s)
- Chao Qin
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Yongqi Sun
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Yadong Dong
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
48
Gao J, Song B, Ke W, Hu X. BalanceAli: Multiple PPI Network Alignment With Balanced High Coverage and Consistency. IEEE Trans Nanobioscience 2017; 16:333-340. [PMID: 28541215 DOI: 10.1109/tnb.2017.2705521] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
Coverage and consistency are the two most widely considered metrics for evaluating the effectiveness of network alignment. However, they are contradictory in protein-protein interaction (PPI) network alignment: it is difficult, if not impossible, to achieve high coverage and high consistency simultaneously. Furthermore, most existing methods for multiple PPI network alignment ignore k-coverage or k-consistency, where k is the number of aligned species. In this paper, we propose BalanceAli, a novel approach for global alignment of multiple PPI networks that achieves high k-coverage and k-consistency simultaneously. Using six datasets consisting of various numbers of PPI networks from five species, we evaluate the experimental results for different values of k. Performance comparisons against three other state-of-the-art methods demonstrate the superior overall strength of our approach.
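The abstract uses k-coverage and k-consistency without defining them. The following is a sketch under commonly used but assumed definitions:

- An alignment is a list of match-sets, each mapping a protein to its species.
- k-coverage counts match-sets that span exactly k species.
- k-consistency counts those k-species match-sets whose proteins all share at least one functional annotation (for example, a GO term).

BalanceAli's own definitions may differ, and the protein and annotation labels are placeholders.

```python
from typing import Dict, List, Set

MatchSet = Dict[str, str]  # protein -> species it comes from

def n_species(ms: MatchSet) -> int:
    return len(set(ms.values()))

def k_coverage(alignment: List[MatchSet], k: int) -> int:
    """Number of match-sets that contain proteins from exactly k species."""
    return sum(1 for ms in alignment if n_species(ms) == k)

def k_consistency(alignment: List[MatchSet], annotations: Dict[str, Set[str]], k: int) -> int:
    """Number of k-species match-sets whose proteins all share at least one annotation."""
    count = 0
    for ms in alignment:
        if not ms or n_species(ms) != k:
            continue
        shared = set.intersection(*(annotations.get(p, set()) for p in ms))
        if shared:
            count += 1
    return count

alignment = [
    {"h1": "human", "y1": "yeast", "f1": "fly"},
    {"h2": "human", "y2": "yeast"},
    {"h3": "human", "y3": "yeast", "f3": "fly"},
]
annotations = {"h1": {"t1"}, "y1": {"t1", "t2"}, "f1": {"t1"},
               "h3": {"t3"}, "y3": {"t4"}, "f3": {"t3"}}
print(k_coverage(alignment, 3), k_consistency(alignment, annotations, 3))  # 2 1
```

The tension the abstract describes shows up directly here. Merging more proteins into match-sets raises k-coverage, but it also makes a shared annotation across all members less likely, which lowers k-consistency.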
49
Modos D, Brooks J, Fazekas D, Ari E, Vellai T, Csermely P, Korcsmaros T, Lenti K. Identification of critical paralog groups with indispensable roles in the regulation of signaling flow. Sci Rep 2016; 6:38588. [PMID: 27922122 PMCID: PMC5138592 DOI: 10.1038/srep38588] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/17/2016] [Accepted: 11/11/2016] [Indexed: 01/21/2023]
Abstract
Extensive cross-talk between signaling pathways is required to integrate the myriad extracellular signal combinations at the cellular level. Gene duplication events may lead to the emergence of novel functions, leaving groups of similar genes, termed paralogs, in the genome. To distinguish critical paralog groups (CPGs) from other paralogs in human signaling networks, we developed a signaling network-based method using cross-talk annotation and tissue-specific signaling flow analysis. We found 75 CPGs with higher degree, betweenness centrality, closeness and ‘bowtieness’ than other paralogs or other proteins in the signaling network. CPGs had higher diversity in all these measures, with more varied biological functions and more specific post-transcriptional regulation than non-critical paralog groups (non-CPGs). Using the TGF-beta, Notch and MAPK pathways as examples, we found that the SMAD2/3, NOTCH1/2/3 and MEK3/6-p38 CPGs regulate the signaling flow of their respective pathways. Additionally, CPGs showed a higher mutation rate in both inherited diseases and cancer, and were enriched in drug targets. In conclusion, the results revealed two distinct types of paralog groups in the signaling network, CPGs and non-CPGs, highlighting the importance of CPGs compared with non-CPGs in drug discovery and disease pathogenesis.
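The comparison of CPGs with other proteins rests on standard centrality measures. The following is a minimal sketch that computes degree and closeness on an unweighted, undirected graph, using breadth-first search for closeness. It then compares a hypothetical paralog group's mean values with those of the rest of the network. Betweenness and 'bowtieness' are omitted, the toy graph is invented, and this is not the authors' tissue-specific signaling-flow method.

```python
from collections import deque
from statistics import mean
from typing import Dict, Set

def closeness(adj: Dict[str, Set[str]], source: str) -> float:
    """Closeness by BFS: reachable nodes / sum of shortest-path distances (0.0 if isolated)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def compare_group(adj: Dict[str, Set[str]], group: Set[str]) -> Dict[str, Dict[str, float]]:
    """Mean degree and mean closeness of a node group versus all remaining nodes."""
    others = [n for n in adj if n not in group]
    stats = {}
    for label, nodes in (("group", sorted(group)), ("others", others)):
        stats[label] = {
            "mean_degree": mean(len(adj[n]) for n in nodes),
            "mean_closeness": mean(closeness(adj, n) for n in nodes),
        }
    return stats

edges = [("S1", "A"), ("S1", "B"), ("S1", "C"), ("S1", "S2"), ("S2", "D")]
adj: Dict[str, Set[str]] = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
stats = compare_group(adj, {"S1", "S2"})
print(stats)
```

In this toy network, the hypothetical group {"S1", "S2"} has a mean degree of 3 against 1 for the other nodes, and a higher mean closeness.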
Affiliation(s)
- Dezso Modos
- Department of Morphology and Physiology, Faculty of Health Sciences, Semmelweis University, Budapest, Hungary; Department of Genetics, Eotvos Lorand University, Budapest, Hungary; Earlham Institute, Norwich Research Park, Norwich, UK
- Johanne Brooks
- Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, UK; Faculty of Medicine and Health, University of East Anglia, Norwich, UK; Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
- David Fazekas
- Department of Genetics, Eotvos Lorand University, Budapest, Hungary
- Eszter Ari
- Department of Genetics, Eotvos Lorand University, Budapest, Hungary; Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged, Hungary
- Tibor Vellai
- Department of Genetics, Eotvos Lorand University, Budapest, Hungary
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Tamas Korcsmaros
- Department of Genetics, Eotvos Lorand University, Budapest, Hungary; Earlham Institute, Norwich Research Park, Norwich, UK; Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, UK
- Katalin Lenti
- Department of Morphology and Physiology, Faculty of Health Sciences, Semmelweis University, Budapest, Hungary
50
Zhang W, Xu J, Li X, Zou X. A New Method for Identifying Essential Proteins by Measuring Co-Expression and Functional Similarity. IEEE Trans Nanobioscience 2016; 15:939-945. [DOI: 10.1109/tnb.2016.2625460] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]