1
Grassmann G, Miotto M, Desantis F, Di Rienzo L, Tartaglia GG, Pastore A, Ruocco G, Monti M, Milanetti E. Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024; 124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]
Abstract
Investigating protein-protein interactions is crucial for understanding cellular biological processes because proteins often function within molecular complexes rather than in isolation. While experimental and computational methods have provided valuable insights into these interactions, they often overlook a critical factor: the crowded cellular environment. This environment significantly impacts protein behavior, including structural stability, diffusion, and ultimately the nature of binding. In this review, we discuss theoretical and computational approaches that allow the modeling of biological systems to guide and complement experiments and can thus significantly advance the investigation, and possibly the predictions, of protein-protein interactions in the crowded environment of cell cytoplasm. We explore topics such as statistical mechanics for lattice simulations, hydrodynamic interactions, diffusion processes in high-viscosity environments, and several methods based on molecular dynamics simulations. By synergistically leveraging methods from biophysics and computational biology, we review the state of the art of computational methods to study the impact of molecular crowding on protein-protein interactions and discuss its potential revolutionizing effects on the characterization of the human interactome.
Affiliation(s)
- Greta Grassmann
  - Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, Rome 00185, Italy
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Mattia Miotto
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Fausta Desantis
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - The Open University Affiliated Research Centre at Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Lorenzo Di Rienzo
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Gian Gaetano Tartaglia
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
  - Center for Human Technologies, Genoa 16152, Italy
- Annalisa Pastore
  - Experiment Division, European Synchrotron Radiation Facility, Grenoble 38043, France
- Giancarlo Ruocco
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
- Michele Monti
  - RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Edoardo Milanetti
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
2
Pei F, Shi Q, Zhang H, Bahar I. Predicting Protein-Protein Interactions Using Symmetric Logistic Matrix Factorization. J Chem Inf Model 2021; 61:1670-1682. [PMID: 33831302 DOI: 10.1021/acs.jcim.1c00173] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
Abstract
Accurate assessment of protein-protein interactions (PPIs) is critical to deciphering disease mechanisms and developing novel drugs, and with rapidly growing PPI data, the need for more efficient predictive methods is emerging. We propose here a symmetric logistic matrix factorization (symLMF)-based approach to predict PPIs, especially useful for large PPI networks. Benchmarked against two widely used datasets (Saccharomyces cerevisiae and Homo sapiens benchmarks) and their extended versions, the symLMF-based method proves to outperform most of the state-of-the-art data-driven methods applied to human PPIs, and it shows a performance comparable to those of deep learning methods despite its conceptual and technical simplicity and efficiency. Tests performed on humans, yeast, and tissue (brain and liver)- and disease (neurodegenerative and metabolic disorders)-specific datasets further demonstrate the high capability to capture the hidden interactions. Notably, many "de novo predictions" made by symLMF are verified to exist in PPI databases other than those used for training/testing the method, indicating that the method could be of broad utility as a simple, yet efficient and accurate, tool applicable to PPI datasets.
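To make the idea of a symmetric logistic matrix factorization concrete, the following is a minimal sketch, not the authors' implementation. The abstract gives no model details, so the latent rank, learning rate, L2 penalty, plain gradient ascent, and all function names here are assumptions chosen for illustration. Each protein gets one latent vector, and the predicted interaction probability of a pair is the logistic function of the dot product of their vectors, which is symmetric by construction.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping real-valued scores to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_symmetric_lmf(adj, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit latent vectors U so that sigmoid(U @ U.T) approximates adj.

    adj is a symmetric 0/1 interaction matrix; the diagonal is ignored.
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    U = 0.1 * rng.standard_normal((n, rank))
    off_diagonal = 1.0 - np.eye(n)
    for _ in range(epochs):
        P = sigmoid(U @ U.T)
        # Gradient of the Bernoulli log-likelihood with respect to the logits.
        G = (adj - P) * off_diagonal
        # G is symmetric, so d(logL)/dU = 2 * G @ U; subtract the L2 penalty term.
        U += lr * (2.0 * G @ U - reg * U)
    return U


def interaction_probabilities(U):
    """Predicted probability for every protein pair (symmetric matrix)."""
    return sigmoid(U @ U.T)


# Toy network (hypothetical data): two groups of three mutually interacting proteins.
adj = np.zeros((6, 6))
adj[:3, :3] = 1.0
adj[3:, 3:] = 1.0
np.fill_diagonal(adj, 0.0)

U = fit_symmetric_lmf(adj)
P = interaction_probabilities(U)
```

Because a single matrix `U` is used on both sides of the product, the score for the pair (i, j) always equals the score for (j, i), which is the property that distinguishes this sketch from an ordinary two-factor logistic factorization.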
Affiliation(s)
- Qingya Shi
  - School of Medicine, Tsinghua University, Beijing 100084, China
3
Göktepe YE, Kodaz H. Prediction of Protein-Protein Interactions Using An Effective Sequence Based Combined Method. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.03.062] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 10/17/2022]
4
Padmanabhan K, Shpanskaya K, Bello G, Doraiswamy PM, Samatova NF. Toward Personalized Network Biomarkers in Alzheimer's Disease: Computing Individualized Genomic and Protein Crosstalk Maps. Front Aging Neurosci 2017; 9:315. [PMID: 29085293 PMCID: PMC5649142 DOI: 10.3389/fnagi.2017.00315] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/14/2017] [Accepted: 09/15/2017] [Indexed: 01/12/2023] Open
Affiliation(s)
- Kanchana Padmanabhan
  - Department of Computer Science, North Carolina State University, Raleigh, NC, United States
  - Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Katie Shpanskaya
  - Stanford University School of Medicine, Stanford, CA, United States
- Gonzalo Bello
  - Department of Computer Science, North Carolina State University, Raleigh, NC, United States
- P Murali Doraiswamy
  - Department of Psychiatry, Duke University, Durham, NC, United States
  - Duke Institute for Brain Sciences, Duke University, Durham, NC, United States
- Nagiza F Samatova
  - Department of Computer Science, North Carolina State University, Raleigh, NC, United States
  - Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
5
Characterizing Gene and Protein Crosstalks in Subjects at Risk of Developing Alzheimer’s Disease: A New Computational Approach. Processes (Basel) 2017. [DOI: 10.3390/pr5030047] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022] Open
6
Wang L, You ZH, Chen X, Li JQ, Yan X, Zhang W, Huang YA. An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences. Oncotarget 2017; 8:5149-5159. [PMID: 28029645 PMCID: PMC5354898 DOI: 10.18632/oncotarget.14103] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/11/2016] [Accepted: 11/15/2016] [Indexed: 11/25/2022] Open
Abstract
Protein-protein interactions (PPIs) are not only a critical component of various biological processes in cells but also key to understanding the mechanisms leading to healthy and diseased states in organisms. However, identifying the interactions among proteins using biological experiments is time-consuming and cost-intensive. Hence, developing more efficient computational methods has rapidly become an attractive topic in the post-genomic era. In this paper, we propose a novel method for inferring protein-protein interactions from protein amino acid sequences only. Specifically, each protein amino acid sequence is first transformed into a Position-Specific Scoring Matrix (PSSM) generated by multiple sequence alignment; the Pseudo PSSM is then used to extract feature descriptors. Finally, an ensemble Rotation Forest (RF) learning system is trained to predict and recognize PPIs based solely on protein sequence features. When applied to three benchmark data sets (Yeast, H. pylori, and an independent dataset) for predicting PPIs, our method achieves good average accuracies of 98.38%, 89.75%, and 96.25%, respectively. To further evaluate the prediction performance, we also compare the proposed method with other methods using the same benchmark data sets. The experimental results demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Therefore, our method is effective and robust and can be taken as a useful tool for exploring and discovering new relationships between proteins. A web server is made publicly available at the URL http://202.119.201.126:8888/PsePSSM/ for academic use.
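The abstract names the Pseudo PSSM as the feature extractor but does not spell out the descriptor. The sketch below follows one commonly described form, taken here as an assumption: the column means of the PSSM, followed by mean squared differences between rows separated by a lag of 1 up to `max_lag`. The function name, the `max_lag` parameter, and the toy matrix are hypothetical, and both the PSSM normalization step and the Rotation Forest classifier are omitted.

```python
import numpy as np


def pse_pssm_features(pssm, max_lag=2):
    """Turn an L x C PSSM into a fixed-length vector of C * (1 + max_lag) values.

    The first C values are the column means. For each lag k, the next C values
    are the mean over positions i of (pssm[i, j] - pssm[i + k, j]) ** 2.
    """
    pssm = np.asarray(pssm, dtype=float)
    length = pssm.shape[0]
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = [pssm.mean(axis=0)]
    for lag in range(1, max_lag + 1):
        # Rows i and i + lag are compared for i = 0 .. length - lag - 1.
        diff = pssm[:-lag] - pssm[lag:]
        features.append((diff ** 2).mean(axis=0))
    return np.concatenate(features)


# Hypothetical PSSM for a 5-residue sequence in which every row is identical.
toy_pssm = np.tile(np.arange(20.0), (5, 1))
toy_features = pse_pssm_features(toy_pssm, max_lag=2)
```

The point of the lag terms is that sequences of different lengths all map to vectors of the same size, which is what allows a standard classifier to be trained on them.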
Affiliation(s)
- Lei Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Xing Chen
  - School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jian-Qiang Li
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Xin Yan
  - School of Foreign Languages, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Wei Zhang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Yu-An Huang
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
7
Liu X, Huang Y, Liang J, Zhang S, Li Y, Wang J, Shen Y, Xu Z, Zhao Y. Computational prediction of protein interactions related to the invasion of erythrocytes by malarial parasites. BMC Bioinformatics 2014; 15:393. [PMID: 25433733 PMCID: PMC4265449 DOI: 10.1186/s12859-014-0393-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 07/18/2014] [Accepted: 11/19/2014] [Indexed: 11/10/2022] Open
Abstract
Background: The invasion of red blood cells (RBCs) by malarial parasites is an essential step in the life cycle of Plasmodium falciparum. Human-parasite surface protein interactions play a critical role in this process. Although several interactions between human and parasite proteins have been discovered, the mechanism related to invasion remains poorly understood because numerous human-parasite protein interactions have not yet been identified. High-throughput screening experiments are not feasible for malarial parasites due to difficulty in expressing the parasite proteins. Here, we performed computational prediction of the PPIs involved in malaria parasite invasion to elucidate the mechanism by which invasion occurs. Results: In this study, an expectation maximization algorithm was used to estimate the probabilities of domain-domain interactions (DDIs). Estimates of DDI probabilities were then used to infer PPI probabilities. We found that our prediction performance was better than that based on the information of D. melanogaster alone when information related to the six species was used. Prediction performance was assessed using protein interaction data from S. cerevisiae, indicating that the predicted results were reliable. We then used the estimates of DDI probabilities to infer interactions between 490 parasite and 3,787 human membrane proteins. A small-scale dataset was used to illustrate the usability of our method in predicting interactions between human and parasite proteins. The positive predictive value (PPV) was lower than that observed in S. cerevisiae. We integrated gene expression data to improve prediction accuracy and to reduce false positives. We identified 80 membrane proteins highly expressed in the schizont stage by fast Fourier transform method. Approximately 221 erythrocyte membrane proteins were identified using published mass spectral datasets. A network consisting of 205 interactions was predicted. Results of network analysis suggest that SNARE proteins of parasites and APP of humans may function in the invasion of RBCs by parasites. Conclusions: We predicted a small-scale PPI network that may be involved in parasite invasion of RBCs by integrating DDI information and expression profiles. Experimental studies should be conducted to validate the predicted interactions. The predicted PPIs help elucidate the mechanism of parasite invasion and provide directions for future experimental investigations. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-014-0393-z) contains supplementary material, which is available to authorized users.
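The abstract states that DDI probability estimates were used to infer PPI probabilities but does not give the combining rule. One common formulation, assumed here for illustration only, treats domain pairs as independent and says two proteins interact if at least one of their domain pairs interacts, so the PPI probability is one minus the product of the non-interaction probabilities. The function name, domain labels, and probability values below are hypothetical, and the expectation maximization step that would estimate the DDI probabilities is not shown.

```python
from itertools import product


def ppi_probability(domains_a, domains_b, ddi_prob):
    """Probability that two proteins interact, given their domain lists.

    ddi_prob maps an unordered domain pair (a frozenset) to its estimated
    interaction probability; pairs missing from the map count as 0.
    """
    p_no_interaction = 1.0
    for m, n in product(domains_a, domains_b):
        p_no_interaction *= 1.0 - ddi_prob.get(frozenset((m, n)), 0.0)
    return 1.0 - p_no_interaction


# Hypothetical DDI estimates for illustration.
ddi_estimates = {
    frozenset(("D1", "D2")): 0.5,
    frozenset(("D1", "D3")): 0.5,
}
p_interact = ppi_probability(["D1"], ["D2", "D3"], ddi_estimates)
p_unknown = ppi_probability(["D4"], ["D5"], ddi_estimates)
```

Using a `frozenset` as the key keeps the lookup order-independent, so the pair (D1, D2) and the pair (D2, D1) share one estimate.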
Affiliation(s)
- Xuewu Liu
  - Department of Pathogenic Biology, The Fourth Military Medical University, Xi'an, 710032, P. R. China.
- Yuxiao Huang
  - Department of Pathogenic Biology, The Fourth Military Medical University, Xi'an, 710032, P. R. China.
- Jiao Liang
  - Department of Pathogenic Biology, The Fourth Military Medical University, Xi'an, 710032, P. R. China.
- Shuai Zhang
  - Department of Pathogenic Biology, The Fourth Military Medical University, Xi'an, 710032, P. R. China.
- Yinghui Li
  - Department of Pathogenic Biology, The Fourth Military Medical University, Xi'an, 710032, P. R. China.
- Jun Wang
  - Department of Pathogenic Biology, The Fourth Military Medical University, Xi'an, 710032, P. R. China.
- Yan Shen
  - Department of Pathogenic Biology, The Fourth Military Medical University, Xi'an, 710032, P. R. China.
- Zhikai Xu
  - Department of Pathogenic Biology, The Fourth Military Medical University, Xi'an, 710032, P. R. China.
- Ya Zhao
  - Department of Pathogenic Biology, The Fourth Military Medical University, Xi'an, 710032, P. R. China.
8
Integration strategy is a key step in network-based analysis and dramatically affects network topological properties and inferring outcomes. Biomed Res Int 2014; 2014:296349. [PMID: 25243127 PMCID: PMC4163410 DOI: 10.1155/2014/296349] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/22/2014] [Revised: 07/14/2014] [Accepted: 07/17/2014] [Indexed: 01/17/2023]
Abstract
An increasing number of experiments have been designed to detect intracellular and intercellular molecular interactions. Based on these molecular interactions (especially protein interactions), molecular networks have been built for use in several typical applications, such as the discovery of new disease genes and the identification of drug targets and molecular complexes. Because the data are incomplete and a considerable number of false-positive interactions exist, protein interactions from different sources are commonly integrated in network analyses to build a stable molecular network. Although various types of integration strategies are being applied in current studies, the topological properties of the networks from these different integration strategies, especially typical applications based on these network integration strategies, have not been rigorously evaluated. In this paper, systematic analyses were performed to evaluate 11 frequently used methods using two types of integration strategies: empirical and machine learning methods. The topological properties of the networks of these different integration strategies were found to differ significantly. Moreover, these networks were found to dramatically affect the outcomes of typical applications, such as disease gene predictions, drug target detections, and molecular complex identifications. The analysis presented in this paper could provide an important basis for future network-based biological research.
9
Hao D, Li C, Zhang S, Lu J, Jiang Y, Wang S, Zhou M. Network-based analysis of genotype-phenotype correlations between different inheritance modes. Bioinformatics 2014; 30:3223-31. [PMID: 25078399 DOI: 10.1093/bioinformatics/btu482] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
Abstract
MOTIVATION: Recent studies on human disease have revealed that aberrant interaction between proteins probably underlies a substantial number of human genetic diseases. This suggests a need to investigate disease inheritance mode using protein interaction and, on that basis, to refresh our conceptual understanding of a series of properties regarding the inheritance mode of human disease. RESULTS: We observed a strong correlation between the number of protein interactions and the likelihood of a gene causing any dominant disease or multiple dominant diseases, whereas no correlation was observed between protein interaction and the likelihood of a gene causing recessive diseases. We found that dominant diseases are more likely to be associated with disruption of important interactions. These findings suggest that inheritance mode should be understood using protein interaction. We therefore reviewed previous studies and refined an interaction model of inheritance mode, and then confirmed, using new evidence, that this model is largely reasonable. With these findings, we found that the inheritance mode of human genetic diseases can be predicted using protein interaction. By integrating systems biology perspectives with the classical disease genetics paradigm, our study provides some new insights into genotype-phenotype correlations. CONTACT: haodapeng@ems.hrbmu.edu.cn or biofomeng@hotmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dapeng Hao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, P.R. China and Institute for Systems Biology, Seattle 98109, USA
- Chuanxing Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, P.R. China and Institute for Systems Biology, Seattle 98109, USA
- Shaojun Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, P.R. China and Institute for Systems Biology, Seattle 98109, USA
- Jianping Lu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, P.R. China and Institute for Systems Biology, Seattle 98109, USA
- Yongshuai Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, P.R. China and Institute for Systems Biology, Seattle 98109, USA
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, P.R. China and Institute for Systems Biology, Seattle 98109, USA
- Meng Zhou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, P.R. China and Institute for Systems Biology, Seattle 98109, USA
10
Du X, Cheng J, Zheng T, Duan Z, Qian F. A novel feature extraction scheme with ensemble coding for protein-protein interaction prediction. Int J Mol Sci 2014; 15:12731-49. [PMID: 25046746 PMCID: PMC4139871 DOI: 10.3390/ijms150712731] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 05/07/2014] [Revised: 06/23/2014] [Accepted: 07/14/2014] [Indexed: 11/16/2022] Open
Abstract
Protein–protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We extended the experiment to a bigger and more realistic dataset that maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/DXECPPI/index.jsp.
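The precision, recall, and accuracy figures quoted above are standard functions of a binary confusion matrix. As a reminder of how the three relate, here is a small sketch; the function name and the counts are hypothetical and are not taken from the DXEC experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, accuracy) from confusion-matrix counts.

    tp/fp: predicted interactions that are true/false;
    tn/fn: predicted non-interactions that are true/false.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy


# Hypothetical counts: 8 true positives, 2 false positives,
# 6 true negatives, 4 false negatives.
precision, recall, accuracy = classification_metrics(8, 2, 6, 4)
```

A predictor can trade precision against recall by moving its decision threshold, which is one reason the abstract reports all three numbers per dataset rather than accuracy alone.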
Affiliation(s)
- Xiuquan Du
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, China.
- Jiaxing Cheng
  - Institute of Information Engineering, Anhui Xinhua University, Hefei 230088, China.
- Tingting Zheng
  - School of Mathematical Science, Anhui University, Hefei 230601, China.
- Zheng Duan
  - School of Computer Science and Technology, Anhui University, Hefei 230601, China.
- Fulan Qian
  - School of Computer Science and Technology, Anhui University, Hefei 230601, China.
11
Wu L, Zhou N, Sun R, Chen XD, Feng SC, Zhang B, Bao JK. Network-based identification of key proteins involved in apoptosis and cell cycle regulation. Cell Prolif 2014; 47:356-68. [PMID: 24889965 DOI: 10.1111/cpr.12113] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/17/2014] [Accepted: 04/08/2014] [Indexed: 01/05/2023] Open
Abstract
OBJECTIVES: Cancer cells differ from normal body cells in their ability to divide indefinitely and to evade programmed cell death. Crosstalk between apoptosis and cell cycle processes promotes balance between proliferation and death, and limits population growth and survival of cells. However, intricate relationships between them and how they are able to manipulate the fate of cancer cells still remain to be clarified. Identification of key factors involved in both apoptosis and cell cycle regulation may help to address this problem. MATERIALS AND METHODS: Identification of such key proteins was carried out using a series of bioinformatics methods, such as network construction and key protein identification. RESULTS: In this study, we computationally constructed human apoptotic/cell cycle-related protein-protein interaction (PPI) networks from five experimentally supported protein interaction databases, and further integrated these high-throughput data sets into a Naïve Bayesian model to predict protein functional connections. On the basis of modified apoptotic/cell cycle-related PPI networks, we calculated and ranked all protein members involved in apoptosis and cell cycle regulation. Our results not only identified some already known key proteins such as p53, Rb, Myc and Src but also found that the proteasome, Cullin family members, kinases and transcriptional repressors play important roles in regulating apoptosis and the cell cycle. Furthermore, we found that the top 100 proteins ranked by PeC were enriched in some pathways such as those of cancer, the proteasome, the cell cycle and Wnt signalling. CONCLUSIONS: We constructed the global human apoptotic/cell cycle-related PPI network based on five online databases and a Naïve Bayesian model. In addition, we systematically identified apoptotic/cell cycle-related key proteins in cancer cells. These findings may uncover intricate relationships between apoptosis and cell cycle processes and thus provide further new clues towards future anticancer drug discovery.
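The abstract says evidence from five databases was integrated with a Naïve Bayesian model but gives no formula. A typical way such integration is done, assumed here purely for illustration, is to multiply the prior odds of a functional link by one likelihood ratio per evidence source, relying on the naive assumption that sources are conditionally independent. The function name, prior, and likelihood ratios below are hypothetical.

```python
def naive_bayes_posterior(prior, likelihood_ratios):
    """Posterior probability of a link after combining independent evidence.

    prior: prior probability that a protein pair is functionally linked (0 < prior < 1).
    likelihood_ratios: one value per evidence source,
        P(evidence | linked) / P(evidence | not linked).
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical example: an uninformative prior and two supporting sources.
posterior = naive_bayes_posterior(0.5, [2.0, 3.0])
```

Working in odds keeps the combination a simple product; a source with a ratio of 1 leaves the estimate unchanged, and a ratio below 1 counts as evidence against the link.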
Affiliation(s)
- L Wu
  - School of Life Sciences and Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, 610064, China
12
Furlong LI. Human diseases through the lens of network biology. Trends Genet 2013; 29:150-9. [PMID: 23219555 DOI: 10.1016/j.tig.2012.11.004] [Citation(s) in RCA: 146] [Impact Index Per Article: 12.2] [Received: 09/04/2012] [Revised: 10/24/2012] [Accepted: 11/09/2012] [Indexed: 12/13/2022]
13
Aguiar-Pulido V, Munteanu CR, Seoane JA, Fernández-Blanco E, Pérez-Montoto LG, González-Díaz H, Dorado J. Naïve Bayes QSDR classification based on spiral-graph Shannon entropies for protein biomarkers in human colon cancer. Mol Biosyst 2012; 8:1716-22. [PMID: 22466084 DOI: 10.1039/c2mb25039j] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022]
Abstract
Fast cancer diagnosis represents a real necessity in applied medicine due to the importance of this disease. Thus, theoretical models can help as prediction tools. Graph theory representation is one option because it permits us to numerically describe any real system such as the protein macromolecules by transforming real properties into molecular graph topological indices. This study proposes a new classification model for proteins linked with human colon cancer by using spiral graph topological indices of protein amino acid sequences. The best quantitative structure-disease relationship model is based on eleven Shannon entropy indices. It was obtained with the Naïve Bayes method and shows excellent predictive ability (90.92%) for new proteins linked with this type of cancer. The statistical analysis confirms that this model allows diagnosing the absence of human colon cancer obtaining an area under receiver operating characteristic of 0.91. The methodology presented can be used for any type of sequential information such as any protein and nucleic acid sequence.
Affiliation(s)
- Vanessa Aguiar-Pulido
  - Department of Information and Communications Technologies, University of A Coruña, A Coruña, Spain
14
Zhang S, Chang Z, Li Z, DuanMu H, Li Z, Li K, Liu Y, Qiu F, Xu Y. Calculating phenotypic similarity between genes using hierarchical structure data based on semantic similarity. Gene 2012; 497:58-65. [PMID: 22305981 DOI: 10.1016/j.gene.2012.01.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/19/2011] [Revised: 01/16/2012] [Accepted: 01/18/2012] [Indexed: 01/25/2023]
Abstract
Phenotypic similarity is correlated with a number of measures of gene function, such as relatedness at the level of direct protein-protein interaction. The phenotypic effect of a deleted or mutated gene, which is one part of gene annotation, has attracted broad attention. However, there have been few measures to study phenotypic similarity with the data from the Human Phenotype Ontology (HPO) database; therefore, more analogous measures should be developed and investigated. We used five semantic similarity-based measures (Jiang and Conrath, Lin, Schlicker, Yu and Wu) to calculate the human phenotypic similarity between genes (PSG) with data from the HPO database, and evaluated their accuracy with information on protein-protein interaction, protein complex, protein family, gene function or DNA sequence. Compared with randomly selected gene pairs, the results of these methods were statistically significant (all P<0.001). Furthermore, we assessed the performance of these five measures by receiver operating characteristic (ROC) curve analysis, and found that most of them performed better than the previous methods. This work showed that these semantic similarity-based measures for calculating PSG were effective for hierarchical structure data. Our study contributes to the development and optimization of novel algorithms of PSG calculation and provides more alternative methods to researchers as well as tools and directions for PSG study.
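Two of the measures named above, Lin and Jiang and Conrath, are defined from the information content (IC) of ontology terms, where IC(t) = -log p(t) and the most informative common ancestor (MICA) of two terms is their shared ancestor with the highest IC. The sketch below computes both at the term level on a toy hierarchy. The ontology, the term frequencies, and the function names are hypothetical, and the step that aggregates term-level scores into a gene-level PSG is not shown because the abstract does not describe it.

```python
import math


def ancestors(term, parents):
    """Return the term together with all of its ancestors.

    parents maps each child term to a list of its direct parent terms.
    """
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def mica_ic(a, b, parents, ic):
    """IC of the most informative common ancestor (0.0 if none is shared)."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return max((ic[t] for t in common), default=0.0)


def lin_similarity(a, b, parents, ic):
    """Lin: 2 * IC(MICA) / (IC(a) + IC(b)); returns 0.0 if both ICs are zero."""
    denom = ic[a] + ic[b]
    return 2.0 * mica_ic(a, b, parents, ic) / denom if denom > 0 else 0.0


def jiang_conrath_distance(a, b, parents, ic):
    """Jiang and Conrath distance: IC(a) + IC(b) - 2 * IC(MICA)."""
    return ic[a] + ic[b] - 2.0 * mica_ic(a, b, parents, ic)


# Toy hierarchy (hypothetical): A is the root; B and C are its children;
# D and E are children of B. Frequencies include annotations of descendants.
toy_parents = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["B"]}
toy_freq = {"A": 1.0, "B": 0.5, "C": 0.5, "D": 0.25, "E": 0.25}
toy_ic = {term: -math.log(p) for term, p in toy_freq.items()}

sim_siblings = lin_similarity("D", "E", toy_parents, toy_ic)
sim_distant = lin_similarity("D", "C", toy_parents, toy_ic)
dist_siblings = jiang_conrath_distance("D", "E", toy_parents, toy_ic)
```

In this toy case the siblings D and E share the informative ancestor B and score 0.5, while D and C share only the root, whose IC is zero, and score 0.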
Affiliation(s)
- Shanzhen Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China