1
|
Meng X, Li W, Peng X, Li Y, Li M. Protein interaction networks: centrality, modularity, dynamics, and applications. FRONTIERS OF COMPUTER SCIENCE 2021; 15:156902. [DOI: 10.1007/s11704-020-8179-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/07/2018] [Accepted: 08/12/2020] [Indexed: 01/03/2025]
|
2
|
Luo P, Chen B, Liao B, Wu F. Predicting disease‐associated genes: Computational methods, databases, and evaluations. WIRES DATA MINING AND KNOWLEDGE DISCOVERY 2021; 11. [DOI: 10.1002/widm.1383] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/28/2019] [Accepted: 06/13/2020] [Indexed: 09/09/2024]
Abstract
Complex diseases are associated with a set of genes (called disease genes), the identification of which can help scientists uncover the mechanisms of diseases and develop new drugs and treatment strategies. Due to the huge cost and time of experimental identification techniques, many computational algorithms have been proposed to predict disease genes. Although several review publications in recent years have discussed many computational methods, some of them focus on cancer driver genes while others focus on biomolecular networks, which only cover a specific aspect of existing methods. In this review, we summarize existing methods and classify them into three categories based on their rationales. Then, the algorithms, biological data, and evaluation methods used in the computational prediction are discussed. Finally, we highlight the limitations of existing methods and point out some future directions for improving these algorithms. This review could help investigators understand the principles of existing methods, and thus develop new methods to advance the computational prediction of disease genes. This article is categorized under: Technologies > Machine Learning; Technologies > Prediction; Algorithmic Development > Biological Data Mining
Affiliation(s)
- Ping Luo
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Bolin Chen
  - School of Computer Science and Technology, Northwestern Polytechnical University, China
- Bo Liao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Fang‐Xiang Wu
  - Department of Mechanical Engineering and Department of Computer Science, University of Saskatchewan, Saskatoon, Canada
|
3
|
Song J, Peng W, Wang F, Wang J. Identifying driver genes involving gene dysregulated expression, tissue-specific expression and gene-gene network. BMC Med Genomics 2019; 12:168. [PMID: 31888619 PMCID: PMC6936147 DOI: 10.1186/s12920-019-0619-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/06/2019] [Accepted: 11/11/2019] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Cancer, a disease of genomic alteration, claims many lives each year. The biggest challenge in overcoming cancer is to identify the driver genes that promote cancer development among a huge number of passenger mutations that confer no selective growth advantage. To address this problem, some researchers have begun to identify driver genes by integrating networks with other biological information. However, further effort is needed to improve prediction performance. METHODS: Driver genes affect the expression of their downstream genes and are likely to interact with each other to form functional modules, and those modules tend to be expressed similarly in the same tissue. Based on these observations, we propose a novel model named DyTidriver that identifies driver genes by incorporating dysregulated gene expression, tissue-specific expression and variation frequency into the human functional interaction network (human FIN). RESULTS: The method was applied to 974 breast, 316 prostate and 230 lung cancer patients. The results show that our method outperformed five other existing methods in terms of F-score, precision and recall. The enrichment and cociter analyses illustrate that DyTidriver not only identifies driver genes enriched in some significant pathways but is also able to uncover some unknown driver genes. CONCLUSION: The final results imply that driver genes are those that affect more dysregulated genes and are expressed similarly in the same tissue.
Affiliation(s)
- Junrong Song
  - Faculty of Management and Economics/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan, 650500, People's Republic of China
- Wei Peng
  - Faculty of Management and Economics/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan, 650500, People's Republic of China
- Feng Wang
  - Faculty of Management and Economics/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan, 650500, People's Republic of China
- Jianxin Wang
  - School of Information Science and Engineering, Central South University, Changsha, Hunan, 410083, People's Republic of China
|
4
|
Lei X, Fang Z. GBDTCDA: Predicting circRNA-disease Associations Based on Gradient Boosting Decision Tree with Multiple Biological Data Fusion. Int J Biol Sci 2019; 15:2911-2924. [PMID: 31853227 PMCID: PMC6909967 DOI: 10.7150/ijbs.33806] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 02/03/2019] [Accepted: 10/15/2019] [Indexed: 12/17/2022]
Abstract
Circular RNA (circRNA) is a closed-loop non-coding RNA molecule that plays a significant role in gene regulation. Many previous studies have shown that circRNAs can act as miRNA sponges. Thus, circRNA is also key to disease diagnosis, treatment and inference. However, traditional experimental approaches to verifying circRNA-disease associations are time-consuming and costly. Few computational models exist to predict potential circRNA-disease associations, which motivates us to propose a new one. In this study, we propose a machine learning based computational model named Gradient Boosting Decision Tree with multiple biological data to predict circRNA-disease associations (GBDTCDA). The known circRNA-disease association data are downloaded from the circR2Disease database (http://bioinfo.snnu.edu.cn/CircR2Disease/). The feature vector of each circRNA-disease pair is composed of four parts: statistical information from different biological networks, graph-theoretic information from different biological networks, circRNA-disease association network information, and circRNA nucleotide sequence information. We use these feature vectors to train a gradient boosting decision tree regression model. Then, leave-one-out cross validation (LOOCV) is adopted to evaluate the performance of the model. GBDTCDA also obtains better results when predicting circRNAs related to some common diseases: the area under the ROC curve (AUC) values for basal cell carcinoma, non-small cell lung cancer and cervical cancer are 95.8%, 88.3% and 93.5%, respectively. To further illustrate the performance of GBDTCDA, a case study of breast cancer is also included. Thus, based on the experimental results and analyses, GBDTCDA is a powerful tool for predicting potential circRNA-disease associations.
Affiliation(s)
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Zengqiang Fang
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
|
5
|
Lei X, Fang Z, Guo L. Predicting circRNA-Disease Associations Based on Improved Collaboration Filtering Recommendation System With Multiple Data. Front Genet 2019; 10:897. [PMID: 31608124 PMCID: PMC6773885 DOI: 10.3389/fgene.2019.00897] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/12/2019] [Accepted: 08/23/2019] [Indexed: 12/04/2022]
Abstract
With the development of high-throughput techniques, various biological molecules have been discovered, including circular RNAs (circRNAs). Circular RNA is a novel endogenous noncoding RNA that plays significant roles in regulating gene expression, moderating microRNA transcription as a sponge, diagnosing diseases, and so on. Because circRNAs have closed-loop structures with neither 5′-3′ polarity nor polyadenylated tails, they are more stable and conserved than normal linear coding or noncoding RNAs, which makes them biomarkers of various diseases. Although some conventional experiments are used to identify the associations between circRNAs and diseases, almost all of these techniques and experiments are time-consuming and expensive. In this study, we propose ICFCDA, a computational method based on a collaboration filtering recommendation system that handles the “cold start” problem, to predict potential circRNA–disease associations. All known circRNA–disease association data are downloaded from the circR2Disease database (http://bioinfo.snnu.edu.cn/CircR2Disease/). Based on these data, multiple data are extracted from different databases to calculate the circRNA similarity networks and the disease similarity networks. The collaboration filtering recommendation system algorithm is first employed to predict circRNA–disease associations. Then, leave-one-out cross validation is adopted to measure the performance of the proposed method. ICFCDA achieves an area under the curve of 0.946, which is better than other existing methods. To further illustrate the performance of ICFCDA, case studies of some common diseases are conducted, and the results are confirmed by other databases. The experimental results show that ICFCDA is competent in predicting circRNA–disease associations.
Affiliation(s)
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Zengqiang Fang
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Ling Guo
  - College of Life Sciences, Shaanxi Normal University, Xi'an, China
|
6
|
Sun S, Sun F, Wang Y. Multi-Level Comparative Framework Based on Gene Pair-Wise Expression Across Three Insulin Target Tissues for Type 2 Diabetes. Front Genet 2019; 10:252. [PMID: 30972105 PMCID: PMC6443994 DOI: 10.3389/fgene.2019.00252] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/24/2019] [Accepted: 03/06/2019] [Indexed: 11/30/2022]
Abstract
Type 2 diabetes (T2D) is a disease caused by gene alterations and characterized by insulin resistance; thus, the insulin-responsive tissues are of great interest for T2D study. It is highly relevant to systematically investigate the commonalities and specificities of T2D among those tissues. Here we establish a multi-level comparative framework across three insulin target tissues (white adipose, skeletal muscle, and liver) to provide a better understanding of T2D. Starting from the ranks of gene expression, we constructed the ‘disease network’ by detecting diverse interactions, in order to characterize the disease-affected tissues well. Then, we applied the random walk with restart algorithm to the disease network to prioritize its nodes and edges according to their association with T2D. Finally, we identified a merged core module by combining the clustering coefficient and Jaccard index, which provides a detailed and visible picture of the common and specific features of different tissues at the network level. Taken together, our network-, gene-, and module-level characterization of T2D across different tissues holds promise for a broader and deeper understanding of the T2D mechanism.
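The random walk with restart step mentioned in this abstract can be illustrated with a minimal sketch. The function name `random_walk_with_restart`, the toy four-node path graph, the restart probability of 0.3 and the single seed node are all assumptions made for illustration; the paper's actual network, seed set and parameters are not given here.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Return steady-state visiting probabilities of a random walk with restart.

    adj   : dict mapping each node to the set of its neighbours (undirected graph)
    seeds : set of nodes the walker jumps back to; every seed must be a key of adj
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker returns to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            walk_mass = (1.0 - restart) * p[u]
            if adj[u]:
                # Otherwise it moves to a uniformly chosen neighbour.
                share = walk_mass / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Isolated node: return its mass to the seeds so the total stays 1.
                for s in seeds:
                    nxt[s] += walk_mass / len(seeds)
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a path a - b - c - d with "a" as the only seed.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = random_walk_with_restart(adj, seeds={"a"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case nodes closer to the seed receive higher scores, which is the property that makes the steady-state probabilities usable as a prioritization score.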
Affiliation(s)
- Shaoyan Sun
  - School of Mathematics and Statistics, Ludong University, Yantai, China
- Fengnan Sun
  - Clinical Laboratory, Yantaishan Hospital, Yantai, China
- Yong Wang
  - CEMS, NCMIS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
|
7
|
Sharma P, Bhattacharyya D, Kalita J. Detecting protein complexes based on a combination of topological and biological properties in protein-protein interaction network. J Genet Eng Biotechnol 2018; 16:217-226. [PMID: 30647725 PMCID: PMC6296571 DOI: 10.1016/j.jgeb.2017.11.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/01/2017] [Revised: 11/01/2017] [Accepted: 11/17/2017] [Indexed: 01/04/2023]
Abstract
Protein complexes are known to play a major role in controlling cellular activity in living organisms. Identifying complexes from raw protein-protein interactions (PPIs) is an important area of research. Earlier work has been limited mostly to yeast. Such protein complex identification methods often perform poorly when applied to large human PPIs. We introduce a novel method called CSC to detect protein complexes. The method is evaluated in terms of positive predictive value, sensitivity and accuracy using datasets from the model organism yeast and from humans. CSC outperforms several competing algorithms for both organisms. Further, we present a framework to establish the usefulness of CSC in analyzing the influence of a given disease gene in a complex, both topologically and biologically, considering eight major association factors.
Affiliation(s)
- Pooja Sharma
  - Department of Computer Science & Engineering, Tezpur University, Napaam, Tezpur 784028, Assam, India
- D.K. Bhattacharyya
  - Department of Computer Science & Engineering, Tezpur University, Napaam, Tezpur 784028, Assam, India
- J.K. Kalita
  - Department of Computer Science, University of Colorado at Colorado Springs, CO 80933-7150, USA
|
8
|
Abstract
Motivation: Understanding functions of proteins in specific human tissues is essential for insights into disease diagnostics and therapeutics, yet prediction of tissue-specific cellular function remains a critical challenge for biomedicine. Results: Here, we present OhmNet, a hierarchy-aware unsupervised node feature learning approach for multi-layer networks. We build a multi-layer network, where each layer represents molecular interactions in a different human tissue. OhmNet then automatically learns a mapping of proteins, represented as nodes, to a neural embedding-based low-dimensional space of features. OhmNet encourages sharing of similar features among proteins with similar network neighborhoods and among proteins activated in similar tissues. The algorithm generalizes prior work, which generally ignores relationships between tissues, by modeling tissue organization with a rich multiscale tissue hierarchy. We use OhmNet to study multicellular function in a multi-layer protein interaction network of 107 human tissues. In 48 tissues with known tissue-specific cellular functions, OhmNet provides more accurate predictions of cellular function than alternative approaches, and also generates more accurate hypotheses about tissue-specific protein actions. We show that taking into account the tissue hierarchy leads to improved predictive power. Remarkably, we also demonstrate that it is possible to leverage the tissue hierarchy in order to effectively transfer cellular functions to a functionally uncharacterized tissue. Overall, OhmNet moves from flat networks to multiscale models able to predict a range of phenotypes spanning cellular subsystems. Availability and implementation: Source code and datasets are available at http://snap.stanford.edu/ohmnet.
Affiliation(s)
- Marinka Zitnik
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Jure Leskovec
  - Department of Computer Science, Stanford University, Stanford, CA, USA
|
9
|
Pedersen HK, Gudmundsdottir V, Brunak S. Pancreatic Islet Protein Complexes and Their Dysregulation in Type 2 Diabetes. Front Genet 2017; 8:43. [PMID: 28473845 PMCID: PMC5397424 DOI: 10.3389/fgene.2017.00043] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/01/2017] [Accepted: 03/27/2017] [Indexed: 12/18/2022]
Abstract
Type 2 diabetes (T2D) is a complex disease that involves multiple genes. Numerous risk loci have already been associated with T2D, although many susceptibility genes remain to be identified given heritability estimates. Systems biology approaches hold potential for discovering novel T2D genes by considering their biological context, such as tissue-specific protein interaction partners. Pancreatic islets are a key T2D tissue and many of the known genetic risk variants lead to impaired islet function, hence a better understanding of the islet-specific dysregulation in the disease-state is essential to unveil the full potential of person-specific profiles. Here we identify 3,692 overlapping pancreatic islet protein complexes (containing 10,805 genes) by integrating islet gene and protein expression data with protein interactions. We found 24 of these complexes to be significantly enriched for genes associated with diabetic phenotypes through heterogeneous evidence sources, including genetic variation, methylation, and gene expression in islets. The analysis specifically revealed ten T2D candidate genes with probable roles in islets (ANPEP, HADH, FAM105A, PDLIM4, PDLIM5, MAP2K4, PPP2R5E, SNX13, GNAS, and FRS2), of which the last six are novel in the context of T2D and the data that went into the analysis. Fifteen of the twenty-four complexes were further enriched for combined genetic associations with glycemic traits, exemplifying how perturbation of protein complexes by multiple small effects can give rise to diabetic phenotypes. The complex nature of T2D ultimately prompts an understanding of the individual patients at the network biology level. We present the foundation for such work by exposing a subset of the global interactome that is dysregulated in T2D and consequently provides a good starting point when evaluating an individual's alterations at the genome, transcriptome, or proteome level in relation to T2D in clinical settings.
Affiliation(s)
- Helle Krogh Pedersen
  - Department of Bio and Health Informatics, Technical University of Denmark, Kgs Lyngby, Denmark
- Valborg Gudmundsdottir
  - Department of Bio and Health Informatics, Technical University of Denmark, Kgs Lyngby, Denmark
- Søren Brunak
  - Department of Bio and Health Informatics, Technical University of Denmark, Kgs Lyngby, Denmark
  - Disease Systems Biology, Faculty of Health and Medical Sciences, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
|
10
|
Abstract
Background: Identifying the genes associated with human diseases is crucial for disease diagnosis and drug design. Computational approaches, especially network-based approaches, have recently been developed to identify disease-related genes effectively from existing biomedical networks. Meanwhile, advances in biotechnology enable researchers to produce multi-omics data, enriching our understanding of human diseases and revealing the complex relationships between genes and diseases. However, none of the existing computational approaches is able to integrate the huge amount of omics data into a weighted integrated network and utilize it to enhance disease-related gene discovery. Results: We propose a new network-based disease gene prediction method called SLN-SRW (Simplified Laplacian Normalization-Supervised Random Walk) to generate and model the edge weights of a new biomedical network that integrates biomedical data from heterogeneous sources, thus enhancing disease-related gene discovery. Conclusions: The experimental results show that SLN-SRW significantly improves the performance of disease gene prediction on both the real and the synthetic data sets. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3263-4) contains supplementary material, which is available to authorized users.
|
11
|
Zhang XF, Ou-Yang L, Dai DQ, Wu MY, Zhu Y, Yan H. Comparative analysis of housekeeping and tissue-specific driver nodes in human protein interaction networks. BMC Bioinformatics 2016; 17:358. [PMID: 27612563 PMCID: PMC5016887 DOI: 10.1186/s12859-016-1233-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 11/14/2015] [Accepted: 08/31/2016] [Indexed: 12/31/2022]
Abstract
Background: Several recent studies have used the Minimum Dominating Set (MDS) model to identify driver nodes, which provide the control of the underlying networks, in protein interaction networks. There may exist multiple MDS configurations in a given network, thus it is difficult to determine which one represents the real set of driver nodes. Because these previous studies only focus on static networks and ignore the contextual information on particular tissues, their findings could be insufficient or even misleading. Results: In this study, we develop a Collective-Influence-corrected Minimum Dominating Set (CI-MDS) model which takes into account the collective influence of proteins. By integrating molecular expression profiles and static protein interactions, 16 tissue-specific networks are established as well. We then apply the CI-MDS model to each tissue-specific network to detect MDS proteins. It generates almost the same MDSs when it is solved using different optimization algorithms. In addition, we classify MDS proteins into Tissue-Specific MDS (TS-MDS) proteins and HouseKeeping MDS (HK-MDS) proteins based on the number of tissues in which they are expressed and identified as MDS proteins. Notably, we find that TS-MDS proteins and HK-MDS proteins have significantly different topological and functional properties. HK-MDS proteins are more central in protein interaction networks, associated with more functions, evolving more slowly and subjected to a greater number of post-translational modifications than TS-MDS proteins. Unlike TS-MDS proteins, HK-MDS proteins significantly correspond to essential genes, ageing genes, virus-targeted proteins, transcription factors and protein kinases. Moreover, we find that besides HK-MDS proteins, many TS-MDS proteins are also linked to disease-related genes, suggesting the tissue specificity of human diseases. Furthermore, functional enrichment analysis reveals that HK-MDS proteins carry out universally necessary biological processes and TS-MDS proteins are usually involved in tissue-dependent functions. Conclusions: Our study uncovers key features of TS-MDS proteins and HK-MDS proteins, and is a step towards a better understanding of the controllability of human interactomes. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1233-0) contains supplementary material, which is available to authorized users.
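To make the dominating-set idea in this abstract concrete, here is a minimal sketch of a greedy approximation on a toy graph. This is not the CI-MDS model and not the optimization algorithms the authors used; greedy selection is a swapped-in heuristic that yields a small dominating set, which is not guaranteed to be minimum. The function names and the five-node graph are hypothetical.

```python
def greedy_dominating_set(adj):
    """Greedily pick nodes until every node is chosen or adjacent to a chosen node.

    adj : dict mapping each node to the set of its neighbours (undirected graph)
    Returns the chosen nodes in selection order (an approximation, not a true MDS).
    """
    # Closed neighbourhood: the node itself plus its neighbours.
    closed = {n: {n} | set(adj[n]) for n in adj}
    uncovered = set(adj)
    chosen = []
    while uncovered:
        # Pick the node that dominates the most still-uncovered nodes;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(adj), key=lambda n: len(closed[n] & uncovered))
        chosen.append(best)
        uncovered -= closed[best]
    return chosen


def is_dominating(adj, chosen):
    """Check that every node is in `chosen` or has a neighbour in `chosen`."""
    covered = set()
    for n in chosen:
        covered |= {n} | set(adj[n])
    return covered == set(adj)


# Hypothetical toy network: hub "c" with leaves "a", "b" and a tail c - d - e.
adj = {
    "a": {"c"},
    "b": {"c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
driver_nodes = greedy_dominating_set(adj)
```

On this toy graph the hub is selected first and one more node is needed to cover the tail. The same graph also admits other dominating sets of equal size, which mirrors the abstract's point that multiple MDS configurations can exist.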
Affiliation(s)
- Xiao-Fei Zhang
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Luoyu Road, Wuhan, 430079, China
- Le Ou-Yang
  - College of Information Engineering, Shenzhen University, Nanhai Ave 3688, Shenzhen, 518060, China
- Dao-Qing Dai
  - Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Xingang West Road, Guangzhou, 510275, China
- Meng-Yun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Guoding Road, Shanghai, 200433, China
- Yuan Zhu
  - School of Automation, China University of Geosciences, Lumo Road, Wuhan, 430074, China
- Hong Yan
  - Department of Electronic and Engineering, City University of Hong Kong, Tat Chee Avenue, Hong Kong, China
|
12
|
Deng Y, Gao L, Guo X, Wang B. Integrating phenotypic features and tissue-specific information to prioritize disease genes. SCIENCE CHINA INFORMATION SCIENCES 2016; 59:070101. [DOI: 10.1007/s11432-016-5584-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
13
|
Chen B, Li M, Wang J, Shang X, Wu FX. A fast and high performance multiple data integration algorithm for identifying human disease genes. BMC Med Genomics 2015; 8 Suppl 3:S2. [PMID: 26399620 PMCID: PMC4582601 DOI: 10.1186/1755-8794-8-s3-s2] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 01/01/2023]
Abstract
BACKGROUND: Integrating multiple data sources is indispensable for improving disease gene identification. This is not only because disease genes associated with similar genetic diseases tend to lie close to each other in various biological networks, but also because gene-disease associations are complex. Although various algorithms have been proposed to identify disease genes, their prediction performance and computational time still need to be improved. RESULTS: In this study, we propose a fast and high performance multiple data integration algorithm for identifying human disease genes. A posterior probability of each candidate gene being associated with individual diseases is calculated using a Bayesian analysis method and a binary logistic regression model. Two prior probability estimation strategies and two feature vector construction methods are developed to test the performance of the proposed algorithm. CONCLUSIONS: The proposed algorithm not only generates predictions with high AUC scores, but also runs very fast. When only a single PPI network is employed, the AUC score is 0.769 using F2 as feature vectors. The average running time for each leave-one-out experiment is only around 1.5 seconds. When three biological networks are integrated, the AUC score using F3 as feature vectors increases to 0.830, and the average running time for each leave-one-out experiment is only about 12.54 seconds. It is better than many existing algorithms.
Affiliation(s)
- Bolin Chen
  - School of Computer Science, Northwestern Polytechnical University, 127 Youyi West Road, 710072, Xi'an, P.R. China
- Min Li
  - School of Information Science and Engineering, Central South University, 410083, Changsha, P.R. China
- Jianxin Wang
  - School of Information Science and Engineering, Central South University, 410083, Changsha, P.R. China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, 127 Youyi West Road, 710072, Xi'an, P.R. China
- Fang-Xiang Wu
  - Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Dr., S7N 5A9, Saskatoon, Canada
  - Department of Mechanical Engineering, University of Saskatchewan, 57 Campus Dr., S7N 5A9, Saskatoon, Canada
|
14
|
ProSim: A Method for Prioritizing Disease Genes Based on Protein Proximity and Disease Similarity. BIOMED RESEARCH INTERNATIONAL 2015; 2015:213750. [PMID: 26339594 PMCID: PMC4538409 DOI: 10.1155/2015/213750] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/15/2014] [Accepted: 01/16/2015] [Indexed: 01/19/2023]
Abstract
Predicting disease genes for a particular genetic disease is very challenging in bioinformatics. Based on current research studies, this challenge can be tackled via network-based approaches. Furthermore, it has been highlighted that it is necessary to consider disease similarity along with the protein's proximity to disease genes in a protein-protein interaction (PPI) network in order to improve the accuracy of disease gene prioritization. In this study we propose a new algorithm called proximity disease similarity algorithm (ProSim), which takes both of the aforementioned properties into consideration, to prioritize disease genes. To illustrate the proposed algorithm, we have conducted six case studies, namely, prostate cancer, Alzheimer's disease, diabetes mellitus type 2, breast cancer, colorectal cancer, and lung cancer. We employed leave-one-out cross validation, mean enrichment, tenfold cross validation, and ROC curves to evaluate our proposed method and other existing methods. The results show that our proposed method outperforms existing methods such as PRINCE, RWR, and DADA.
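Several entries in this list, including this one, report the area under the ROC curve (AUC) as an evaluation measure. A minimal sketch of how AUC can be computed from prediction scores is given below, using the standard pairwise (Mann-Whitney) formulation. The function name `auc_from_scores` and the four example scores are hypothetical and do not come from any of the cited papers.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels : iterable of 1 (e.g. known disease gene) or 0 (non-disease gene)
    scores : iterable of prediction scores, higher meaning more likely positive
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Each (positive, negative) pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four candidate genes, two of them true disease genes.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = auc_from_scores(labels, scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

In a leave-one-out setting, each known disease gene would be held out in turn and scored against the candidates, and the held-out scores would then be pooled into a calculation of this kind.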
|
15
|
|
16
|
Ganegoda GU, Li M, Wang W, Feng Q. Heterogeneous Network Model to Infer Human Disease-Long Intergenic Non-Coding RNA Associations. IEEE Trans Nanobioscience 2015; 14:175-83. [DOI: 10.1109/tnb.2015.2391133] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Indexed: 11/08/2022]
|
17
|
Prioritization of orphan disease-causing genes using topological feature and GO similarity between proteins in interaction networks. SCIENCE CHINA-LIFE SCIENCES 2014; 57:1064-71. [PMID: 25326068 DOI: 10.1007/s11427-014-4747-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/23/2014] [Accepted: 07/15/2014] [Indexed: 12/22/2022]
Abstract
Identification of disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, it is still time-consuming and laborious to determine the real disease-causing genes by biological experiments. With the advances of the high-throughput techniques, a large number of protein-protein interactions have been produced. Therefore, to address this issue, several methods based on protein interaction network have been proposed. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering the fact that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm SPGOranker by integrating the semantic similarity of GO annotations. SPGOranker not only considers the topological similarity between protein pairs in a protein interaction network but also takes their functional similarity into account. The proposed algorithms SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches, ICN, VS and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR for the prioritization of orphan disease-causing genes. Importantly, for the case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
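The shortest-path idea behind this kind of prioritization can be sketched as follows. This is not the SPranker scoring function, which the abstract does not specify. As an illustrative assumption, each candidate is scored by summing 1 / (1 + d) over its shortest-path distances d to the known disease genes. The function names and the toy network are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def rank_by_proximity(adj, known_genes, candidates):
    """Rank candidates by closeness to known disease genes (assumed scoring rule)."""
    scores = {c: 0.0 for c in candidates}
    for g in known_genes:
        dist = bfs_distances(adj, g)
        for c in candidates:
            if c in dist:  # unreachable candidates contribute nothing
                scores[c] += 1.0 / (1 + dist[c])
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Hypothetical toy network: known genes g1 and g2 both interact with x,
# and a chain x - y - z leads away from them.
adj = {
    "g1": {"x"},
    "g2": {"x"},
    "x": {"g1", "g2", "y"},
    "y": {"x", "z"},
    "z": {"y"},
}
ranking = rank_by_proximity(adj, known_genes=["g1", "g2"], candidates=["x", "y", "z"])
```

A functional-similarity term, such as the GO semantic similarity that SPGOranker integrates, could be combined with this topological score. How the authors combine the two is not described in the abstract.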
|