1
Wu C, Lin B, Zhang H, Xu D, Gao R, Song R, Liu ZP, De Marinis Y. GCNPMDA: Human microbe-disease association prediction by hierarchical graph convolutional network with layer attention. Biomed Signal Process Control 2025; 100:107004. [DOI: 10.1016/j.bspc.2024.107004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2025]
2
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022]
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, many computational methods and tools/platforms have been developed to reveal intrinsic relationships between diseases, which can provide useful insights into the study of complex diseases, e.g. understanding the molecular mechanisms of diseases and discovering new treatments. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms. Computational methods using different types of biomedical data, from phenotype to genotype, can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups according to the biomedical data they use: disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computing and analyzing disease-disease associations. Finally, we discuss and summarize research on disease-disease associations. This review provides a systematic overview of current disease association research, which could promote the development and application of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
  - Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
3
SCMFMDA: Predicting microRNA-disease associations based on similarity constrained matrix factorization. PLoS Comput Biol 2021; 17:e1009165. [PMID: 34252084 PMCID: PMC8345837 DOI: 10.1371/journal.pcbi.1009165] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 04/05/2021] [Revised: 08/06/2021] [Accepted: 06/08/2021] [Indexed: 11/21/2022]
Abstract
miRNAs are small non-coding RNAs involved in a number of complicated biological processes. Considerable studies have suggested that miRNAs are closely associated with many human diseases. In this study, we proposed a computational model based on Similarity Constrained Matrix Factorization for miRNA-Disease Association Prediction (SCMFMDA). To effectively combine different disease and miRNA similarity data, we applied the similarity network fusion algorithm to obtain an integrated disease similarity (composed of disease functional similarity, disease semantic similarity and disease Gaussian interaction profile kernel similarity) and an integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity and miRNA Gaussian interaction profile kernel similarity). In addition, L2 regularization terms and similarity constraint terms were added to the traditional Nonnegative Matrix Factorization algorithm to predict disease-related miRNAs. SCMFMDA achieved AUCs of 0.9675 and 0.9447 in global leave-one-out cross validation and five-fold cross validation, respectively. Furthermore, case studies on two common human diseases were implemented to demonstrate the prediction accuracy of SCMFMDA. Most of the top 50 predicted miRNAs were confirmed by experimental reports, indicating that SCMFMDA is effective for predicting relationships between miRNAs and diseases.
Considerable studies have suggested that miRNAs are closely associated with many human diseases, so predicting potential associations between miRNAs and diseases can contribute to the diagnosis and treatment of diseases. Several models for discovering unknown miRNA-disease associations have made such prediction more productive and effective.
We proposed SCMFMDA to obtain more accurate predictions by applying similarity network fusion to fuse multi-source disease and miRNA information and by using similarity constrained matrix factorization to make predictions based on biological information. Global leave-one-out cross validation and five-fold cross validation were applied to evaluate our model. SCMFMDA achieved AUCs of 0.9675 and 0.9447, respectively, which were clearly higher than those of previous computational models. Furthermore, we implemented case studies on significant human diseases, including colon neoplasms and lung neoplasms, in which 47 and 46 of the top 50 predicted miRNAs, respectively, were confirmed by experimental reports. All results show that SCMFMDA can be regarded as an effective way to discover unverified miRNA-disease connections.
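Gaussian interaction profile (GIP) kernel similarity is one of the fused similarity sources above. The Python sketch below is only an illustration under the usual GIP convention, which is an assumption here rather than SCMFMDA's published code: each miRNA's row (or each disease's column) of the known association matrix is its interaction profile, and the kernel bandwidth is a hypothetical `gamma_prime` (default 1.0) divided by the mean squared profile norm. Similarity network fusion and the constrained factorization are not shown.

```python
import numpy as np


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel similarity between the rows of a binary interaction-profile matrix.

    Each row is one entity's profile (e.g. a miRNA's known disease associations).
    gamma_prime is a hypothetical default; the bandwidth is divided by the mean
    squared norm of the profiles, following the usual GIP convention.
    """
    profiles = np.asarray(profiles, dtype=float)
    n = profiles.shape[0]
    mean_sq_norm = np.mean(np.sum(profiles ** 2, axis=1))
    if mean_sq_norm == 0:
        # No known associations: all profiles are identical zero vectors.
        return np.ones((n, n))
    gamma = gamma_prime / mean_sq_norm
    # Pairwise squared Euclidean distances between profiles via broadcasting.
    diff = profiles[:, None, :] - profiles[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-gamma * sq_dist)


# Toy association matrix (rows: miRNAs, columns: diseases); not real data.
A = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
])
mirna_sim = gip_kernel(A)      # miRNA-miRNA similarity from the rows
disease_sim = gip_kernel(A.T)  # disease-disease similarity from the columns
```

For the toy matrix the mean squared row norm is 4/3, so gamma = 0.75, and the first two miRNAs, whose profiles differ in one disease, get a similarity of exp(-0.75), about 0.47.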
4
Peng W, Du J, Dai W, Lan W. Predicting miRNA-Disease Association Based on Modularity Preserving Heterogeneous Network Embedding. Front Cell Dev Biol 2021; 9:603758. [PMID: 34178973 PMCID: PMC8223753 DOI: 10.3389/fcell.2021.603758] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/07/2020] [Accepted: 03/23/2021] [Indexed: 12/12/2022]
Abstract
MicroRNAs (miRNAs) are a category of small non-coding RNAs that profoundly impact various biological processes related to human disease. Inferring potential miRNA-disease associations benefits the study of human diseases, including disease prevention, disease diagnosis and drug development. In this work, we propose a novel heterogeneous network embedding-based method called MDN-NMTF (Module-based Dynamic Neighborhood Non-negative Matrix Tri-Factorization) for predicting miRNA-disease associations. MDN-NMTF constructs a heterogeneous network from a disease similarity network, a miRNA similarity network and a known miRNA-disease association network. It then learns latent vector representations of miRNAs and diseases in this heterogeneous network. Finally, the association probability is computed as the product of the latent miRNA and disease vectors. MDN-NMTF not only integrates diverse biological information about miRNAs and diseases to predict miRNA-disease associations, but also considers the module properties of miRNAs and diseases while learning the vector representations, which maximally preserves the structural information and properties of the heterogeneous network. We also extend MDN-NMTF to a new version (called MDN-NMTF2) that uses modular information to further improve miRNA-disease association prediction. Our methods and four other existing methods were applied to predict miRNA-disease associations in four databases. The results show that our methods improve miRNA-disease association prediction compared with the four existing methods.
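The final scoring step described above (association score as the product of latent miRNA and disease vectors) can be sketched as follows, assuming the latent vectors have already been learned. The plain inner product, the masking of known pairs and the function name `rank_unknown_diseases` are illustrative assumptions; the abstract does not say whether MDN-NMTF also applies a middle tri-factorization factor or any normalization, and the module-aware learning of the vectors is not reproduced.

```python
import numpy as np


def rank_unknown_diseases(U, V, known, mirna_idx, top_k=3):
    """Rank candidate diseases for one miRNA by the inner product of latent vectors.

    U: (n_mirnas, k) latent miRNA vectors; V: (n_diseases, k) latent disease vectors;
    known: binary (n_mirnas, n_diseases) matrix of known associations.
    Known pairs are excluded so that only unverified candidates are returned.
    """
    scores = np.asarray(U, dtype=float) @ np.asarray(V, dtype=float).T
    row = scores[mirna_idx].copy()
    row[np.asarray(known)[mirna_idx] == 1] = -np.inf  # mask known associations
    order = np.argsort(-row, kind="stable")  # highest score first
    return [int(j) for j in order[:top_k] if np.isfinite(row[j])]


# Hypothetical latent vectors (k = 2) standing in for learned embeddings.
U = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
known = np.array([[1, 0, 0], [0, 1, 0]])
print(rank_unknown_diseases(U, V, known, mirna_idx=0))  # [2, 1]
```

Masking known pairs before ranking mirrors how such predictors are usually applied: only unverified miRNA-disease pairs are reported as candidates.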
Affiliation(s)
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming, China
- Jielin Du
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming, China
- Wei Lan
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, China
5
Han Q, Yang Y, Wu S, Liao Y, Zhang S, Liang H, Cram DS, Zhang Y. Cruxome: a powerful tool for annotating, interpreting and reporting genetic variants. BMC Genomics 2021; 22:407. [PMID: 34082700 PMCID: PMC8173893 DOI: 10.1186/s12864-021-07728-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Accepted: 05/20/2021] [Indexed: 01/23/2023]
Abstract
Background: Next-generation sequencing (NGS) is an efficient tool for identifying pathogenic variants that cause Mendelian disorders. However, many researchers lack bioinformatics training, which makes precise and efficient interpretation of the identified variants a challenge. In addition, non-standardized phenotypic descriptions of human diseases make it difficult to establish an integrated analysis pathway for variant annotation and interpretation. Solutions to these bottlenecks are urgently needed. Results: We developed a tool named “Cruxome” to automatically annotate and interpret single nucleotide variants (SNVs) and small insertions and deletions (InDels). Our approach greatly simplifies the currently burdensome task that clinical geneticists and scientists face in identifying causative pathogenic variants and building personal knowledge reference bases. The integrated architecture of Cruxome offers key advantages such as an interactive, user-friendly interface and the assimilation of the patient's electronic health records. Using a natural language processing algorithm, Cruxome can efficiently map clinical descriptions of diseases to standardized Human Phenotype Ontology (HPO) vocabulary. By combining machine learning, in silico predictive algorithms, multiple integrated databases and supplementary tools, Cruxome can automatically process SNVs and InDels (from trio-family or proband-only cases) together with clinical diagnosis records, then annotate, score, identify and interpret pathogenic variants, and finally generate a standardized clinical report following the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines. Cruxome also provides supplementary tools to examine and visualize the genes or variants in historical cases, which can help to better understand the genetic basis of disease.
Conclusions: Cruxome is an efficient tool for annotating and interpreting variants; it dramatically reduces the workload of clinical geneticists and researchers in interpreting NGS results and simplifies their decision-making. We present an online version of Cruxome, which is freely available to academic and clinical researchers at http://114.251.61.49:10024/cruxome/. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07728-6.
Affiliation(s)
- Qingmei Han
  - Berry Genomics Company Limited, Building 5, Courtyard 4, Shengmingyuan Road, ZGC Life Science Park, Changping District, 102200, Beijing, China
- Ying Yang
  - Xian Children's Hospital, 710003, Xian, China
- Shengyang Wu
  - Berry Genomics Company Limited, Building 5, Courtyard 4, Shengmingyuan Road, ZGC Life Science Park, Changping District, 102200, Beijing, China
- Yingchun Liao
  - Berry Genomics Company Limited, Building 5, Courtyard 4, Shengmingyuan Road, ZGC Life Science Park, Changping District, 102200, Beijing, China
- Shuang Zhang
  - Berry Genomics Company Limited, Building 5, Courtyard 4, Shengmingyuan Road, ZGC Life Science Park, Changping District, 102200, Beijing, China
- Hongbin Liang
  - Berry Genomics Company Limited, Building 5, Courtyard 4, Shengmingyuan Road, ZGC Life Science Park, Changping District, 102200, Beijing, China
- David S Cram
  - Berry Genomics Company Limited, Building 5, Courtyard 4, Shengmingyuan Road, ZGC Life Science Park, Changping District, 102200, Beijing, China
- Yu Zhang
  - Berry Genomics Company Limited, Building 5, Courtyard 4, Shengmingyuan Road, ZGC Life Science Park, Changping District, 102200, Beijing, China
6
Wang H, Tang J, Ding Y, Guo F. Exploring associations of non-coding RNAs in human diseases via three-matrix factorization with hypergraph-regular terms on center kernel alignment. Brief Bioinform 2021; 22:6095847. [PMID: 33443536 DOI: 10.1093/bib/bbaa409] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 09/23/2020] [Revised: 11/05/2020] [Accepted: 12/11/2020] [Indexed: 12/25/2022]
Abstract
Accurately identifying associations between non-coding RNAs (ncRNAs) and diseases could greatly help human biomedical research and disease treatment. However, traditional techniques are applied to only one type of ncRNA or a specific disease, and experimental methods are time-consuming and expensive. A growing number of computational tools have therefore been proposed to detect new associations based on known ncRNA and disease information. Because ncRNAs (circRNAs, miRNAs and lncRNAs) are closely related to the progression of various human diseases, it is critical to develop effective computational predictors of ncRNA-disease associations. In this paper, we propose a new computational method, three-matrix factorization with hypergraph regularization terms (HGRTMF) based on center kernel alignment (CKA), for identifying general ncRNA-disease associations. When constructing the similarity matrices, various types of similarity matrices can be used for circRNAs, miRNAs and lncRNAs. Our method achieves excellent performance on five datasets involving three types of ncRNAs. In testing, we obtained best area under the curve (AUC) scores of 0.9832, 0.9775, 0.9023, 0.8809 and 0.9185 via 5-fold cross-validation and 0.9832, 0.9836, 0.9198, 0.9459 and 0.9275 via leave-one-out cross-validation on the five datasets. Furthermore, our novel method (CKA-HGRTMF) is also able to accurately discover new associations between ncRNAs and diseases. Availability: Codes and data are available at https://github.com/hzwh6910/ncRNA2Disease.git. Contact: fguo@tju.edu.cn.
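CKA-HGRTMF weights its similarity kernels using kernel alignment. The Python sketch below computes the standard centered kernel alignment score between two kernel matrices. The function `combine_by_alignment` is a hypothetical, simplified heuristic (each kernel weighted in proportion to its non-negative alignment with the ideal kernel Y Yᵀ built from known associations), not the optimization actually solved in the paper, and the hypergraph-regularized tri-factorization is not shown.

```python
import numpy as np


def center_kernel(K):
    """Center a kernel matrix: H K H with H = I - (1/n) * ones."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def cka(K1, K2):
    """Centered kernel alignment <K1c, K2c>_F / (||K1c||_F * ||K2c||_F)."""
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    num = np.sum(K1c * K2c)  # Frobenius inner product
    den = np.linalg.norm(K1c, "fro") * np.linalg.norm(K2c, "fro")
    return float(num / den)


def combine_by_alignment(kernels, Y):
    """Hypothetical weighting: each kernel's weight is proportional to its
    (non-negative) alignment with the ideal kernel Y Y^T from known associations."""
    ideal = Y @ Y.T
    w = np.array([max(cka(K, ideal), 0.0) for K in kernels])
    # Fall back to uniform weights if no kernel aligns positively.
    w = w / w.sum() if w.sum() > 0 else np.full(len(kernels), 1.0 / len(kernels))
    combined = sum(wi * K for wi, K in zip(w, kernels))
    return combined, w


# Toy data: 3 ncRNAs x 2 diseases (not real data).
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K_assoc = Y @ Y.T   # a kernel that matches the ideal kernel exactly
K_flat = np.eye(3)  # an uninformative identity kernel
combined, weights = combine_by_alignment([K_assoc, K_flat], Y)
```

Because CKA is invariant to rescaling either kernel, similarity matrices on different scales can be compared directly; a kernel whose centered form is all zeros (e.g. a constant kernel) makes the score undefined.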
Affiliation(s)
- Hao Wang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China; School of Computational Science and Engineering, University of South Carolina, Columbia, USA
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
7
Ni P, Wang J, Zhong P, Li Y, Wu FX, Pan Y. Constructing Disease Similarity Networks Based on Disease Module Theory. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:906-915. [PMID: 29993782 DOI: 10.1109/tcbb.2018.2817624] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 06/08/2023]
Abstract
Quantifying the associations between diseases now plays an important role in modern biology and medicine. Discovering associations between diseases can give deeper insights into the pathogenic mechanisms of complex diseases and thus lead to improvements in disease diagnosis, drug repositioning and drug development. Owing to the growing body of high-throughput biological data, a number of methods for computing similarity between diseases have been developed during the past decade. However, these methods rarely consider how the genes related to each disease are interconnected in the protein-protein interaction network (PPIN). The recently proposed disease module theory states that disease-related genes or proteins tend to interact with each other in the same neighborhood of a PPIN. In this study, we propose a new method called ModuleSim to measure associations between diseases using disease-gene association data and PPIN data based on disease module theory. The experimental results show that, by considering the interactions between disease modules and their modularity, the disease similarity calculated by ModuleSim correlates significantly with the disease classification of the Disease Ontology (DO). Furthermore, ModuleSim outperforms four other popular methods that also use disease-gene association data and PPIN data to measure disease-disease associations. In addition, the disease similarity network constructed by ModuleSim suggests that ModuleSim is capable of finding potential associations between diseases.
8
Cheng L, Zhao H, Wang P, Zhou W, Luo M, Li T, Han J, Liu S, Jiang Q. Computational Methods for Identifying Similar Diseases. Mol Ther Nucleic Acids 2019; 18:590-604. [PMID: 31678735 PMCID: PMC6838934 DOI: 10.1016/j.omtn.2019.09.019] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 04/25/2019] [Revised: 09/11/2019] [Accepted: 09/12/2019] [Indexed: 02/01/2023]
Abstract
Although our knowledge of human diseases has increased dramatically, the molecular basis, phenotypic traits and therapeutic targets of most diseases remain unclear. An increasing number of studies have observed that similar diseases are often caused by similar molecules, can be diagnosed by similar markers or phenotypes, or can be cured by similar drugs. Thus, identifying diseases similar to known ones has attracted considerable attention worldwide. To this end, associations between diseases at the molecular, phenotypic and taxonomic levels have been used to measure pairwise disease similarity. We further emphasize the corresponding performance assessment strategies for these methods, termed “category-based,” “simulated-patient-based,” and “benchmark-data-based.” Frequently used methods were then evaluated using a benchmark-data-based strategy. To facilitate the assessment of disease similarity scores, researchers have designed dozens of tools that implement these methods. Currently, disease similarity has proven useful for predicting noncoding RNA (ncRNA) function and therapeutic drugs for diseases. In this article, we review disease similarity methods, evaluation strategies, tools and their applications in the biomedical community. We further evaluate the performance of these methods and discuss current limitations and future trends in calculating disease similarity.
Affiliation(s)
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hengqiang Zhao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Pingping Wang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Wenyang Zhou
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Meng Luo
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Tianxin Li
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Junwei Han
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shulin Liu
  - Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, Heilongjiang, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, AB, Canada
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
9
Cheng L, Hu Y, Sun J, Zhou M, Jiang Q. DincRNA: a comprehensive web-based bioinformatics toolkit for exploring disease associations and ncRNA function. Bioinformatics 2018; 34:1953-1956. [PMID: 29365045 DOI: 10.1093/bioinformatics/bty002] [Citation(s) in RCA: 182] [Impact Index Per Article: 30.3] [Received: 05/22/2017] [Accepted: 01/22/2018] [Indexed: 01/09/2023]
Abstract
Summary: DincRNA aims to provide a comprehensive web-based bioinformatics toolkit to elucidate the entangled relationships among diseases and non-coding RNAs (ncRNAs) from the perspective of disease similarity. Quantitative measures of the relationship between a pair of diseases usually depend on their molecular mechanisms and on the structure of the directed acyclic graph of the Disease Ontology (DO). Corresponding methods for calculating pairwise disease similarity include Resnik's, Lin's, Wang's, PSB and SemFunSim methods. Disease similarity has recently been validated as suitable for calculating functional similarities of ncRNAs and prioritizing ncRNA-disease pairs, and it has been widely applied to predict ncRNA function because biological knowledge of these RNAs from wet-lab experiments is limited. For this purpose, a large number of algorithms and prior knowledge need to be integrated, e.g. the 'pair-wise best, pairs-average' (PBPA) and 'pair-wise all, pairs-maximum' (PAPM) methods for calculating functional similarities of ncRNAs, and the random walk with restart (RWR) method for prioritizing ncRNA-disease pairs. To facilitate the exploration of disease associations and ncRNA function, DincRNA implements all eight of the above algorithms based on DO and disease-related genes. Currently, it provides functions to query disease similarity scores, miRNA and lncRNA functional similarity scores, and prioritization scores of lncRNA-disease and miRNA-disease pairs. Availability and implementation: http://bio-annotation.cn:18080/DincRNAClient/. Contact: biofomeng@hotmail.com or qhjiang@hit.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
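The PBPA idea of deriving ncRNA functional similarity from disease similarity can be sketched in Python as below. The formula is an assumed reading of 'pair-wise best, pairs-average' (a best-match average over both associated disease sets), not DincRNA's verified implementation, and the disease similarity matrix is toy data rather than real DO-based scores.

```python
import numpy as np


def pbpa_similarity(diseases_a, diseases_b, disease_sim):
    """Functional similarity of two ncRNAs from their associated disease sets.

    Assumed reading of 'pair-wise best, pairs-average': each disease is scored by its
    best match in the other set, and the best-match scores from both sides are averaged.
    disease_sim: square matrix of pairwise disease similarities (e.g. DO-based).
    """
    a, b = list(diseases_a), list(diseases_b)
    if not a or not b:
        return 0.0
    sub = disease_sim[np.ix_(a, b)]  # |a| x |b| block of disease similarities
    best_for_a = sub.max(axis=1)     # best match in b for each disease of a
    best_for_b = sub.max(axis=0)     # best match in a for each disease of b
    return float((best_for_a.sum() + best_for_b.sum()) / (len(a) + len(b)))


# Toy disease-disease similarity matrix for diseases 0, 1, 2 (not real DO scores).
S = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])
sim = pbpa_similarity([0], [1, 2], S)  # (0.8 + 0.8 + 0.1) / 3
```

Two ncRNAs associated with exactly the same diseases score 1.0 whenever the disease similarity matrix has a unit diagonal.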
Affiliation(s)
- Liang Cheng
  - Department of College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Sheng 150081, China
- Yang Hu
  - Department of Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang Sheng 150001, China
- Jie Sun
  - Department of College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Sheng 150081, China
- Meng Zhou
  - Department of College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Sheng 150081, China
- Qinghua Jiang
  - Department of Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang Sheng 150001, China
10
Cheng L, Yang H, Zhao H, Pei X, Shi H, Sun J, Zhang Y, Wang Z, Zhou M. MetSigDis: a manually curated resource for the metabolic signatures of diseases. Brief Bioinform 2019; 20:203-209. [PMID: 28968812 DOI: 10.1093/bib/bbx103] [Citation(s) in RCA: 87] [Impact Index Per Article: 14.5] [Received: 06/12/2017] [Indexed: 12/18/2022]
Abstract
Complex diseases cannot be understood on the basis of a single gene, mRNA transcript or protein alone, but only through the combined effects of their interactions. These combined consequences at the molecular level can be captured by alterations in metabolites. With the rapid development of biomedical instruments and analytical platforms, a large number of metabolite signatures of complex diseases have been identified and documented in the literature. The sheer volume of papers recording experimentally identified metabolic signatures makes this knowledge hard for biologists to use and calls for an automated data repository. Therefore, we developed MetSigDis, which aims to provide a comprehensive resource of metabolite alterations in various diseases. MetSigDis is freely available at http://www.bio-annotation.cn/MetSigDis/. By reviewing hundreds of publications, we collected 6849 curated relationships between 2420 metabolites and 129 diseases across eight species, including Homo sapiens and model organisms. All of these relationships were used to construct a metabolite-disease network (MDN). This network displayed scale-free characteristics according to its degree distribution (power-law distribution with R² = 0.909), and the MDN subnetwork for diseases of interest and their related metabolites can be visualized on the Web. Shared metabolite alterations reflect the metabolic similarity of diseases, which is measured using the Jaccard index. We observed that diseases that are similar in terms of metabolites tend to share semantic associations in the Disease Ontology. A human disease network was then built, in which a node represents a disease and an edge indicates the similarity between a pair of diseases. The network validated the observation that diseases linked on the basis of metabolites tend to share more genes.
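The Jaccard-based metabolic similarity and the resulting disease network can be sketched in a few lines of Python. The disease names, metabolite sets and the edge threshold of 0.0 below are hypothetical toy values, not MetSigDis data, and the function `disease_network` is an illustrative name rather than part of the resource.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disease_network(disease_metabolites, threshold=0.0):
    """Edges between diseases whose altered-metabolite sets have Jaccard index > threshold."""
    edges = {}
    for d1, d2 in combinations(sorted(disease_metabolites), 2):
        s = jaccard(disease_metabolites[d1], disease_metabolites[d2])
        if s > threshold:
            edges[(d1, d2)] = s
    return edges


# Hypothetical toy data: disease -> set of altered metabolites.
toy = {
    "disease_A": {"glucose", "lactate", "citrate"},
    "disease_B": {"glucose", "lactate"},
    "disease_C": {"urea"},
}
print(disease_network(toy))  # {('disease_A', 'disease_B'): 0.6666666666666666}
```

Diseases that share no altered metabolites get a Jaccard index of 0 and, with this threshold, no edge.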
Affiliation(s)
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University
- Haixiu Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University
- Hengqiang Zhao
  - College of Bioinformatics Science and Technology, Harbin Medical University
- Xiaoya Pei
  - College of Bioinformatics Science and Technology, Harbin Medical University
- Hongbo Shi
  - College of Bioinformatics Science and Technology, Harbin Medical University
- Jie Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University
- Yunpeng Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University
- Zhenzhen Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University
- Meng Zhou
  - College of Bioinformatics Science and Technology, Harbin Medical University
11
Tang Y, Chen K, Wu X, Wei Z, Zhang SY, Song B, Zhang SW, Huang Y, Meng J. DRUM: Inference of Disease-Associated m6A RNA Methylation Sites From a Multi-Layer Heterogeneous Network. Front Genet 2019; 10:266. [PMID: 31001320 PMCID: PMC6456716 DOI: 10.3389/fgene.2019.00266] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 11/15/2018] [Accepted: 03/11/2019] [Indexed: 01/27/2023]
Abstract
Recent studies have revealed that the RNA N6-methyladenosine (m6A) modification plays a critical role in a variety of biological processes and is associated with multiple diseases, including cancers. To date, transcriptome-wide m6A RNA methylation sites have been identified by high-throughput sequencing techniques combined with computational methods, and this information is publicly available in a few bioinformatics databases; however, the associations between individual m6A sites and various diseases are still largely unknown. No computational approach has yet been developed to investigate potential associations between individual m6A sites and diseases, which represents a major challenge in epitranscriptome analysis. Thus, to infer disease-related m6A sites, we implemented a novel multi-layer heterogeneous network-based approach, which incorporates the associations among diseases, genes and m6A RNA methylation sites from gene expression, RNA methylation and disease similarity data with the Random Walk with Restart (RWR) algorithm. To evaluate the performance of the proposed approach, ten-fold cross validation was performed, in which our approach achieved reasonably good performance (overall AUC: 0.827, average AUC: 0.867), higher than a hypergeometric test-based approach (overall AUC: 0.7333, average AUC: 0.723) and a random predictor (overall AUC: 0.550, average AUC: 0.486). Additionally, we show that a number of predicted cancer-associated m6A sites are supported by the existing literature, suggesting that the proposed approach can effectively uncover the underlying epitranscriptome circuits of disease mechanisms. An online database, DRUM (disease-associated ribonucleic acid methylation), was built to support queries of disease-associated RNA m6A methylation sites and is freely available at www.xjtlu.edu.cn/biologicalsciences/drum.
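The Random Walk with Restart step can be illustrated on a single-layer graph with the Python sketch below. It shows the generic RWR iteration only: DRUM's multi-layer construction (disease, gene and m6A-site layers and their inter-layer weights) and its restart probability are not given in the abstract, so the restart value of 0.7 and the toy path graph are assumptions. A multi-layer network could in principle be handled by assembling a block adjacency matrix and running the same iteration.

```python
import numpy as np


def random_walk_with_restart(W, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a weighted, undirected graph.

    Iterates p <- (1 - restart) * M @ p + restart * p0, where M is the
    column-normalized adjacency matrix and p0 spreads the restart mass evenly
    over the seed nodes. restart=0.7 is a hypothetical default.
    """
    seeds = list(seeds)
    if not seeds:
        raise ValueError("at least one seed node is required")
    W = np.asarray(W, dtype=float)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid 0/0; isolated nodes keep zero columns
    M = W / col_sums               # each non-empty column now sums to 1
    p0 = np.zeros(W.shape[0])
    p0[seeds] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (M @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p


# Toy path graph 0-1-2-3 (not a real disease/gene/m6A network); seed = node 0.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = random_walk_with_restart(W, seeds=[0])
```

Nodes closer to the seed receive higher steady-state scores, which is the property used to rank candidate nodes by their proximity to known disease-associated nodes.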
Affiliation(s)
- Yujiao Tang
  - Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Kunqi Chen
  - Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
- Xiangyu Wu
  - Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
- Zhen Wei
  - Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
- Song-Yao Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
- Bowen Song
  - Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
- Yufei Huang
  - Department of Epidemiology and Biostatistics, University of Texas Health San Antonio, San Antonio, TX, United States
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, United States
- Jia Meng
  - Department of Biological Sciences, Research Center for Precision Medicine, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
| |
Collapse
|
12
|
Yan C, Wang J, Ni P, Lan W, Wu FX, Pan Y. DNRLMF-MDA: Predicting microRNA-Disease Associations Based on Similarities of microRNAs and Diseases. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:233-243. [PMID: 29990253 DOI: 10.1109/tcbb.2017.2776101] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Indexed: 06/08/2023]
Abstract
MicroRNAs (miRNAs) are a class of non-coding RNAs about 22 nucleotides (nt) long. Studies have shown that miRNAs play key roles in many complex human diseases. Therefore, discovering miRNA-disease associations is beneficial for understanding disease mechanisms, developing drugs, and treating complex diseases. Discovering these associations through biological experiments is time-consuming and expensive; computational models offer a low-cost, high-efficiency alternative for predicting them. In this study, we propose a method (called DNRLMF-MDA) to predict miRNA-disease associations based on dynamic neighborhood regularized logistic matrix factorization. DNRLMF-MDA integrates known miRNA-disease associations, the functional similarity and Gaussian Interaction Profile (GIP) kernel similarity of miRNAs, and the functional similarity and GIP kernel similarity of diseases. In particular, positive observations (known miRNA-disease associations) are assigned higher importance levels than negative observations (unknown miRNA-disease associations). DNRLMF-MDA computes the probability that a miRNA interacts with a disease by logistic matrix factorization, in which the latent vectors of miRNAs and diseases represent their respective properties, and further improves prediction performance via dynamic neighborhood regularization. 5-fold cross-validation is adopted to assess the performance of DNRLMF-MDA and of competing methods. The computational experiments show that DNRLMF-MDA outperforms the state-of-the-art method PBMDA: the AUC values of DNRLMF-MDA on three datasets are 0.9357, 0.9411, and 0.9416, respectively, which are superior to PBMDA's results of 0.9218, 0.9187, and 0.9262.
The average computation times per 5-fold cross-validation of DNRLMF-MDA on the three datasets are 38, 46, and 50 seconds, shorter than PBMDA's average computation times of 10869, 916, and 8448 seconds, respectively. DNRLMF-MDA can also predict potential diseases for new miRNAs. Furthermore, case studies illustrate that DNRLMF-MDA is an effective method for predicting miRNA-disease associations.
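DNRLMF-MDA draws on Gaussian Interaction Profile (GIP) kernel similarity. A minimal sketch, assuming the commonly used GIP definition K(i, j) = exp(-γ‖y_i − y_j‖²), where y_i is row i of the binary association matrix and γ = γ′ / mean‖y_i‖² with γ′ = 1; this follows the standard GIP formulation rather than the authors' code, and the toy matrix is hypothetical.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """GIP kernel similarity between the rows of a binary association matrix.

    assoc       : list of interaction profiles (e.g. miRNA rows x disease columns)
    gamma_prime : bandwidth parameter gamma' (1.0 is a common choice)
    """
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    # Guard against an all-zero matrix, for which the bandwidth is undefined.
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    return [[math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b))) for b in assoc]
            for a in assoc]


# Toy miRNA x disease matrix: the first two miRNAs share an identical profile.
K = gip_kernel([[1, 0, 1], [1, 0, 1], [0, 1, 0]])
```

Disease-side GIP similarity is obtained the same way from the transposed matrix; how DNRLMF-MDA combines it with functional similarity is not specified in the abstract.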
13
Jiang L, Xiao Y, Ding Y, Tang J, Guo F. FKL-Spa-LapRLS: an accurate method for identifying human microRNA-disease association. BMC Genomics 2018; 19:911. [PMID: 30598109 PMCID: PMC6311941 DOI: 10.1186/s12864-018-5273-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 11/10/2022]
Abstract
BACKGROUND As post-transcriptional regulators, microRNAs (miRNAs) are closely related to various complex human diseases. Traditional verification methods for miRNA-disease associations are time-consuming and expensive, so it is especially important to design computational methods for detecting potential associations. Considering the limitations of previous computational methods for predicting potential miRNA-disease associations, we develop the FKL-Spa-LapRLS (Fast Kernel Learning Sparse kernel Laplacian Regularized Least Squares) model to overcome them. RESULT First, we extract three miRNA similarity kernels and three disease similarity kernels. Then, we combine these kernels into a single kernel through the Fast Kernel Learning (FKL) model, and use a sparse kernel (Spa) to eliminate noise in the integrated similarity kernel. Finally, we find the associations via Laplacian Regularized Least Squares (LapRLS). Under three evaluation schemes, global and local leave-one-out cross-validation (LOOCV) and 5-fold cross-validation, our method achieves AUCs of 0.9563, 0.8398 and 0.9535, respectively, indicating that it is reliable. We then use case studies of eight neoplasms to further analyze the performance of our method. Most of the predicted miRNA-disease associations are confirmed by previous experiments, and some important miRNAs, which are associated with more neoplasms than other miRNAs, deserve more attention. CONCLUSIONS Our proposed model can reveal miRNA-disease associations and improve the accuracy of association prediction for various diseases. Our method can also be easily extended with more similarity kernels.
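LapRLS is the final step of FKL-Spa-LapRLS. The sketch below implements one commonly used LapRLS closed form for a single similarity space, F = K (K + βLK)⁻¹ Y, with the normalized Laplacian L = D^(-1/2) (D − K) D^(-1/2). It assumes K is an already-fused, symmetric, invertible similarity kernel and does not reproduce the FKL weighting or the sparsification step; `beta` and the toy matrices are hypothetical. A small Gauss-Jordan solver keeps it dependency-free.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n, m = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:n + m] for row in aug]


def normalized_laplacian(k):
    """L = D^(-1/2) (D - K) D^(-1/2), with D the diagonal matrix of row sums of K."""
    n = len(k)
    d = [sum(row) for row in k]
    return [[((d[i] if i == j else 0.0) - k[i][j]) / math.sqrt(d[i] * d[j])
             for j in range(n)] for i in range(n)]


def laprls(k, y, beta=0.3):
    """Score matrix F = K (K + beta * L K)^(-1) Y for one similarity space."""
    n = len(k)
    lk = matmul(normalized_laplacian(k), k)
    m = [[k[i][j] + beta * lk[i][j] for j in range(n)] for i in range(n)]
    return matmul(k, solve(m, y))


# Hypothetical fused miRNA kernel and a miRNA x disease association matrix.
K = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
Y = [[1, 0], [0, 1], [0, 0]]
F = laprls(K, Y)
```

In two-sided association prediction, a score matrix is commonly computed in the miRNA space and another in the disease space (using the transposed Y), and the two are averaged; with `beta=0` the formula reduces to F = Y, which is a quick sanity check.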
Affiliation(s)
- Limin Jiang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Tianjin University Institute of Computational Biology, Tianjin University, Tianjin, China
- Yongkang Xiao
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Yijie Ding
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Tianjin University Institute of Computational Biology, Tianjin University, Tianjin, China
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
14
Jiang L, Ding Y, Tang J, Guo F. MDA-SKF: Similarity Kernel Fusion for Accurately Discovering miRNA-Disease Association. Front Genet 2018; 9:618. [PMID: 30619454 PMCID: PMC6295467 DOI: 10.3389/fgene.2018.00618] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 08/08/2018] [Accepted: 11/23/2018] [Indexed: 12/28/2022]
Abstract
Identifying accurate associations between miRNAs and diseases is beneficial for the diagnosis and treatment of human diseases, so it is especially important to develop an efficient method to detect them. Traditional experimental methods have high precision, but the process is complicated and time-consuming. Various computational methods have been developed to uncover potential associations based on the assumption that similar miRNAs tend to be related to similar diseases. In this paper, we propose an accurate method, MDA-SKF, to uncover potential miRNA-disease associations. We first extract three miRNA similarity kernels (miRNA functional similarity, miRNA sequence similarity, and Hamming profile similarity for miRNAs) and three disease similarity kernels (disease semantic similarity, disease functional similarity, and Hamming profile similarity for diseases) in the two subspaces, respectively. Then, because some initial information may be lost during integration and some noise may exist in the integrated similarity kernel, we propose a novel Similarity Kernel Fusion (SKF) method to integrate multiple similarity kernels. Finally, we apply the Laplacian Regularized Least Squares (LapRLS) method to the integrated kernel to find potential associations. MDA-SKF is evaluated with three schemes, global leave-one-out cross-validation (LOOCV), local LOOCV and 5-fold cross-validation (CV), and achieves AUCs of 0.9576, 0.8356, and 0.9557, respectively. Compared with seven existing methods, MDA-SKF performs outstandingly under global LOOCV and 5-fold CV. We also conduct case studies on 32 diseases to further analyze the performance of MDA-SKF. Furthermore, 3200 candidate associations are obtained, and a majority of them can be confirmed. These results demonstrate that MDA-SKF is an accurate and efficient computational tool for guiding traditional experiments.
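MDA-SKF lists a "Hamming profile similarity" among its kernels. A minimal sketch, assuming it is one minus the normalized Hamming distance between two binary association profiles; this is an illustrative definition, not necessarily the exact formula used by MDA-SKF, and the toy profiles are hypothetical.

```python
def hamming_profile_similarity(profiles):
    """Pairwise similarity 1 - (Hamming distance / profile length) between binary rows."""
    length = len(profiles[0])
    return [[1.0 - sum(x != y for x, y in zip(a, b)) / length for b in profiles]
            for a in profiles]


# Rows: hypothetical miRNA association profiles over four diseases.
profiles = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 1]]
S = hamming_profile_similarity(profiles)

# Disease-side similarity: apply the same function to the transposed matrix.
S_disease = hamming_profile_similarity([list(c) for c in zip(*profiles)])
```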
Affiliation(s)
- Limin Jiang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yijie Ding
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
15
Lan W, Wang J, Li M, Liu J, Wu FX, Pan Y. Predicting MicroRNA-Disease Associations Based on Improved MicroRNA and Disease Similarities. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1774-1782. [PMID: 27392365 DOI: 10.1109/tcbb.2016.2586190] [Citation(s) in RCA: 86] [Impact Index Per Article: 12.3] [Indexed: 06/06/2023]
Abstract
MicroRNAs (miRNAs) are a type of non-coding RNA about 22 nucleotides (nt) long. Increasing evidence has shown that miRNAs play critical roles in many human diseases. The identification of human disease-related miRNAs helps to explore the underlying pathogenesis of diseases. More and more experimentally validated associations between miRNAs and diseases have been reported in recent studies, providing useful information for discovering new miRNA-disease associations. In this study, we propose a computational framework, KBMF-MDI, to predict associations between miRNAs and diseases based on their similarities. The sequence and function information of miRNAs is used to measure similarity among miRNAs, while the semantic and function information of diseases is used to measure similarity among diseases. In addition, the kernelized Bayesian matrix factorization method is employed to infer potential miRNA-disease associations by integrating these data sources. We applied this method to 6,084 known miRNA-disease associations and used 5-fold cross-validation to evaluate its performance. The experimental results demonstrate that our method can effectively predict unknown miRNA-disease associations.
16
Abburu S. Ontology Driven Cross-Linked Domain Data Integration and Spatial Semantic Multi Criteria Query System for Geospatial Public Health. INT J SEMANT WEB INF 2018. [DOI: 10.4018/ijswis.2018070101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
This article describes public health information management as an interdisciplinary application that deals with cross-linked application domains. Geospatial environment, place and meteorology parameters affect public health. Effective decision making plays a vital role and requires disease data analysis, which in turn requires an effective Public Health Knowledge Base (PHKB) and a strong, efficient query engine. Ontologies enhance the performance of retrieval systems and achieve application interoperability. The current research aims at building the PHKB through ontology-based cross-linked domain integration. It designs dynamic GeoSPARQL query building from simple form-based query composition. The spatial semantic multi-criteria query engine is developed by identifying all possible query patterns, considering the ontology elements and multiple criteria from the cross-linked application domains. The research has adopted OGC, W3C, WHO and mHealth standards.
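The abstract describes building GeoSPARQL queries dynamically from simple form input. The sketch below shows the general idea with a Python function that turns optional form fields into a query string. The `geo:` and `geof:` prefixes, `geo:hasGeometry`, `geo:asWKT`, `geo:wktLiteral` and `geof:sfWithin` are standard OGC GeoSPARQL vocabulary; the `ex:` namespace and the `ex:hasDisease`, `ex:reportedIn` and `ex:caseCount` properties are hypothetical placeholders, not the PHKB ontology from the paper.

```python
PREFIXES = """PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex: <http://example.org/phkb#>"""


def build_query(disease, region_wkt=None, min_cases=None):
    """Compose a GeoSPARQL SELECT query from simple form fields.

    disease    : local name of a disease individual in the hypothetical ex: namespace
    region_wkt : optional WKT polygon; only places inside it are returned
    min_cases  : optional lower bound on the hypothetical case count
    """
    # Minimal input validation so form text cannot inject arbitrary SPARQL.
    if not disease.isidentifier():
        raise ValueError("disease must be a simple identifier")
    patterns = [
        f"?record ex:hasDisease ex:{disease} ;",
        "        ex:reportedIn ?place ;",
        "        ex:caseCount ?cases .",
        "?place geo:hasGeometry ?g .",
        "?g geo:asWKT ?wkt .",
    ]
    # Each filled-in form field adds one FILTER criterion.
    if region_wkt is not None:
        patterns.append(f'FILTER(geof:sfWithin(?wkt, "{region_wkt}"^^geo:wktLiteral))')
    if min_cases is not None:
        patterns.append(f"FILTER(?cases >= {int(min_cases)})")
    body = "\n  ".join(patterns)
    return f"{PREFIXES}\nSELECT ?place ?cases WHERE {{\n  {body}\n}}"


q = build_query("Malaria", "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))", 10)
```

Enumerating which combinations of fields may be present corresponds to the "query patterns" the abstract mentions; a production system would also escape the WKT literal.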
17
Ding P, Luo J, Liang C, Xiao Q, Cao B. Human disease MiRNA inference by combining target information based on heterogeneous manifolds. J Biomed Inform 2018; 80:26-36. [PMID: 29481877 DOI: 10.1016/j.jbi.2018.02.013] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/25/2017] [Revised: 02/11/2018] [Accepted: 02/21/2018] [Indexed: 12/12/2022]
Abstract
The emergence of network medicine has provided great insight into the identification of disease-related molecules, which could help with the development of personalized medicine. However, state-of-the-art methods can neither simultaneously consider target information and known miRNA-disease associations nor effectively explore novel gene-disease associations as a by-product of inferring disease-related miRNAs. Computational methods incorporating multiple sources of information offer more opportunities to infer disease-related molecules, including miRNAs and genes, in heterogeneous networks at a system level. In this study, we developed a novel algorithm, named inference of Disease-related MiRNAs based on Heterogeneous Manifold (DMHM), to accurately and efficiently identify miRNA-disease associations by integrating multi-omics data. Graph-based regularization, which constitutes the main principle of DMHM, was used to obtain a smooth function on the data manifold. The novelty of this framework lies in how the relatedness between diseases and miRNAs is measured: via heterogeneous manifolds on heterogeneous networks that integrate target information. To demonstrate the effectiveness of DMHM, we conducted comprehensive experiments on HMDD datasets and compared DMHM with six state-of-the-art methods. Experimental results indicated that DMHM significantly outperformed the other six methods under fivefold cross-validation and de novo prediction tests. Case studies further confirmed the practical usefulness of DMHM.
Affiliation(s)
- Pingjian Ding
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
- Qiu Xiao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Buwen Cao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
18
Cheng L, Jiang Y, Ju H, Sun J, Peng J, Zhou M, Hu Y. InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk. BMC Genomics 2018; 19:919. [PMID: 29363423 PMCID: PMC5780854 DOI: 10.1186/s12864-017-4338-6] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Indexed: 02/08/2023]
Abstract
Background Since the establishment of the first biomedical ontology, the Gene Ontology (GO), the number of biomedical ontologies has increased dramatically. Over 300 ontologies have now been built, including the extensively used Disease Ontology (DO) and Human Phenotype Ontology (HPO). Because it can identify novel relationships between terms, calculating the similarity between ontology terms is one of the major tasks in this research area. Although similarities between terms within each ontology have been studied with in silico methods, term similarities across different ontologies have not been investigated as deeply. The latest method took advantage of a gene functional interaction network (GFIN) to explore such inter-ontology similarities of terms. However, it only used gene interactions and failed to make full use of the connectivity among gene nodes in the network. In addition, all existing methods are specifically designed for GO, and their performance on the wider ontology community remains unknown. Results We proposed a method, InfAcrOnt, to infer similarities between terms across ontologies using the entire GFIN. InfAcrOnt builds a term-gene-gene network comprising ontology annotations and the GFIN, and acquires similarities between terms across ontologies by modeling the information flow within the network with a random walk. In our benchmark experiments on sub-ontologies of GO, InfAcrOnt achieves a high average area under the receiver operating characteristic curve (AUC) (0.9322 and 0.9309) and low standard deviations (1.8746e-6 and 3.0977e-6) on both the human and yeast benchmark datasets, exhibiting superior performance. Meanwhile, comparisons of InfAcrOnt results with prior knowledge on pair-wise DO-HPO terms and pair-wise DO-GO terms show high correlations. Conclusions The experimental results show that InfAcrOnt significantly improves the performance of inferring similarities between terms across ontologies on the benchmark set.
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, People's Republic of China
- Yue Jiang
- Hospital for Sick Children, Toronto, M5G 1X8, Canada
- Hong Ju
- Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, 150081, People's Republic of China
- Jie Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, People's Republic of China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China
- Meng Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, People's Republic of China
- Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150088, People's Republic of China
19
Hu Y, Zhou M, Shi H, Ju H, Jiang Q, Cheng L. Measuring disease similarity and predicting disease-related ncRNAs by a novel method. BMC Med Genomics 2017; 10:71. [PMID: 29297338 PMCID: PMC5751624 DOI: 10.1186/s12920-017-0315-9] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 11/27/2022]
Abstract
Background Similar diseases tend to be caused by similar molecular origins, such as disease-related protein-coding genes (PCGs), and the molecular associations reflect their similarity. Therefore, current methods for calculating disease similarity often use functional interactions of PCGs. However, existing methods have neglected the fact that genes can also be associated in the gene functional network (GFN) through intermediate nodes. Methods Here we present a novel method, InfDisSim, to deduce the similarity of diseases. InfDisSim uses the whole network, modeling the information flow with a random walk with damping. A benchmark set of similar disease pairs was employed to evaluate the performance of InfDisSim. Results The area under the receiver operating characteristic curve (AUC) was calculated to assess performance. InfDisSim reaches a high AUC (0.9786), indicating very good performance. Furthermore, after calculating disease similarity with InfDisSim, we reconfirmed that similar diseases tend to have common therapeutic drugs (Pearson correlation r² = 0.1315, p = 2.2e-16). Finally, the disease similarity computed by InfDisSim was employed to construct a miRNA similarity network (MSN) and an lncRNA similarity network (LSN), which were further exploited to predict potential miRNA-disease and lncRNA-disease associations, respectively. High AUCs (0.9893, 0.9007) based on leave-one-out cross-validation show that the LSN and MSN are well suited to predicting novel disease-related lncRNAs and miRNAs, respectively. Conclusions The high AUC on benchmark data indicates that the method performs well, and it is valuable for predicting disease-related lncRNAs and miRNAs.
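InfDisSim, like most methods in this list, is judged by the area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, counting ties as one half; this equals the area under the empirical ROC curve. The scores and labels are hypothetical.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores (O(P * N))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical similarity scores for benchmark pairs (1 = known similar pair).
example = auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

For large benchmarks a rank-based (Mann-Whitney) formulation gives the same value in O(n log n).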
Affiliation(s)
- Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Meng Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China
- Hongbo Shi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China
- Hong Ju
- Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, 150001, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China
20
A framework for exploring associations between biomedical terms in PubMed. Oncotarget 2017; 8:103100-103107. [PMID: 29262548 PMCID: PMC5732714 DOI: 10.18632/oncotarget.21532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2017] [Accepted: 09/08/2017] [Indexed: 11/25/2022]
Abstract
Co-occurrence relationships between terms in PubMed accelerate the recognition of term associations. The lack of manually curated relationships in vocabularies and the rapid growth of the biomedical literature highlight the importance of co-occurrence relationships. Here we propose a framework to explore term associations based on a standard procedure that combines multiple text-mining tools and relationship-degree calculation methods. PubMed texts were first segmented into sentences by Apache OpenNLP, and the terms in each sentence were then recognized by MGREP. Two terms occurring in the same sentence were identified as a co-occurrence relationship, and the relationship degree was then calculated using the Normalized MEDLINE Distance (NMD) or the relationship-scaled score (RSS) method. The framework was used to explore associations between Gene Ontology (GO) and Disease Ontology (DO) terms based on co-occurrence relationships. The results show that pairs of terms with more co-occurrence relationships share more ontology semantic relationships and more genes. The associated terms identified from co-occurrence relationships were applied to construct a disease association network (DAN). The small giant component is consistent with the observation that diseases in the same class have more links than diseases in different classes.
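The framework treats two terms as co-occurring when they appear in the same sentence and then scores each pair with the Normalized MEDLINE Distance (NMD). The sketch below assumes NMD takes the same form as the Normalized Google Distance, NMD(x, y) = (max(log f(x), log f(y)) − log f(x, y)) / (log N − min(log f(x), log f(y))), with f counted over sentences and N the total number of sentences. The toy sentences and terms are hypothetical, and naive substring matching stands in for MGREP.

```python
import math
from itertools import combinations


def cooccurrence_counts(sentences, terms):
    """Count sentences mentioning each term and each term pair (naive substring match)."""
    single = {t: 0 for t in terms}
    pair = {}
    for sentence in sentences:
        text = sentence.lower()
        found = sorted(t for t in terms if t.lower() in text)
        for t in found:
            single[t] += 1
        # Pair keys are alphabetically ordered because `found` is sorted.
        for a, b in combinations(found, 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair


def nmd(fx, fy, fxy, n):
    """NGD-style normalized distance; smaller values mean a stronger association."""
    if fxy == 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


sentences = [
    "Apoptosis is disrupted in breast cancer.",
    "Breast cancer cells evade apoptosis.",
    "Apoptosis regulates development.",
    "Diabetes affects insulin signaling.",
]
single, pair = cooccurrence_counts(sentences, ["apoptosis", "breast cancer", "diabetes"])
```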
21
Peng H, Lan C, Liu Y, Liu T, Blumenstein M, Li J. Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes. Oncotarget 2017; 8:78901-78916. [PMID: 29108274 PMCID: PMC5668007 DOI: 10.18632/oncotarget.20481] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/23/2017] [Accepted: 07/19/2017] [Indexed: 12/15/2022]
Abstract
Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector representation of diseases and applies the vectorized data in a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. The vector consists of two sub-vectors. The first has 45 elements, characterizing the information entropy of the disease gene distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while others (e.g., the chromosome 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease-gene-enriched KEGG pathways relative to our manually created pathway groups. The second sub-vector complements the first in differentiating between diseases. Our prediction method outperforms state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
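The first sub-vector summarizes how a disease's genes are distributed over chromosome substructures using information entropy. The abstract does not give the exact formula, so the sketch below is one plausible reading, labeled as an assumption: element i is the entropy contribution −p_i log₂ p_i, where p_i is the fraction of the disease's genes located in substructure i, so the elements sum to the Shannon entropy of the distribution. The substructure labels and gene locations are hypothetical, and only four substructures are used instead of 45.

```python
import math
from collections import Counter


def entropy_vector(gene_locations, substructures):
    """Per-substructure entropy contributions -p_i * log2(p_i) of a gene distribution.

    gene_locations : substructure label of each disease gene, e.g. "6p"
    substructures  : fixed, ordered list of substructure labels (45 in the paper)
    """
    counts = Counter(gene_locations)
    total = len(gene_locations)
    vector = []
    for s in substructures:
        p = counts.get(s, 0) / total
        # Empty substructures contribute 0 (the limit of -p log p as p -> 0).
        vector.append(-p * math.log2(p) if p > 0 else 0.0)
    return vector


SUBSTRUCTURES = ["1p", "1q", "6p", "21p"]  # hypothetical subset of the 45
uniform = entropy_vector(["1p", "1q", "6p", "21p"], SUBSTRUCTURES)
```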
Affiliation(s)
- Hui Peng
- Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
- Chaowang Lan
- Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
- Yuansheng Liu
- Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
- Tao Liu
- Centre for Childhood Cancer Research, University of New South Wales, Sydney, Kensington, NSW, Australia
- Michael Blumenstein
- School of Software, University of Technology Sydney, Broadway, NSW, Australia
- Jinyan Li
- Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
22
Hu Y, Zhao L, Liu Z, Ju H, Shi H, Xu P, Wang Y, Cheng L. DisSetSim: an online system for calculating similarity between disease sets. J Biomed Semantics 2017; 8:28. [PMID: 29297411 PMCID: PMC5763469 DOI: 10.1186/s13326-017-0140-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/21/2022]
Abstract
Background Functional similarity between molecules results in similar phenotypes, such as diseases. Therefore, the functions of molecules can be effectively revealed through the diseases they induce. However, the lack of a tool for obtaining the similarity score of pair-wise disease sets (SSDS) limits this type of application. Results Here, we introduce DisSetSim, an online system that solves this problem. Five state-of-the-art methods, Resnik’s, Lin’s, Wang’s, PSB, and SemFunSim, were implemented to first measure the similarity score of pair-wise diseases (SSD). The “pair-wise-best pairs-average” (PWBPA) method was then implemented to calculate the SSDS from the SSD. The system was applied to calculate the functional similarity of miRNAs based on their induced disease sets, and the results were further used to predict potential disease-miRNA relationships. Conclusions The high area under the receiver operating characteristic curve (AUC = 0.9296) based on leave-one-out cross-validation shows that the PWBPA method achieves a high true positive rate and a low false positive rate. The system can be accessed at http://www.bio-annotation.cn:8080/DisSetSim/.
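DisSetSim scores two disease sets with the “pair-wise-best pairs-average” (PWBPA) rule. A minimal sketch, assuming the usual best-match-average form: for each disease in one set take its best similarity to the other set, do the same in the other direction, and average all of these best scores, i.e. (Σ_{a∈A} max_b s(a, b) + Σ_{b∈B} max_a s(a, b)) / (|A| + |B|). The pair-wise disease similarity can be any of the five SSD measures; here a hypothetical lookup table `toy_ssd` stands in for it.

```python
def pwbpa(set_a, set_b, ssd):
    """Best-match-average similarity of two disease sets.

    ssd : function returning the similarity score of two single diseases (SSD)
    """
    if not set_a or not set_b:
        return 0.0
    best_a = [max(ssd(a, b) for b in set_b) for a in set_a]
    best_b = [max(ssd(a, b) for a in set_a) for b in set_b]
    return (sum(best_a) + sum(best_b)) / (len(set_a) + len(set_b))


# Hypothetical SSD values; any symmetric disease-similarity measure would do.
SIM = {
    frozenset(["D1", "D3"]): 0.8,
    frozenset(["D2", "D3"]): 0.4,
    frozenset(["D1", "D4"]): 0.2,
    frozenset(["D2", "D4"]): 0.6,
}


def toy_ssd(x, y):
    return 1.0 if x == y else SIM.get(frozenset([x, y]), 0.0)


# e.g. the induced disease sets of two miRNAs
score = pwbpa(["D1", "D2"], ["D3", "D4"], toy_ssd)
```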
Affiliation(s)
- Yang Hu
- Harbin Institute of Technology, School of Life Science and Technology, Harbin, 150001, People's Republic of China
- Lingling Zhao
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Zhiyan Liu
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Hong Ju
- Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, 150001, People's Republic of China
- Hongbo Shi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, People's Republic of China
- Peigang Xu
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, People's Republic of China
23
Dongliang X, Jingchang P, Bailing W. Multiple kernels learning-based biological entity relationship extraction method. J Biomed Semantics 2017; 8:38. [PMID: 29297359 PMCID: PMC5763518 DOI: 10.1186/s13326-017-0138-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Background Automatically extracting protein interaction information from the biomedical literature can help to build protein relation networks and design new drugs. MEDLINE, the most authoritative textual database in biomedicine, contains more than 20 million literature abstracts, and this number grows exponentially over time. Such rapid expansion of the biomedical literature is difficult to absorb or analyze manually, so efficient, automated search engines that use text-mining techniques are needed to explore it. Results The P, R, and F values of the tag graph method on the AIMed corpus are 50.82, 69.76, and 58.61%, respectively. The P, R, and F values of the tag graph kernel method on the other four evaluation corpora are 2–5% higher than those of the all-paths graph kernel. For the two fusion methods combining the feature kernel and the tag graph kernel, the P, R and F values are 53.43, 71.62 and 61.30% and 55.47, 70.29 and 60.37%, respectively, indicating that both kernel fusion methods perform better than a single kernel. Conclusion Compared with the all-paths graph kernel method, the tag graph kernel method has superior overall performance. Experiments on five corpora show that the multi-kernel method performs better than the three separate single-kernel methods and the dual mutually fused kernel method used here.
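The F value reported alongside precision (P) and recall (R) is conventionally their harmonic mean, F = 2PR / (P + R). The sketch below computes it from the reported P and R of the tag graph method on the AIMed corpus. The result, about 58.80%, is close to but not identical with the reported 58.61%; one possible explanation, which the abstract does not confirm, is that F was averaged over cross-validation folds rather than computed from the averaged P and R.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R reported for the tag graph method on the AIMed corpus.
f_tag_graph = f_measure(50.82, 69.76)
print(round(f_tag_graph, 2))
```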
Affiliation(s)
- Xu Dongliang
- School of Mechanical, Electrical and Information Engineering, ShanDong University, WenHua West Road, WeiHai, 264209, China
- Pan Jingchang
- School of Mechanical, Electrical and Information Engineering, ShanDong University, WenHua West Road, WeiHai, 264209, China
- Wang Bailing
- School of Computer Science and Technology, Harbin Institute of Technology, WenHua West Road, WeiHai, 264209, China
24
Han Y, Sun W, Sun G, Hou X, Gong Z, Xu J, Bai X, Fu L. A 3-year observation of testosterone deficiency in Chinese patients with chronic heart failure. Oncotarget 2017; 8:79835-79842. [PMID: 29108365 PMCID: PMC5668098 DOI: 10.18632/oncotarget.19816] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/26/2017] [Accepted: 07/12/2017] [Indexed: 12/11/2022]
Abstract
Testosterone deficiency is present in a certain proportion of men with chronic heart failure (CHF). In American and European patients with CHF, low testosterone levels lead to high mortality and readmission rates. However, this relationship has not been studied in Chinese patients. To this end, 167 Chinese men with CHF underwent clinical and laboratory evaluations, including determination of testosterone levels. Total testosterone (TT) and sex hormone-binding globulin were measured by chemiluminescence or immunoassay, and free testosterone (FT) levels were calculated. Based on these results, patients were divided into a low testosterone (LT; n = 93) group and a normal testosterone (NT; n = 74) group. Records from each patient were then reviewed over a follow-up of at least 3 years. Patients in the LT group had worse cardiac function and a higher prevalence of ischemic (vs. non-ischemic) etiology and of comorbidities (both P < 0.05). In addition, readmission rates were higher in the LT group than in the NT group (3.32 ± 1.66 vs. 1.57 ± 0.89). Overall, FT deficiency was accompanied by increased mortality (HR = 6.301, 95% CI 3.187–12.459, P < .0001).
Affiliation(s)
- Ying Han
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin 150001, China
- Weiju Sun
- Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China
- Guizhi Sun
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin 150001, China
- Xiaolu Hou
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin 150001, China
- Zhaowei Gong
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin 150001, China
- Jing Xu
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin 150001, China
- Xiuping Bai
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin 150001, China
- Lu Fu
- Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China
|
25
|
Peng W, Lan W, Zhong J, Wang J, Pan Y. A novel method of predicting microRNA-disease associations based on microRNA, disease, gene and environment factor networks. Methods 2017; 124:69-77. [DOI: 10.1016/j.ymeth.2017.05.024] [Received: 03/30/2017] [Revised: 05/02/2017] [Accepted: 05/28/2017]
|
26
|
Peng H, Lan C, Zheng Y, Hutvagner G, Tao D, Li J. Cross disease analysis of co-functional microRNA pairs on a reconstructed network of disease-gene-microRNA tripartite. BMC Bioinformatics 2017; 18:193. [PMID: 28340554 PMCID: PMC5366146 DOI: 10.1186/s12859-017-1605-0] [Received: 11/01/2016] [Accepted: 03/15/2017]
Abstract
BACKGROUND MicroRNAs always function cooperatively in regulating gene expression. Dysfunctions of these co-functional microRNAs can play significant roles in disease development. We are interested in multi-disease associated co-functional microRNAs that cooperatively regulate their common dysfunctional target genes in the development of multiple diseases. This research is potentially useful for human disease studies at the transcriptional level and for the study of multi-purpose microRNA therapeutics. METHODS AND RESULTS We designed a computational method to detect multi-disease associated co-functional microRNA pairs and conducted cross disease analysis on a reconstructed disease-gene-microRNA (DGR) tripartite network. The DGR tripartite network is constructed by integrating newly predicted disease-microRNA associations with the disease, microRNA and gene relationships maintained by existing databases. The prediction method uses a set of reliable negative samples of disease-microRNA associations and a pre-computed kernel matrix instead of kernel functions. From this reconstructed DGR tripartite network, multi-disease associated co-functional microRNA pairs are detected together with their common dysfunctional target genes and ranked by a novel scoring method. We also conducted proof-of-concept case studies on cancer-related co-functional microRNA pairs as well as on non-cancer disease-related microRNA pairs. CONCLUSIONS Prioritizing the co-functional microRNAs related to a series of diseases showed that the co-function phenomenon is not unusual. We also confirmed that microRNA regulation in the development of cancers is more complex and has more unique properties than in non-cancer diseases.
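The detection step can be illustrated on a toy tripartite network. The sketch below assumes that a microRNA pair counts as co-functional in a disease when both microRNAs are associated with that disease and share at least one target gene that is itself associated with the disease. The ranking (number of supporting diseases, then number of shared dysfunctional targets) is a hypothetical stand-in for the paper's scoring method, and all identifiers are made up.

```python
from itertools import combinations


def cofunctional_pairs(disease_mirnas, mirna_targets, disease_genes, min_diseases=2):
    """Find miRNA pairs supported by >= min_diseases diseases, where support means
    both miRNAs are linked to the disease and share a disease-associated target.

    Hypothetical ranking: more supporting diseases first, then more shared targets."""
    mirnas = sorted({m for ms in disease_mirnas.values() for m in ms})
    results = []
    for a, b in combinations(mirnas, 2):
        support = {}
        for disease, ms in disease_mirnas.items():
            if a in ms and b in ms:
                common = mirna_targets.get(a, set()) & mirna_targets.get(b, set())
                dysfunctional = common & disease_genes.get(disease, set())
                if dysfunctional:
                    support[disease] = dysfunctional
        if len(support) >= min_diseases:
            total = sum(len(genes) for genes in support.values())
            results.append(((a, b), len(support), total, support))
    results.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return results


# Toy DGR tripartite network (hypothetical).
disease_mirnas = {"D1": {"miR-1", "miR-2", "miR-3"}, "D2": {"miR-1", "miR-2"}, "D3": {"miR-2", "miR-3"}}
mirna_targets = {"miR-1": {"G1", "G2", "G3"}, "miR-2": {"G2", "G3", "G4"}, "miR-3": {"G4", "G5"}}
disease_genes = {"D1": {"G2", "G4"}, "D2": {"G3"}, "D3": {"G4"}}

results = cofunctional_pairs(disease_mirnas, mirna_targets, disease_genes)
for pair, n_diseases, n_targets, _ in results:
    print(pair, n_diseases, n_targets)
```

Here miR-1 and miR-3 are both linked to D1 but share no target gene, so that pair is not reported.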
Affiliation(s)
- Hui Peng
- Advanced Analytics Institute, University of Technology Sydney, PO Box 123, Broadway, 2007, NSW, Australia
- Chaowang Lan
- Advanced Analytics Institute, University of Technology Sydney, PO Box 123, Broadway, 2007, NSW, Australia
- Yi Zheng
- Advanced Analytics Institute, University of Technology Sydney, PO Box 123, Broadway, 2007, NSW, Australia
- Gyorgy Hutvagner
- Centre for Health Technologies, University of Technology Sydney, PO Box 123, Broadway, 2007, NSW, Australia
- Dacheng Tao
- School of Information Technologies and the Faculty of Engineering and Information Technologies, University of Sydney, J12/318 Cleveland St, Darlington, 2008, NSW, Australia
- Jinyan Li
- Advanced Analytics Institute, University of Technology Sydney, PO Box 123, Broadway, 2007, NSW, Australia
|
27
|
Peng W, Lan W, Yu Z, Wang J, Pan Y. A Framework for Integrating Multiple Biological Networks to Predict MicroRNA-Disease Associations. IEEE Trans Nanobioscience 2017; 16:100-107. [DOI: 10.1109/tnb.2016.2633276]
|
28
|
Abstract
Background Identifying the genes associated with human diseases is crucial for disease diagnosis and drug design. Computational approaches, especially network-based approaches, have recently been developed to identify disease-related genes effectively from existing biomedical networks. Meanwhile, advances in biotechnology enable researchers to produce multi-omics data, enriching our understanding of human diseases and revealing the complex relationships between genes and diseases. However, none of the existing computational approaches can integrate the huge amount of omics data into a weighted integrated network and use it to enhance disease-related gene discovery. Results We propose a new network-based disease gene prediction method called SLN-SRW (Simplified Laplacian Normalization-Supervised Random Walk) to generate and model the edge weights of a new biomedical network that integrates biomedical data from heterogeneous sources, thereby enhancing disease-related gene discovery. Conclusions The experimental results show that SLN-SRW significantly improves the performance of disease gene prediction on both real and synthetic data sets. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3263-4) contains supplementary material, which is available to authorized users.
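SLN-SRW learns edge weights in a supervised way, which the abstract does not detail. The sketch below shows only the propagation step that such methods build on: a random walk with restart over a symmetrically normalized weighted network, S = D^(-1/2) W D^(-1/2). It is an assumption-laden illustration; the exact simplified Laplacian normalization and the supervised weight learning of SLN-SRW are not reproduced, and the network and restart probability are hypothetical.

```python
import math


def sym_normalize(W):
    """S = D^(-1/2) W D^(-1/2), where D is the diagonal (weighted) degree matrix of W."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def random_walk_with_restart(W, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * S p + restart * p0 until the L1 change is below tol."""
    n = len(W)
    S = sym_normalize(W)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(S[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical weighted gene network; gene 0 is a known disease gene (the seed).
W = [
    [0.0, 1.0, 0.5, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.0],
]
scores = random_walk_with_restart(W, seeds={0})
ranking = sorted(range(len(W)), key=lambda g: -scores[g])
print(ranking)  # genes ordered by proximity to the seed
```

Because the eigenvalues of S lie in [-1, 1], the factor (1 - restart) makes the iteration a contraction, so it converges for any restart probability above zero.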
|
29
|
Ni P, Li M, Zhong P, Duan G, Wang J, Li Y, Wu F. Relating Diseases Based on Disease Module Theory. Lecture Notes in Computer Science 2017:24-33. [DOI: 10.1007/978-3-319-59575-7_3]
|
30
|
Li P, Nie Y, Yu J. Fusing literature and full network data improves disease similarity computation. BMC Bioinformatics 2016; 17:326. [PMID: 27578323 PMCID: PMC5006367 DOI: 10.1186/s12859-016-1205-4] [Received: 03/12/2016] [Accepted: 08/24/2016]
Abstract
Background Identifying relatedness among diseases could help deepen understanding of the underlying pathogenic mechanisms of diseases and facilitate drug repositioning projects. A number of methods for computing disease similarity have been developed; however, none of them was designed to use information from the entire protein interaction network, using instead only those interactions involving disease-causing genes. Most previously published methods required gene-disease association data; unfortunately, many diseases still have very few or no associated genes, which has impeded broad adoption of those methods. In this study, we propose a new method (MedNetSim) for computing disease similarity by integrating the medical literature and the protein interaction network. MedNetSim consists of a network-based method (NetSim), which employs the entire protein interaction network, and a MEDLINE-based method (MedSim), which computes disease similarity by mining the biomedical literature. Results Among function-based methods, NetSim achieved the best performance, with an average AUC (area under the receiver operating characteristic curve) of 95.2%. MedSim, whose performance was comparable even to some function-based methods, achieved the highest average AUC among all semantic-based methods. Integrating MedSim and NetSim (MedNetSim) further improved the average AUC to 96.4%. We also studied the effectiveness of different data sources. The quality of protein interaction data was more important than its volume. In contrast, a higher volume of gene-disease association data was more beneficial, even at lower reliability. Using a higher volume of disease-related gene data further improved the average AUC of MedNetSim and NetSim to 97.5% and 96.7%, respectively. Conclusions Integrating the biomedical literature and the protein interaction network can be an effective way to compute disease similarity.
When disease-related gene data are insufficient, literature-based methods such as MedSim can be a valuable addition to function-based algorithms. It may be beneficial to steer more resources toward studying gene-disease associations and improving the quality of protein interaction data. Disease similarities can be computed using the proposed methods at http://www.digintelli.com:8000/. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1205-4) contains supplementary material, which is available to authorized users.
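The abstract does not specify how MedSim scores the literature or how its score is combined with NetSim. A minimal sketch, assuming each disease is profiled by counts of terms co-mentioned with it in MEDLINE abstracts and that the two scores are combined linearly, is shown below; the term counts, the network score and the weight `alpha` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse term-count vectors stored as dicts."""
    dot = sum(c * v.get(t, 0) for t, c in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_similarity(lit_score, net_score, alpha=0.5):
    """Hypothetical linear fusion of a literature-based and a network-based score."""
    return alpha * lit_score + (1 - alpha) * net_score


# Hypothetical literature profiles: term -> number of co-mentioning abstracts.
disease_a = {"Inflammation": 2, "Immunoglobulin E": 2}
disease_b = {"Inflammation": 2, "Skin": 2}

lit = cosine(disease_a, disease_b)
net = 0.9  # hypothetical NetSim-style score for the same disease pair
print(round(lit, 3), round(fused_similarity(lit, net), 3))  # 0.5 0.7
```

A literature score like this needs no gene-disease associations at all, which is why it can fill in for diseases with few or no known genes.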
Affiliation(s)
- Ping Li
- State Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Yaling Nie
- State Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Jingkai Yu
- State Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100190, China
|
31
|
DisSim: an online system for exploring significant similar diseases and exhibiting potential therapeutic drugs. Sci Rep 2016; 6:30024. [PMID: 27457921 PMCID: PMC4960572 DOI: 10.1038/srep30024] [Received: 04/12/2016] [Accepted: 06/27/2016]
Abstract
The similarity between pairs of diseases reveals the molecular relationships between them. For example, similar diseases have the potential to be treated by common therapeutic chemicals (TCs). In this paper, we introduce DisSim, an online system for exploring similar diseases and comparing their corresponding TCs. Currently, DisSim implements five state-of-the-art methods to measure the similarity between Disease Ontology (DO) terms and provides the significance of each similarity score. Furthermore, DisSim integrates the TCs of diseases from the Comparative Toxicogenomics Database (CTD), which can help identify potential relationships between TCs and similar diseases. The system can be accessed at http://123.59.132.21:8080/DisSim.
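DisSim reports the significance of each similarity score, but the abstract does not say how it is computed. One common approach, assumed here, compares the observed score with the scores of randomly sampled term pairs and reports an empirical p-value with a +1 correction. For brevity the sketch uses Jaccard overlap of hypothetical annotation sets as the similarity measure, not any of the five DO-based measures DisSim implements.

```python
import random


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def similarity_p_value(x, y, annotations, sim, n_samples=1000, seed=0):
    """Return (observed score, empirical p-value) for the pair (x, y).

    The background is sim() over randomly sampled term pairs; the +1 correction
    keeps the p-value strictly positive."""
    rng = random.Random(seed)
    terms = list(annotations)
    observed = sim(annotations[x], annotations[y])
    exceed = 0
    for _ in range(n_samples):
        a, b = rng.sample(terms, 2)
        if sim(annotations[a], annotations[b]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_samples + 1)


# Hypothetical disease terms annotated with gene sets.
annotations = {
    "D1": {"G1", "G2", "G3"},
    "D2": {"G2", "G3", "G4"},
    "D3": {"G5"},
    "D4": {"G6", "G7"},
    "D5": {"G1", "G8"},
}

observed, p = similarity_p_value("D1", "D2", annotations, jaccard)
print(observed, p)
```

A pair with zero similarity is matched or beaten by every background sample, so its p-value is exactly 1.0, which is a quick sanity check on the procedure.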
|
32
|
Bai T, Gong L, Wang Y, Wang Y, Kulikowski CA, Huang L. A method for exploring implicit concept relatedness in biomedical knowledge network. BMC Bioinformatics 2016; 17 Suppl 9:265. [PMID: 27454167 PMCID: PMC4959351 DOI: 10.1186/s12859-016-1131-5]
Abstract
BACKGROUND Biomedical information and knowledge, both structured and unstructured, stored in different repositories can be semantically connected to form a hybrid knowledge network. Computing relatedness between concepts and effectively and efficiently discovering valuable but implicit information or knowledge from such a network is of paramount importance for precision medicine, and is a major challenge facing the biomedical research community. RESULTS In this study, a hybrid biomedical knowledge network is constructed by linking concepts across multiple biomedical ontologies as well as unstructured biomedical knowledge sources. To discover implicit relatedness between concepts in ontologies where potentially valuable relationships (implicit knowledge) may exist, we developed a Multi-Ontology Relatedness Model (MORM) within the knowledge network, in which a relatedness network (RN) is defined and computed across multiple ontologies using a formal inference mechanism based on set-theoretic operations. Semantic constraints are designed and implemented to prune the search space of the relatedness network. CONCLUSIONS Experiments on several example biomedical applications were carried out, and evaluation of the results showed the encouraging potential of the proposed approach for biomedical knowledge discovery.
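The set-theoretic flavor of the relatedness computation can be sketched as follows, assuming (hypothetically) that two concepts are implicitly related when they link to overlapping sets of intermediate concepts, scored by Jaccard overlap, and that a semantic constraint is a whitelist of admissible intermediates. This illustrates the idea only; it is not the MORM inference mechanism itself, and all concept identifiers are made up.

```python
def implicit_relatedness(a, c, links, allowed=None):
    """Jaccard overlap of the intermediate concepts linked to a and to c.

    `allowed`, if given, prunes intermediates to an admissible set
    (a stand-in for a semantic constraint). Returns (score, shared intermediates)."""
    na = links.get(a, set())
    nc = links.get(c, set())
    if allowed is not None:
        na, nc = na & allowed, nc & allowed
    union = na | nc
    shared = na & nc
    return (len(shared) / len(union) if union else 0.0), shared


def rank_targets(a, candidates, links, allowed=None):
    """Rank candidate concepts by implicit relatedness to a, highest first."""
    scored = [(implicit_relatedness(a, c, links, allowed)[0], c) for c in candidates]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical cross-ontology links.
links = {
    "disease:X": {"gene:G1", "gene:G2", "pathway:P1"},
    "drug:Y": {"gene:G2", "pathway:P1", "gene:G3"},
    "drug:Z": {"gene:G4"},
}
genes_only = {"gene:G1", "gene:G2", "gene:G3", "gene:G4"}

score, shared = implicit_relatedness("disease:X", "drug:Y", links)
ranked = rank_targets("disease:X", ["drug:Y", "drug:Z"], links, allowed=genes_only)
print(score, sorted(shared))
print(ranked)
```

Restricting intermediates to genes lowers the disease:X and drug:Y score from 0.5 to 1/3, showing how a constraint both prunes the search and changes which paths count as evidence.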
Affiliation(s)
- Tian Bai
- College of Computer Science and Technology, Jilin University, 2699 Qianjin St, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, 2699 Qianjin St, Changchun, China
- Leiguang Gong
- College of Computer Science and Technology, Jilin University, 2699 Qianjin St, Changchun, China
- Yantai Intelligent Information Technologies Ltd., 2699 Qianjin St, Yantai, China
- Ye Wang
- College of Computer Science and Technology, Jilin University, 2699 Qianjin St, Changchun, China
- Yan Wang
- College of Computer Science and Technology, Jilin University, 2699 Qianjin St, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, 2699 Qianjin St, Changchun, China
- Casimir A. Kulikowski
- Department of Computer Science, Rutgers, The State University of New Jersey, Piscataway, NJ USA
- Lan Huang
- College of Computer Science and Technology, Jilin University, 2699 Qianjin St, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, 2699 Qianjin St, Changchun, China
|
33
|
Cheng L, Li J, Hu Y, Jiang Y, Liu Y, Chu Y, Wang Z, Wang Y. Using Semantic Association to Extend and Infer Literature-Oriented Relativity Between Terms. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:1219-1226. [PMID: 26684460 DOI: 10.1109/tcbb.2015.2430289]
Abstract
Related terms often appear together in the literature. Methods have been presented for weighting the relativity of pairwise terms by the literature in which they co-occur and for inferring new relationships. Terms in the literature also appear in the directed acyclic graphs of ontologies such as the Gene Ontology and the Disease Ontology. Therefore, semantic associations between terms may help establish relativities between terms in the literature. However, current methods do not use these associations. In this paper, an adjusted R-scaled score (ARSS) method and an ARSS method based on information content (ARSSIC) are introduced to infer new relationships between terms. First, the set inclusion relationships between ontology terms were exploited to extend the relationships between these terms and the literature. Next, the ARSS method was presented to measure relativity between terms across ontologies according to these extended relationships. Then, the ARSSIC method, which uses the ratio of information shared by the terms' ancestors, was designed to infer new relationships between terms across ontologies. Experimental results show that ARSS identified more statistically significant term pairs, based on corresponding gene sets, than other methods. The high average area under the receiver operating characteristic curve (0.9293) shows that ARSSIC achieved a high true positive rate and a low false positive rate. Data are available at http://mlg.hit.edu.cn/ARSSIC/.
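The first step, extending term-literature links through set inclusion, amounts to propagating each document's annotations to every ancestor term in the ontology DAG; information content then follows as IC(t) = -log(n_t / N), where n_t is the number of documents linked to t and N is the total. The sketch below shows that step plus Lin's IC-based similarity as a well-known stand-in. It is not ARSS or ARSSIC, whose formulas the abstract does not give, and the toy ontology and document IDs are hypothetical.

```python
import math


def ancestors(term, parents):
    """All ancestors of `term` (including itself) in a DAG given child -> set of parents."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def propagate(doc_terms, parents):
    """Map each term to the documents annotated with it or with any descendant."""
    docs = {}
    for doc, terms in doc_terms.items():
        for t in terms:
            for a in ancestors(t, parents):
                docs.setdefault(a, set()).add(doc)
    return docs


def information_content(docs, n_docs):
    """IC(t) = -log(fraction of documents linked to t after propagation)."""
    return {t: -math.log(len(d) / n_docs) for t, d in docs.items()}


def lin_similarity(a, b, parents, ic):
    """Lin: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    common = ancestors(a, parents) & ancestors(b, parents)
    mica = max((ic[t] for t in common if t in ic), default=0.0)
    denom = ic.get(a, 0.0) + ic.get(b, 0.0)
    return 2 * mica / denom if denom else 0.0


# Hypothetical ontology fragment and document annotations.
parents = {"lung cancer": {"cancer"}, "breast cancer": {"cancer"}, "cancer": {"disease"}, "asthma": {"disease"}}
doc_terms = {"doc1": {"lung cancer"}, "doc2": {"breast cancer"}, "doc3": {"lung cancer", "breast cancer"}, "doc4": {"asthma"}}

docs = propagate(doc_terms, parents)
ic = information_content(docs, len(doc_terms))
print(round(lin_similarity("lung cancer", "breast cancer", parents, ic), 3))
```

After propagation the root term is linked to every document, so its IC is zero and a pair whose only common ancestor is the root gets a Lin similarity of zero.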
|
34
|
Cheng L, Li J, Ju P, Peng J, Wang Y. SemFunSim: a new method for measuring disease similarity by integrating semantic and gene functional association. PLoS One 2014; 9:e99415. [PMID: 24932637 PMCID: PMC4059643 DOI: 10.1371/journal.pone.0099415] [Received: 12/16/2013] [Accepted: 05/14/2014]
Abstract
Background Measuring similarity between diseases plays an important role in research on disease-related molecular function. Functional associations between disease-related genes and semantic associations between diseases are often used to identify pairs of similar diseases from different perspectives. Currently, it is still a challenge to exploit both to calculate disease similarity. Therefore, a new method (SemFunSim) that integrates semantic and functional associations is proposed to address this issue. Methods SemFunSim is designed as follows. First, FunSim (Functional similarity) calculates disease similarity using disease-related gene sets in a weighted network of human gene function. Next, SemSim (Semantic Similarity) calculates disease similarity using the relationship between two diseases in the Disease Ontology. Finally, FunSim and SemSim are integrated to measure disease similarity. Results The high average AUC (area under the receiver operating characteristic curve) (96.37%) shows that SemFunSim achieves a high true positive rate and a low false positive rate. Of the top 100 pairs of similar diseases identified by SemFunSim, 79 are annotated in the Comparative Toxicogenomics Database (CTD) as being targeted by the same therapeutic compounds, while the other methods we compared identified 35 or fewer such pairs among their top 100. Moreover, when applying our method to diseases without annotated compounds in CTD, we could confirm many of our predicted candidate compounds from the literature. This indicates that SemFunSim is an effective method for drug repositioning.
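The AUC figures quoted in these abstracts can be read as the probability that a randomly chosen positive pair (for example, a benchmark pair of related diseases) receives a higher similarity score than a randomly chosen negative pair. The sketch below computes AUC that way, counting ties as one half; the scores are hypothetical, and the benchmark construction used by SemFunSim is not described in the abstract.

```python
def auc(scores_pos, scores_neg):
    """ROC AUC as P(positive score > negative score), with ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical similarity scores for known-related (positive) and random (negative) disease pairs.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.4]
print(auc(pos, neg))  # 0.8333333333333334 (7.5 of 9 pairs ordered correctly)
```

This pairwise form is quadratic in the number of pairs; for large benchmarks a rank-based (Mann-Whitney) computation after sorting gives the same value more efficiently.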
Affiliation(s)
- Liang Cheng
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Jie Li
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Peng Ju
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
- Jiajie Peng
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
|