1
Kartheeswaran KP, Rayan AXA, Varrieth GT. Genetically and semantically aware homogeneous network for prediction and scoring of comorbidities. Comput Biol Med 2024; 183:109252. [PMID: 39418770 DOI: 10.1016/j.compbiomed.2024.109252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 06/29/2024] [Accepted: 10/04/2024] [Indexed: 10/19/2024]
Abstract
OBJECTIVE Patients with comorbidities are at higher mortality risk than those suffering from a single disease. Quantifying and predicting disease comorbidities is therefore necessary to stratify patients' mortality risk, predict the probability of comorbidity occurrence, design treatment strategies, and prevent disease progression. Enriching comorbidity disease relationships with the rich semantics established by genetic components plays a vital role in effectively quantifying and predicting comorbidities. However, existing studies have not extensively explored the semantic richness conveyed by the different types of genetic links connecting comorbidity pairs. METHODS To address this, a novel genetic-semantic aware weighted homogeneous network-based method, GSWHomoNet, is proposed. It first constructs a gene-enriched comorbidity heterogeneous network, CoGHetNet, with encoded genetic-semantic aware weighted meta-path instance disease pair embeddings, to obtain an enhanced embedding of the network's disease nodes. For enhanced comorbidity prediction and scoring, both direct and indirect semantically enriched comorbidity relationships of the disease nodes are preserved while transforming the heterogeneous network into the homogeneous comorbidity network GSWHomoNet. The proposed GSWHomoNet not only helps discover comorbidity links transductively between known-known disease pairs but also improves inductive link prediction between known-unknown disease pairs by supplying unknown disease nodes with semantically enriched heterogeneous structural knowledge. RESULTS The effectiveness of the proposed components is demonstrated by AUC scores of 0.895 and 0.860 and AUPR scores of 0.903 and 0.873 for transductive and inductive link prediction, respectively. In comorbidity scoring, GSWHomoNet outperformed other methods with a correlation of 0.848.
The effect of the improved association prediction ability of the genetic semantic aware weighted meta-path instance embedding based node embedding is proved on disease-microbe and bibliographic heterogeneous network datasets. For biological significance of GSWHomoNet-based comorbidity scoring, we compared it with gene, pathway, and protein-protein interaction (PPI) perspectives, revealing a stronger correlation with the PPI aspect. We identified a substantial number of predicted comorbidity disease pairs, with 77,456 and 48,972 pairs supported by literature evidence for transductive and inductive predictions, respectively. Additionally, we highlighted shared pathways and PPIs for these pairs, demonstrating the robustness of comorbidity predictions.
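The AUC values reported in this abstract measure how well predicted link scores rank true comorbidity pairs above non-pairs. As a minimal sketch (not the authors' evaluation code), AUC can be computed as the fraction of positive-negative pairs that are ranked correctly, with ties counted as half. The function name `auc_from_scores` and the toy scores and labels are hypothetical.

```python
def auc_from_scores(scores, labels):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores for four candidate disease pairs
# (label 1 = known comorbidity, 0 = no known comorbidity).
toy_scores = [0.9, 0.7, 0.4, 0.2]
toy_labels = [1, 0, 1, 0]
print(auc_from_scores(toy_scores, toy_labels))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a sort-based version would be preferable for large link sets.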
Affiliation(s)
- Arockia Xavier Annie Rayan
- Department of Computer Science and Engineering, CEG Campus, Anna University, Chennai, Tamil Nadu, India.
2
Muniyappan S, Rayan AXA, Varrieth GT. EGeRepDR: An enhanced genetic-based representation learning for drug repurposing using multiple biomedical sources. J Biomed Inform 2023; 147:104528. [PMID: 37858852 DOI: 10.1016/j.jbi.2023.104528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 09/11/2023] [Accepted: 10/16/2023] [Indexed: 10/21/2023]
Abstract
MOTIVATION Drug repurposing (DR) is a prominent approach for identifying novel therapeutic indications for available drugs and discovering novel drugs for previously untreatable diseases. DR has attracted major attention in the pharmaceutical industry because of the high cost and long time needed to bring new drugs to market through traditional drug development. The DR task depends largely on genetic information, since drugs revert the altered gene expression (GE) of diseases to normal. Many existing studies have not considered this genetic information when predicting potential candidates. METHOD We propose a novel multimodal framework that uses genetic aspects of drugs and diseases, such as genes, pathways, and gene signatures or expression, to enhance DR performance using various data sources. First, a heterogeneous biological network (HBN) is constructed with three types of nodes (drug, disease, and gene) and four types of edges: similarity (drug, gene, and disease), drug-gene, gene-disease, and drug-disease. Next, a modified graph auto-encoder (GAE*) model is applied to learn representations of the drug and disease nodes from the topological structure and edge information. Then, to improve prediction performance, the HBN is enhanced with information extracted from biomedical literature and ontology, using a novel semi-supervised pattern embedding-based bootstrapping model and a novel DR perspective representation learning, respectively. Finally, the proposed system uses a neural network model to generate a probability score for each drug-disease pair. RESULTS We demonstrate the efficiency of the proposed model on various datasets, achieving outstanding performance in 5-fold cross-validation (AUC = 0.99, AUPR = 0.98). Further, we validated the top-ranked potential candidates using pathway analysis and showed that the known and predicted candidates share common genes in the pathways.
Affiliation(s)
- Saranya Muniyappan
- Computer Science and Engineering, CEG Campus, Anna University, Chennai, Tamil Nadu, India.
3
Sánchez-Valle J, Valencia A. Molecular bases of comorbidities: present and future perspectives. Trends Genet 2023; 39:773-786. [PMID: 37482451 DOI: 10.1016/j.tig.2023.06.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/13/2023] [Revised: 06/12/2023] [Accepted: 06/12/2023] [Indexed: 07/25/2023]
Abstract
Co-occurrence of diseases decreases patient quality of life, complicates treatment choices, and increases mortality. Analyses of electronic health records present a complex scenario of comorbidity relationships that vary by age, sex, and cohort under study. The study of similarities between diseases using 'omics data, such as genes altered in diseases, gene expression, proteome, and microbiome, is fundamental to uncovering the origin of, and potential treatment for, comorbidities. Recent studies have produced a first generation of genetic interpretations for as much as 46% of the comorbidities described in large cohorts. Integrating different sources of molecular information and using artificial intelligence (AI) methods are promising approaches for the study of comorbidities. They may help to improve the treatment of comorbidities, including the potential repositioning of drugs.
Affiliation(s)
- Jon Sánchez-Valle
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, 08034, Spain.
- Alfonso Valencia
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, 08034, Spain; ICREA, Barcelona, 08010, Spain.
4
Li B, Tian Y, Tian Y, Zhang S, Zhang X. Predicting Cancer Lymph-Node Metastasis From LncRNA Expression Profiles Using Local Linear Reconstruction Guided Distance Metric Learning. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3179-3189. [PMID: 35139024 DOI: 10.1109/tcbb.2022.3149791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Lymph-node metastasis is the most perilous stage of cancer progression, and long non-coding RNA (lncRNA) has been confirmed to be an important genetic indicator in cancer prediction. However, lncRNA expression profiles are often characterized by many features and few samples, so an efficient method for handling such high-dimensional lncRNA data is urgently needed to aid clinical targeted treatment. Thus, in this study, a local linear reconstruction guided distance metric learning method is put forward to handle lncRNA data for determining cancer lymph-node metastasis. In the original locally linear embedding (LLE) approach, any point can be approximately linearly reconstructed from its nearest neighbors; from this, a novel distance metric can be learned by imposing both nonnegativity and sum-to-one constraints on the reconstruction weights. Taking the defined distance metric and the supervised information of the lncRNA data into account, a local margin model is derived to find a low-dimensional subspace for lncRNA signature extraction. Finally, a classifier that also adopts the learned distance metric is constructed to predict cancer lymph-node metastasis. Several experiments on lncRNA data sets have been carried out, and the results demonstrate the performance of the proposed method in comparison with other related dimensionality reduction methods and classical classifier models.
5
Dong G, Zhang ZC, Feng J, Zhao XM. MorbidGCN: prediction of multimorbidity with a graph convolutional network based on integration of population phenotypes and disease network. Brief Bioinform 2022; 23:6627601. [PMID: 35780382 DOI: 10.1093/bib/bbac255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/14/2022] [Revised: 05/17/2022] [Accepted: 06/01/2022] [Indexed: 02/06/2023] Open
Abstract
Exploring multimorbidity relationships among diseases is of great importance for understanding their shared mechanisms, precise diagnosis and treatment. However, the landscape of multimorbidities is still far from complete due to the complex nature of multimorbidity. Although various types of biological data, such as biomolecules and clinical symptoms, have been used to identify multimorbidities, the population phenotype information (e.g. physical activity and diet) remains less explored for multimorbidity. Here, we present a graph convolutional network (GCN) model, named MorbidGCN, for multimorbidity prediction by integrating population phenotypes and disease network. Specifically, MorbidGCN treats the multimorbidity prediction as a missing link prediction problem in the disease network, where a novel feature selection method is embedded to select important phenotypes. Benchmarking results on two large-scale multimorbidity data sets, i.e. the UK Biobank (UKB) and Human Disease Network (HuDiNe) data sets, demonstrate that MorbidGCN outperforms other competitive methods. With MorbidGCN, 9742 and 14 010 novel multimorbidities are identified in the UKB and HuDiNe data sets, respectively. Moreover, we notice that the selected phenotypes that are generally differentially distributed between multimorbidity patients and single-disease patients can help interpret multimorbidities and show potential for prognosis of multimorbidities.
Affiliation(s)
- Guiying Dong
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- Zi-Chao Zhang
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- Jianfeng Feng
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- Zhangjiang Fudan International Innovation Center, Shanghai, 200433, China
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- Zhangjiang Fudan International Innovation Center, Shanghai, 200433, China
6
Gao YL, Wu MJ, Liu JX, Zheng CH, Wang J. Robust Principal Component Analysis Based On Hypergraph Regularization for Sample Clustering and Co-Characteristic Gene Selection. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2420-2430. [PMID: 33690124 DOI: 10.1109/tcbb.2021.3065054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Extracting genes involved in cancer lesions from gene expression data is critical for cancer research and drug development. Feature selection has attracted much attention in the field of bioinformatics. Principal Component Analysis (PCA) is a widely used method for learning low-dimensional representations. Some variants of PCA have been proposed to improve the robustness and sparsity of the algorithm. However, the existing methods ignore the high-order relationships among data. In this paper, a new model named Robust Principal Component Analysis via Hypergraph Regularization (HRPCA) is proposed. In detail, HRPCA utilizes the L2,1-norm to reduce the effect of outliers and make the data sufficiently row-sparse. Hypergraph regularization is introduced to capture the complex relationships among data. Important information hidden in the data is mined, and the method ensures the accuracy of the resulting data relationship information. Extensive experiments on multi-view biological data demonstrate the feasibility and effectiveness of the proposed approach.
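To make the L2,1-norm mentioned in this abstract concrete, here is a minimal sketch under the common row-wise convention: the norm is the sum of the Euclidean norms of the matrix rows. The abstract does not state which convention the paper uses (some authors sum over columns instead), so the row-wise choice, the function name `l21_norm`, and the toy matrix are assumptions for illustration only.

```python
import math


def l21_norm(matrix):
    """L2,1-norm (row-wise convention): sum over rows of each row's Euclidean norm."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


# Hypothetical 3x2 matrix; the all-zero middle row contributes nothing to the norm.
toy = [[3.0, 4.0],
       [0.0, 0.0],
       [0.0, 5.0]]
print(l21_norm(toy))  # 5 + 0 + 5 = 10.0
```

Because each row enters through its unsquared norm, minimizing this quantity tends to push entire rows to zero, and a large entry is penalized linearly rather than quadratically, which is why the norm is associated with row sparsity and robustness to outliers.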
7
Abdalrada AS, Abawajy J, Al-Quraishi T, Islam SMS. Machine learning models for prediction of co-occurrence of diabetes and cardiovascular diseases: a retrospective cohort study. J Diabetes Metab Disord 2022; 21:251-261. [PMID: 35673486 PMCID: PMC9167176 DOI: 10.1007/s40200-021-00968-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/25/2020] [Accepted: 12/29/2021] [Indexed: 12/15/2022]
Abstract
Background Diabetes mellitus (DM) and cardiovascular diseases (CVD) cause a significant healthcare burden globally and often co-exist. Current approaches often fail to identify many people with co-occurrence of DM and CVD, leading to delays in healthcare seeking and increased complications and morbidity. In this paper, we aimed to develop and evaluate a two-stage machine learning (ML) model to predict the co-occurrence of DM and CVD. Methods We used the diabetes complications screening research initiative (DiScRi) dataset containing >200 variables from >2000 participants. In the first stage, we used two ML models (logistic regression and Evimp functions) implemented in a multivariate adaptive regression splines model to infer the significant common risk factors for DM and CVD and applied the correlation matrix to reduce redundancy. In the second stage, we used a classification and regression algorithm to develop our model. We evaluated the prediction models using prediction accuracy, sensitivity and specificity as performance metrics. Results Common risk factors for DM and CVD co-occurrence were family history of the diseases, gender, deep breathing heart rate change, lying to standing blood pressure change, HbA1c, HDL and TC/HDL ratio. The predictive model showed that participants with HbA1c >6.45 and TC/HDL ratio >5.5 were at risk of developing both diseases (97.9% probability). In contrast, participants with HbA1c >6.45 and TC/HDL ratio ≤5.5 were more likely to have only DM (84.5% probability), and those with HbA1c ≤5.45 and HDL >1.45 were likely to be healthy (82.4% probability). Further, participants with HbA1c ≤5.45 and HDL <1.45 were at risk of only CVD (100% probability). The predictive accuracy of the ML model in detecting co-occurrence of DM and CVD is 94.09%, with sensitivity 93.5% and specificity 95.8%. Conclusions Our ML model can predict, with high accuracy, the co-occurrence of DM and CVD in people attending a screening program.
This might help in early detection of patients with DM and CVD who could benefit from preventive treatment and reduce future healthcare burden.
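The thresholds reported in the Results of this abstract can be read as a small decision tree. The sketch below encodes only the four rules stated there, with the probabilities as reported; it is an illustration, not the authors' trained model and not a clinical tool. The function name `classify_participant` and its parameter names are hypothetical, the abstract does not give measurement units, and inputs outside the four stated regions return `None` rather than a guess.

```python
def classify_participant(hba1c, tc_hdl_ratio, hdl):
    """Return (label, reported probability) for the rules in the abstract, else None."""
    if hba1c > 6.45:
        if tc_hdl_ratio > 5.5:
            return ("DM and CVD", 0.979)
        return ("DM only", 0.845)  # TC/HDL ratio <= 5.5
    if hba1c <= 5.45:
        if hdl > 1.45:
            return ("healthy", 0.824)
        if hdl < 1.45:
            return ("CVD only", 1.0)
    # The abstract states no rule for 5.45 < HbA1c <= 6.45 or for HDL exactly 1.45.
    return None


# Hypothetical participants.
print(classify_participant(hba1c=7.0, tc_hdl_ratio=6.0, hdl=1.0))
print(classify_participant(hba1c=5.0, tc_hdl_ratio=3.0, hdl=1.6))
print(classify_participant(hba1c=6.0, tc_hdl_ratio=3.0, hdl=1.6))
```

Returning `None` for the uncovered regions keeps the sketch faithful to what the abstract states instead of filling in thresholds it does not report.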
Affiliation(s)
- Ahmad Shaker Abdalrada
- Faculty of Computer Science and Information Technology, Wasit University, Al Kut, Iraq
- School of Information Technology, Deakin University, Melbourne, Victoria Australia
- Jemal Abawajy
- School of Information Technology, Deakin University, Melbourne, Victoria Australia
- Tahsien Al-Quraishi
- Faculty of Computer Science and Information Technology, Wasit University, Al Kut, Iraq
- School of Information Technology, Deakin University, Melbourne, Victoria Australia
- Sheikh Mohammed Shariful Islam
- Institute for Physical Activity and Nutrition, Deakin University, 221 Burwood Highway, Burwood, Melbourne, VIC 3125 Australia
8
Wang Y, Zang J, Liu C, Yan Z, Shi D. Interleukin-17 Links Inflammatory Cross-Talks Between Comorbid Psoriasis and Atherosclerosis. Front Immunol 2022; 13:835671. [PMID: 35514987 PMCID: PMC9063001 DOI: 10.3389/fimmu.2022.835671] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/14/2021] [Accepted: 03/23/2022] [Indexed: 11/13/2022] Open
Abstract
Psoriasis is a chronic, systemic, immune-mediated inflammatory disorder that is associated with a significantly increased risk of cardiovascular disease (CVD). Studies have shown that psoriasis often coexists with atherosclerosis, a chronic inflammatory disease of large and medium-sized arteries, which is a major cause of CVD. Although the molecular mechanisms underlying this comorbidity are not fully understood, clinical studies have shown that when interleukin (IL)-17A inhibitors effectively improve psoriatic lesions, atherosclerotic symptoms are also ameliorated in patients with both psoriasis and atherosclerosis. Also, IL-17A is highly expressed in psoriatic lesions and atherosclerotic plaques. These clinical observations imply that IL-17A could be a crucial link between psoriasis and atherosclerosis and that IL-17A-induced inflammatory responses are the major contributor to the pathogenesis of comorbid psoriasis and atherosclerosis. In this review, the current literature related to the epidemiology, genetic predisposition, and inflammatory mechanisms of comorbid psoriasis and atherosclerosis is summarized. We focus on the immunopathological effects of IL-17A in both diseases. The goal of this review is to provide a theoretical basis for the future prevention or treatment of atherosclerosis comorbidity in psoriasis patients. The current evidence supports the notion that treatments targeting IL-17 seem to hold some promise for reducing cardiovascular risk in patients with psoriasis.
Affiliation(s)
- Yan Wang
- College of Clinical Medicine, Jining Medical University, Jining, China
- Jinxin Zang
- Department of Neurology, Jining No.1 People's Hospital, Jining, China
- Chen Liu
- Laboratory of Medical Mycology, Jining No.1 People's Hospital, Jining, China
- Zhongrui Yan
- Department of Neurology, Jining No.1 People's Hospital, Jining, China
- Dongmei Shi
- Laboratory of Medical Mycology, Jining No.1 People's Hospital, Jining, China
- Department of Dermatology, Jining No.1 People's Hospital, Jining, China
9
Hu P, Huang YA, Mei J, Leung H, Chen ZH, Kuang ZM, You ZH, Hu L. Learning from low-rank multimodal representations for predicting disease-drug associations. BMC Med Inform Decis Mak 2021; 21:308. [PMID: 34736437 PMCID: PMC8567544 DOI: 10.1186/s12911-021-01648-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/20/2021] [Accepted: 10/06/2021] [Indexed: 12/15/2022] Open
Abstract
Background Disease-drug associations provide essential information for drug discovery and disease treatment. Many disease-drug associations remain unobserved or unknown, and trials to confirm these associations are time-consuming and expensive. To better understand and explore these valuable associations, it would be useful to develop computational methods for predicting unobserved disease-drug associations. With the advent of various datasets describing diseases and drugs, it has become more feasible to build a model describing the potential correlation between disease and drugs.
Results In this work, we propose a new prediction method, called LMFDA, which works in several stages. First, it studies the drug chemical structure, disease MeSH descriptors, disease-related phenotypic terms, and drug-drug interactions. On this basis, similarity networks from different sources are constructed to enrich the representation of drugs and diseases. Based on the fused disease similarity network and drug similarity network, LMFDA calculates the association score of each disease-drug pair in the database. The method achieves good performance on Fdataset and Cdataset, with AUROCs of 91.6% and 92.1%, respectively, higher than many existing computational models. Conclusions The novelty of LMFDA lies in the introduction of multimodal fusion using low-rank tensors to fuse multiple similarity networks, combined with matrix completion to predict potential associations. We have demonstrated that LMFDA displays excellent network integration ability for accurate disease-drug association inference and achieves substantial improvement over advanced approaches. Overall, experimental results on two real-world network datasets demonstrate that LMFDA is able to deliver excellent detection performance. The results also suggest that refining similarity networks with as much domain knowledge as possible is a promising direction for drug repositioning.
Affiliation(s)
- Pengwei Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Yu-An Huang
- The Hong Kong Polytechnic University, Hong Kong SAR, China
- Henry Leung
- Electrical and Computer Engineering, University of Calgary, Calgary, Canada
- Zhan-Heng Chen
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Ze-Min Kuang
- Beijing Anzhen Hospital of Capital Medical University, Beijing, China
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China.
- Lun Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China.
10
Dong G, Feng J, Sun F, Chen J, Zhao XM. A global overview of genetically interpretable multimorbidities among common diseases in the UK Biobank. Genome Med 2021; 13:110. [PMID: 34225788 PMCID: PMC8258962 DOI: 10.1186/s13073-021-00927-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 03/12/2021] [Accepted: 06/22/2021] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Multimorbidities greatly increase the global health burdens, but the landscapes of their genetic risks have not been systematically investigated. METHODS We used the hospital inpatient data of 385,335 patients in the UK Biobank to investigate the multimorbid relations among 439 common diseases. Post-GWAS analyses were performed to identify multimorbidity shared genetic risks at the genomic loci, network, as well as overall genetic architecture levels. We conducted network decomposition for the networks of genetically interpretable multimorbidities to detect the hub diseases and the involved molecules and functions in each module. RESULTS In total, 11,285 multimorbidities among 439 common diseases were identified, and 46% of them were genetically interpretable at the loci, network, or overall genetic architecture levels. Multimorbidities affecting the same and different physiological systems displayed different patterns of the shared genetic components, with the former more likely to share loci-level genetic components while the latter more likely to share network-level genetic components. Moreover, both the loci- and network-level genetic components shared by multimorbidities converged on cell immunity, protein metabolism, and gene silencing. Furthermore, we found that the genetically interpretable multimorbidities tend to form network modules, mediated by hub diseases and featuring physiological categories. Finally, we showcased how hub diseases mediating the multimorbidity modules could help provide useful insights for the genetic contributors of multimorbidities. CONCLUSIONS Our results provide a systematic resource for understanding the genetic predispositions of multimorbidities and indicate that hub diseases and converged molecules and functions may be the key for treating multimorbidities. 
We have created an online database that allows researchers and physicians to browse, search, or download these multimorbidities (https://multimorbidity.comp-sysbio.org).
Affiliation(s)
- Guiying Dong
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433 China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433 China
- Jianfeng Feng
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433 China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433 China
- Zhangjiang Fudan International Innovation Center, Shanghai, 200433 China
- Fengzhu Sun
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089 USA
- Jingqi Chen
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433 China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433 China
- Zhangjiang Fudan International Innovation Center, Shanghai, 200433 China
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433 China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433 China
- Zhangjiang Fudan International Innovation Center, Shanghai, 200433 China
11
Xiao Q, Fu Y, Yang Y, Dai J, Luo J. NSL2CD: identifying potential circRNA-disease associations based on network embedding and subspace learning. Brief Bioinform 2021; 22:6265177. [PMID: 33954582 DOI: 10.1093/bib/bbab177] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/14/2021] [Revised: 03/29/2021] [Accepted: 04/14/2021] [Indexed: 12/28/2022] Open
Abstract
Many studies have shown that circular RNAs (circRNAs) are important regulators in various pathological processes and play vital roles in many human diseases, and that they could serve as promising biomarkers for disease diagnosis, treatment and prognosis. However, the functions of most circRNAs remain to be unraveled, and it is time-consuming and costly to uncover the relationships between circRNAs and diseases by conventional experimental methods. Thus, identifying candidate circRNAs for human diseases offers new opportunities to understand the functional properties of circRNAs and the pathogenesis of diseases. In this study, we propose a novel network embedding-based adaptive subspace learning method (NSL2CD) for predicting potential circRNA-disease associations and discovering disease-related circRNA candidates. The proposed method first calculates disease similarities and circRNA similarities by fully utilizing different data sources and learns low-dimensional node representations with network embedding methods. Then, we adopt an adaptive subspace learning model to discover potential associations between circRNAs and diseases. Meanwhile, an integrated weighted graph regularization term is imposed to preserve the local geometric structures of the data spaces, and an L1,2-norm constraint is also incorporated into the model to realize the smoothness and sparsity of the projection matrices. The experimental results show that NSL2CD achieves comparable performance under different evaluation metrics, and case studies further confirm its ability to discover potential candidate circRNAs for human diseases.
Affiliation(s)
- Qiu Xiao
- Hunan Normal University and Hunan Xiangjiang Artificial Intelligence Academy, China
- Yu Fu
- Hunan Normal University, China
- Yide Yang
- School of Medicine, Hunan Normal University, China
- Jianhua Dai
- Hunan Normal University and Hunan Xiangjiang Artificial Intelligence Academy, China
12
Biswas S, Mitra P, Rao KS. Relation Prediction of Co-Morbid Diseases Using Knowledge Graph Completion. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:708-717. [PMID: 31295118 DOI: 10.1109/tcbb.2019.2927310] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 06/09/2023]
Abstract
A co-morbid disease condition refers to the simultaneous presence of one or more diseases along with the primary disease. A patient suffering from co-morbid diseases faces a higher mortality risk than a patient with a single disease. It is therefore necessary to predict co-morbid disease pairs. Although researchers have proposed several methods for predicting co-morbid diseases in past years, little work has been done on prediction using tensor factorization-based knowledge graph embedding. Moreover, complex-valued vector-based tensor factorization has not been used in any knowledge graph with biological and biomedical entities. We propose a tensor factorization-based approach on biological knowledge graphs. Our method introduces the concept of complex-valued embedding in knowledge graphs with biological entities. Here, we build a knowledge graph with disease-gene associations and their corresponding background information. To predict the association between prevalent diseases, we use a ComplEx embedding-based tensor decomposition method. In addition, we obtain new prevalent disease pairs using the MCL algorithm in a disease-gene-gene network and check their corresponding inter-relations using an edge prediction task.
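For readers unfamiliar with complex-valued embeddings, the following sketch shows the standard ComplEx scoring function, the real part of the trilinear product of the subject embedding, the relation embedding, and the complex conjugate of the object embedding. The abstract gives no implementation details, so this is the generic formulation rather than the authors' code; the function name `complex_score` and the toy two-dimensional embeddings are hypothetical.

```python
def complex_score(subject, relation, obj):
    """ComplEx triple score: Re(sum_k subject[k] * relation[k] * conj(obj[k]))."""
    if not (len(subject) == len(relation) == len(obj)):
        raise ValueError("embeddings must have the same dimension")
    total = sum(s * r * o.conjugate() for s, r, o in zip(subject, relation, obj))
    return complex(total).real


# Hypothetical 2-dimensional complex embeddings for two diseases and two relations.
disease_a = [1 + 2j, 0.5 - 1j]
disease_b = [2 - 1j, 1 + 1j]
symmetric_rel = [1 + 0j, 2 + 0j]      # purely real relation vector
antisymmetric_rel = [0 + 1j, 0 + 2j]  # purely imaginary relation vector

# A real relation vector scores (a, r, b) and (b, r, a) identically.
print(complex_score(disease_a, symmetric_rel, disease_b),
      complex_score(disease_b, symmetric_rel, disease_a))
# An imaginary relation vector flips the sign when the pair is reversed.
print(complex_score(disease_a, antisymmetric_rel, disease_b),
      complex_score(disease_b, antisymmetric_rel, disease_a))
```

The conjugate on the object side is what lets one model represent both symmetric relations and directed ones, depending on how much of the relation vector is real versus imaginary.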
13
Hwang S, Lee T, Yoon Y. Exploring disease comorbidity in a module-module interaction network. J Bioinform Comput Biol 2020; 18:2050010. [PMID: 32404015 DOI: 10.1142/s0219720020500109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Understanding disease comorbidity contributes to improved quality of life in patients who are suffering from multiple diseases. Therefore, to better explore comorbid diseases, the clarification of associations between diseases based on biological functions is essential. In our study, we propose a method for identifying disease comorbidity in a module-based network, named the module-module interaction (MMI) network, which represents how biological functions influence each other. To construct the MMI network, we detected gene modules - sets of genes that have a higher probability of taking part in specific functions - and established a link between these modules. Subsequently, we constructed disease-related networks in the MMI network to understand inherent disease mechanisms and calculated comorbidity scores of disease pairs using Gene Ontology (GO) terms. Our results show that we can obtain further information on disease mechanisms by considering interactions between functional modules instead of between genes. In addition, we verified that predicted comorbid relationships of disease pairs based on the MMI network are more significant than those based on the protein-protein interaction (PPI) network. This study can be useful to elucidate the mechanisms underlying comorbidities for further study, which will provide a broader insight into the pathogenesis of diseases.
Affiliation(s)
- Soyoun Hwang
  - Department of IT Convergence Engineering, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si, Gyeonggi-do, Korea
- Taekeon Lee
  - Department of Computer Engineering, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si, Gyeonggi-do, Korea
- Youngmi Yoon
  - Department of Computer Engineering, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si, Gyeonggi-do, Korea
14
Ni P, Wang J, Zhong P, Li Y, Wu FX, Pan Y. Constructing Disease Similarity Networks Based on Disease Module Theory. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:906-915. [PMID: 29993782 DOI: 10.1109/tcbb.2018.2817624] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/08/2023]
Abstract
Quantifying the associations between diseases plays an important role in modern biology and medicine. Discovering associations between diseases could help us gain deeper insights into the pathogenic mechanisms of complex diseases, and thus could lead to improvements in disease diagnosis, drug repositioning, and drug development. Due to the growing body of high-throughput biological data, a number of methods have been developed for computing similarity between diseases during the past decade. However, these methods rarely consider the interconnections of genes related to each disease in the protein-protein interaction network (PPIN). Recently, the disease module theory has been proposed, which states that disease-related genes or proteins tend to interact with each other in the same neighborhood of a PPIN. In this study, we propose a new method called ModuleSim to measure associations between diseases by using disease-gene association data and PPIN data based on disease module theory. The experimental results show that by considering the interactions between disease modules and their modularity, the disease similarity calculated by ModuleSim has a significant correlation with the disease classification of Disease Ontology (DO). Furthermore, ModuleSim outperforms four other popular methods, all of which use disease-gene association data and PPIN data to measure disease-disease associations. In addition, the disease similarity network constructed by ModuleSim suggests that ModuleSim is capable of finding potential associations between diseases.
15
Gaudelet T, Malod-Dognin N, Sánchez-Valle J, Pancaldi V, Valencia A, Pržulj N. Unveiling new disease, pathway, and gene associations via multi-scale neural network. PLoS One 2020; 15:e0231059. [PMID: 32251458 PMCID: PMC7135208 DOI: 10.1371/journal.pone.0231059] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/08/2019] [Accepted: 03/14/2020] [Indexed: 12/16/2022] Open
Abstract
Diseases involve complex modifications to the cellular machinery. The gene expression profile of the affected cells contains characteristic patterns linked to a disease. Hence, new biological knowledge about a disease can be extracted from these profiles, improving our ability to diagnose and assess disease risks. This knowledge can be used for drug re-purposing, or by physicians to evaluate a patient’s condition and co-morbidity risk. Here, we consider differential gene expressions obtained by microarray technology for patients diagnosed with various diseases. Based on these data and cellular multi-scale organization, we aim at uncovering disease–disease, disease–gene and disease–pathway associations. We propose a neural network with structure based on the multi-scale organization of proteins in a cell into biological pathways. We show that this model is able to correctly predict the diagnosis for the majority of patients. Through the analysis of the trained model, we predict disease–disease, disease–pathway, and disease–gene associations and validate the predictions by comparisons to known interactions and literature search, proposing putative explanations for the predictions.
Affiliation(s)
- Thomas Gaudelet
  - Department of Computer Science, University College London, London, United Kingdom
- Vera Pancaldi
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - Centre de Recherches en Cancérologie de Toulouse (CRCT), UMR1037 Inserm, ERL5294 CNRS, 31037, Toulouse, France
  - University Paul Sabatier III, Toulouse, France
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - ICREA, Pg. Lluis Companys, Barcelona, Spain
- Nataša Pržulj
  - Department of Computer Science, University College London, London, United Kingdom
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - ICREA, Pg. Lluis Companys, Barcelona, Spain
16
Akram P, Liao L. Prediction of comorbid diseases using weighted geometric embedding of human interactome. BMC Med Genomics 2019; 12:161. [PMID: 31888634 PMCID: PMC6936100 DOI: 10.1186/s12920-019-0605-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/01/2019] [Accepted: 10/16/2019] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Comorbidity is the phenomenon of two or more diseases occurring simultaneously, not by random chance, and it presents great challenges to accurate diagnosis and treatment. As an effort toward better understanding the genetic causes of comorbidity, in this work we have developed a computational method to predict comorbid diseases. Two diseases that share genes tend to show increased comorbidity. Previous work shows that, after mapping the associated genes onto the human interactome, the distance between the two disease modules (subgraphs) is correlated with comorbidity. METHODS To fully incorporate structural characteristics of the interactome as features into the prediction of comorbidity, our method embeds the human interactome into a high dimensional geometric space with weights assigned to the network edges and uses the projection onto different dimensions to "fingerprint" disease modules. A supervised machine learning classifier is then trained to discriminate comorbid diseases from non-comorbid diseases. RESULTS In cross-validation using a benchmark dataset of more than 10,000 disease pairs, our model achieves an ROC score of 0.90 for a comorbidity threshold at relative risk RR = 0 and 0.76 for a comorbidity threshold at RR = 1, and significantly outperforms the previous method and the interactome generated by annotated data. To further incorporate prior knowledge of pathway associations with diseases, we weight the protein-protein interaction network edges according to their frequency of occurring in those pathways, in such a way that edges with higher frequency are more likely to be selected in the minimum spanning tree for geometric embedding. Such weighted embedding is shown to lead to further improvement of comorbid disease prediction.
CONCLUSION The work demonstrates that embedding the two-dimensional planar graph of the human interactome into a high dimensional geometric space allows for characterizing and capturing disease modules (subgraphs formed by the disease associated genes) from multiple perspectives, and hence provides enriched features for a supervised classifier to discriminate comorbid disease pairs from non-comorbid disease pairs more accurately than one based simply on the module separation.
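The abstract thresholds comorbidity on relative risk (RR) without defining it. One definition commonly used in population-based comorbidity studies is RR = C·N / (P_i·P_j), where C is the number of patients with both diseases, P_i and P_j are the number of patients with each disease, and N is the total number of patients. The sketch below assumes that definition and treats the threshold as a strict "greater than"; both are assumptions rather than details from the paper, and the counts are invented.

```python
def relative_risk(co_occurrences, prevalence_i, prevalence_j, n_patients):
    """Assumed definition: RR = C * N / (P_i * P_j)."""
    if prevalence_i <= 0 or prevalence_j <= 0 or n_patients <= 0:
        raise ValueError("prevalences and population size must be positive")
    return co_occurrences * n_patients / (prevalence_i * prevalence_j)


def is_comorbid(rr, threshold):
    """Label a disease pair as comorbid when RR exceeds the threshold (assumed strict)."""
    return rr > threshold


# Invented counts: 30 shared patients, 200 and 300 patients per disease, 10,000 in total.
rr = relative_risk(30, 200, 300, 10_000)
label_at_0 = is_comorbid(rr, 0)  # threshold RR = 0: any co-occurrence counts
label_at_1 = is_comorbid(rr, 1)  # threshold RR = 1: co-occurrence above chance
```

Under this definition a threshold of 0 labels every pair that co-occurs at all, while a threshold of 1 keeps only pairs that co-occur more often than independence would predict, which makes the second labeling stricter.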
Affiliation(s)
- Pakeeza Akram
  - School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
  - Department of Computer Science, University of Delaware, Newark, USA
- Li Liao
  - Department of Computer Science, University of Delaware, Newark, USA
17
Guo M, Yu Y, Wen T, Zhang X, Liu B, Zhang J, Zhang R, Zhang Y, Zhou X. Analysis of disease comorbidity patterns in a large-scale China population. BMC Med Genomics 2019; 12:177. [PMID: 31829182 PMCID: PMC6907122 DOI: 10.1186/s12920-019-0629-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Disease comorbidity is common and has significant implications for disease progression and management. We aim to detect the general disease comorbidity patterns in Chinese populations using a large-scale clinical data set. METHODS We extracted the diseases from a large-scale anonymized data set derived from 8,572,137 inpatients in 453 hospitals across China. We built a Disease Comorbidity Network (DCN) using correlation analysis and detected the topological patterns of disease comorbidity using both complex network and data mining methods. The comorbidity patterns were further validated by shared molecular mechanisms using disease-gene associations and pathways. To predict disease occurrence throughout disease progression, we applied four machine learning methods to model the disease trajectories of patients. RESULTS We obtained a DCN with 5702 nodes and 258,535 edges, which shows a power law distribution of degree and weight. It further indicated high heterogeneity of comorbidities across diseases, and we found that the DCN is a hierarchical modular network with community structures that contain both homogeneous and heterogeneous disease categories. Furthermore, consistent with previous work on US and European populations, we found that disease comorbidities have shared underlying molecular mechanisms. Taking hypertension and psychiatric disease as examples, we used four classification methods to predict disease occurrence from the comorbid disease trajectories and obtained acceptable performance; in particular, random forest obtained the best overall performance (with an F1-score of 0.6689 for hypertension and 0.6802 for psychiatric disease).
CONCLUSIONS Our study indicates that disease comorbidity is significant and valuable for understanding disease incidence and disease interactions in real-world populations, which will provide important insights for detecting patterns in disease classification, diagnosis and prognosis.
Affiliation(s)
- Mengfei Guo
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
- Yanan Yu
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
- Tiancai Wen
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, Shaanxi Province, China
- Xiaoping Zhang
  - China Academy of Chinese Medicine Sciences, Beijing, 100070, China
- Baoyan Liu
  - China Academy of Chinese Medicine Sciences, Beijing, 100070, China
- Jin Zhang
  - Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Runshun Zhang
  - China Academy of Chinese Medical Sciences, Guang'anmen Hospital, Beijing, 100053, China
- Yanning Zhang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, Shaanxi Province, China
- Xuezhong Zhou
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
18
Chen X, Shi W, Deng L. Prediction of Disease Comorbidity Using HeteSim Scores based on Multiple Heterogeneous Networks. Curr Gene Ther 2019; 19:232-241. [DOI: 10.2174/1566523219666190917155959] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/30/2019] [Revised: 06/14/2019] [Accepted: 06/16/2019] [Indexed: 12/25/2022]
Abstract
BACKGROUND Accumulating experimental studies have indicated that disease comorbidity causes additional pain to patients and leads to the failure of standard treatments compared to patients who have a single disease. Therefore, accurate prediction of potential comorbidity is essential to design more efficient treatment strategies. However, only a few disease comorbidities have been discovered in the clinic. OBJECTIVE In this work, we propose PCHS, an effective computational method for predicting disease comorbidity. MATERIALS AND METHODS We utilized the HeteSim measure to calculate the relatedness score for different disease pairs in the global heterogeneous network, which integrates six networks based on biological information, including disease-disease associations, drug-drug interactions, protein-protein interactions and associations among them. We built the prediction model using the Support Vector Machine (SVM) based on the HeteSim scores. RESULTS AND CONCLUSION The results showed that PCHS performed significantly better than previous state-of-the-art approaches and achieved an AUC score of 0.90 in 10-fold cross-validation. Furthermore, some of our predictions have been verified in the literature, indicating the effectiveness of our method.
Affiliation(s)
- Xuegong Chen
  - School of Computer Science and Engineering, Central South University, Changsha, 410075, China
- Wanwan Shi
  - School of Computer Science and Engineering, Central South University, Changsha, 410075, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha, 410075, China
19
Jhee JH, Bang S, Lee DG, Shin H. Comorbidity Scoring with Causal Disease Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1627-1634. [PMID: 29993606 DOI: 10.1109/tcbb.2018.2812886] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
In recent years, there have been numerous studies constructing a disease network from diverse sources of data. Many researchers have attempted to extend the usage of the disease network by employing machine learning algorithms on various problems such as prediction of comorbidity. The relations between diseases can further be specified as causal relations. When causality is laid on the edges in the network, prediction of comorbid diseases can be further improved. However, few machine learning algorithms have been developed that take causality into account. In this study, we exploit a network based machine learning algorithm that generates comorbidity scores from a causal disease network. In order to find comorbid diseases, semi-supervised scoring for causal networks is proposed. It computes scores for all nodes in the network when a specific node is labeled. Each score is calculated one at a time and affects the others along causal edges. The algorithm iterates until it converges. We compared the scoring results of the causal disease network with those of a simple association network. As a gold standard, we referenced the values of relative risk from the prevalence database HuDiNe. Scoring by the proposed method provides clearer distinguishability between the top-ranked diseases in the comorbidity list. This is a benefit because it allows the most significant ones to be chosen more easily. To present typical use of the resulting list, comorbid diseases of Huntington disease and pneumonia are validated via PubMed literature.
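The scoring scheme described above (label one node, update scores one at a time along directed edges, iterate until convergence) can be sketched generically as follows. This is a label-propagation-style illustration, not the paper's algorithm: the update rule, the damping factor `alpha`, the edge weights, and the three-node graph are all assumptions made for the example.

```python
def propagate_scores(nodes, edges, seed, alpha=0.5, tol=1e-9, max_iter=1000):
    """Spread a seed label along directed, weighted edges until scores stabilise.

    edges maps (source, target) -> weight. Assumed update rule:
    score[n] = (1 - alpha) * label[n] + alpha * sum(w * score[source]).
    """
    incoming = {n: [] for n in nodes}
    for (src, dst), weight in edges.items():
        incoming[dst].append((src, weight))

    label = {n: 1.0 if n == seed else 0.0 for n in nodes}
    score = dict(label)
    for _ in range(max_iter):
        largest_change = 0.0
        for n in nodes:  # one node at a time, reusing freshly updated scores
            spread = sum(w * score[src] for src, w in incoming[n])
            new_score = (1 - alpha) * label[n] + alpha * spread
            largest_change = max(largest_change, abs(new_score - score[n]))
            score[n] = new_score
        if largest_change < tol:
            break
    return score


# Hypothetical causal chain A -> B -> C with unit edge weights.
nodes = ["A", "B", "C"]
edges = {("A", "B"): 1.0, ("B", "C"): 1.0}
scores_from_a = propagate_scores(nodes, edges, seed="A")
scores_from_c = propagate_scores(nodes, edges, seed="C")
```

Because the edges are directed, labeling `A` gives downstream nodes a non-zero score, while labeling `C` leaves `A` and `B` at zero; an undirected association network would not make that distinction. Convergence of this simple rule depends on `alpha` and the incoming weights being small enough, which holds for the toy graph here.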
20
Liu J, Cheng Y, Wang X, Cui X, Kong Y, Du J. Low Rank Subspace Clustering via Discrete Constraint and Hypergraph Regularization for Tumor Molecular Pattern Discovery. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1500-1512. [PMID: 29993749 DOI: 10.1109/tcbb.2018.2834371] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/08/2023]
Abstract
Tumor clustering is a powerful approach for cancer class discovery, which is crucial to the effective treatment of cancer. Many traditional clustering methods, such as NMF-based models, have been widely used to identify tumors. However, they cannot achieve satisfactory results. Recently, subspace clustering approaches have been proposed to improve performance by dividing the original space into multiple low-dimensional subspaces. Among them, low rank representation is becoming a popular approach to subspace clustering. In this paper, we propose a novel Low Rank Subspace Clustering model via Discrete Constraint and Hypergraph Regularization (DHLRS). The proposed method learns the cluster indicators directly by using a discrete constraint, which makes the clustering task simple. For each subspace, we adopt the Schatten p-norm to better approximate the low rank constraint. Moreover, Hypergraph Regularization is adopted to infer the complex relationship between genes and the intrinsic geometrical structure of gene expression data in each subspace. Finally, the molecular pattern of tumor gene expression data sets is discovered according to the optimized cluster indicators. Experiments on both synthetic data and real tumor gene expression data sets demonstrate the effectiveness of the proposed DHLRS.