1
Mishra S, Chinthala A, Bhattacharya M. Drug-target prediction through self supervised learning with dual task ensemble approach. Comput Biol Chem 2024; 113:108244. [PMID: 39454455 DOI: 10.1016/j.compbiolchem.2024.108244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Revised: 09/15/2024] [Accepted: 10/09/2024] [Indexed: 10/28/2024]
Abstract
Drug-target interaction (DTI) prediction, a transformative approach in pharmaceutical research, supports computational virtual screening, the search for novel therapeutic applications of existing drugs to address untreated diseases, and the discovery of side effects of existing drugs. The proposed model predicts DTI through a heterogeneous biological network that combines drug-, gene- and disease-related knowledge. For embedding extraction, self-supervised learning (SSL) is used, which trains models through pretext tasks and eliminates the need for manual annotations. The pretext tasks are based on either structural information or similarity information. To mitigate the non-robustness of graph neural networks (GNNs), ensemble learning can be incorporated into GNNs, harnessing multiple models to enhance robustness. This paper introduces a GNN-based architecture consisting of a task-based module and an ensemble module for link prediction of DTI. The ensemble module of dual-task combinations performs well in both cold-start and warm-start scenarios, with mean AUCROC scores of 0.960 (cold start) and 0.970 (warm start) and low deviation.
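The abstract does not say how the ensemble module combines its task-specific models, so the following is only a minimal sketch under the assumption of simple score averaging (soft voting), evaluated with a rank-based AUROC. The function names, the two score lists, and the labels are hypothetical toy values, not data from the paper.

```python
from statistics import mean


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def ensemble_scores(*model_scores):
    """Average the per-pair scores of several models (assumed soft voting)."""
    return [mean(column) for column in zip(*model_scores)]


# Hypothetical link scores for six drug-target pairs from two pretext-task models.
labels = [1, 1, 1, 0, 0, 0]
structure_task = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]
similarity_task = [0.6, 0.9, 0.3, 0.2, 0.4, 0.1]
combined = ensemble_scores(structure_task, similarity_task)

print(auroc(labels, structure_task), auroc(labels, similarity_task), auroc(labels, combined))
```

In this toy case each single model misranks one pair, while the averaged scores separate positives from negatives, which illustrates why an ensemble can be more robust than its members.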
Affiliation(s)
- Surabhi Mishra
- ABV-Indian Institute of Information Technology and Management, Morena Road, Gwalior, 474015, India.
- Ashish Chinthala
- ABV-Indian Institute of Information Technology and Management, Morena Road, Gwalior, 474015, India.
- Mahua Bhattacharya
- ABV-Indian Institute of Information Technology and Management, Morena Road, Gwalior, 474015, India.
2
Hu X, Liu D, Zhang J, Fan Y, Ouyang T, Luo Y, Zhang Y, Deng L. A comprehensive review and evaluation of graph neural networks for non-coding RNA and complex disease associations. Brief Bioinform 2023; 24:bbad410. [PMID: 37985451 DOI: 10.1093/bib/bbad410] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/01/2023] [Revised: 10/07/2023] [Accepted: 10/25/2023] [Indexed: 11/22/2023] Open
Abstract
Non-coding RNAs (ncRNAs) play a critical role in the occurrence and development of numerous human diseases. Consequently, studying the associations between ncRNAs and diseases has garnered significant attention from researchers in recent years. Various computational methods have been proposed to explore ncRNA-disease relationships, with Graph Neural Network (GNN) emerging as a state-of-the-art approach for ncRNA-disease association prediction. In this survey, we present a comprehensive review of GNN-based models for ncRNA-disease associations. Firstly, we provide a detailed introduction to ncRNAs and GNNs. Next, we delve into the motivations behind adopting GNNs for predicting ncRNA-disease associations, focusing on data structure, high-order connectivity in graphs and sparse supervision signals. Subsequently, we analyze the challenges associated with using GNNs in predicting ncRNA-disease associations, covering graph construction, feature propagation and aggregation, and model optimization. We then present a detailed summary and performance evaluation of existing GNN-based models in the context of ncRNA-disease associations. Lastly, we explore potential future research directions in this rapidly evolving field. This survey serves as a valuable resource for researchers interested in leveraging GNNs to uncover the complex relationships between ncRNAs and diseases.
Affiliation(s)
- Xiaowen Hu
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
- Dayun Liu
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
- Jiaxuan Zhang
- Department of Electrical and Computer Engineering, University of California, San Diego, 92093 CA, USA
- Yanhao Fan
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
- Tianxiang Ouyang
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
- Yue Luo
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
- Yuanpeng Zhang
- School of Software, Xinjiang University, 830046 Urumqi, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
3
Wang X, Cheng Y, Yang Y, Yu Y, Li F, Peng S. Multitask joint strategies of self-supervised representation learning on biomedical networks for drug discovery. Nat Mach Intell 2023; 5:445-456. [DOI: 10.1038/s42256-023-00640-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 01/14/2022] [Accepted: 03/02/2023] [Indexed: 01/03/2025]
Abstract
Self-supervised representation learning (SSL) on biomedical networks provides new opportunities for drug discovery; however, effectively combining multiple SSL models is still challenging and has been rarely explored. We therefore propose multitask joint strategies of SSL on biomedical networks for drug discovery, named MSSL2drug. We design six basic SSL tasks that are inspired by the knowledge of various modalities, including structures, semantics and attributes in heterogeneous biomedical networks. Importantly, fifteen combinations of multiple tasks are evaluated using a graph-attention-based multitask adversarial learning framework in two drug discovery scenarios. The results suggest two important findings: (1) combinations of multimodal tasks achieve better performance than other multitask joint models; (2) the local–global combination models yield higher performance than random two-task combinations when there are the same number of modalities. We thus conjecture that the multimodal and local–global combination strategies can be treated as the guideline of multitask SSL for drug discovery.
4
Ünsal Ü, Cüvitoğlu A, Turhan K, Işık Z. NMSDR: Drug repurposing approach based on transcriptome data and network module similarity. Mol Inform 2023; 42:e2200077. [PMID: 36411244 DOI: 10.1002/minf.202200077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 09/19/2022] [Accepted: 11/21/2022] [Indexed: 11/23/2022]
Abstract
Computational drug repurposing aims to discover new treatment regimens by analyzing approved drugs on the market. This study proposes previously approved compounds that can change the expression profile of disease-causing proteins by developing a network theory-based drug repurposing approach. The novelty of the proposed approach is an exploration of module similarity between a disease-causing network and a compound-specific interaction network; thus, such an association leads to more realistic modeling of molecular cell responses at a system biology level. The overlap of the disease network and each compound-specific network is calculated based on a shortest-path similarity of networks by accounting for all protein pairs between networks. A higher similarity score indicates a significant potential of a compound. The approach was validated for breast and lung cancers. When all compounds are sorted by their normalized-similarity scores, 36 and 16 drugs are proposed as new candidates for breast and lung cancer treatment, respectively. A literature survey on candidate compounds revealed that some of our predictions have been clinically investigated in phase II/III trials for the treatment of two cancer types. As a summary, the proposed approach has provided promising initial results by modeling biochemical cell responses in a network-level data representation.
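The abstract describes a shortest-path similarity computed over all protein pairs between a disease network and a compound-specific network, but it does not give the formula. The sketch below is one possible reading, assuming an unweighted interactome and a score defined as the mean of 1/(1 + d) over all cross-network protein pairs, where d is the shortest-path length and unreachable pairs contribute 0. The graph, module contents, and function names are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def module_similarity(graph, disease_module, compound_module):
    """Mean of 1/(1+d) over all cross-module protein pairs (assumed scoring)."""
    if not disease_module or not compound_module:
        raise ValueError("both modules must contain at least one protein")
    total = 0.0
    for protein in disease_module:
        dist = bfs_distances(graph, protein)
        for other in compound_module:
            if other in dist:  # unreachable pairs add nothing
                total += 1.0 / (1 + dist[other])
    return total / (len(disease_module) * len(compound_module))


# Toy interactome: a path A-B-C-D plus an isolated protein E.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
disease = ["A", "B"]
near = module_similarity(graph, disease, ["B", "C"])
far = module_similarity(graph, disease, ["D", "E"])
print(near, far)
```

A compound whose targets sit close to the disease proteins scores higher than one whose targets are distant or disconnected, which matches the abstract's statement that a higher similarity score indicates greater potential.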
Affiliation(s)
- Ülkü Ünsal
- Department of Biostatistics and Medical Informatics, Karadeniz Technical University, 61080, Trabzon, Türkiye; Department of Health Management, Karadeniz Technical University, 61080, Trabzon, Türkiye
- Ali Cüvitoğlu
- Department of Computer Engineering, Dokuz Eylul University, 35390, İzmir, Türkiye
- Kemal Turhan
- Department of Biostatistics and Medical Informatics, Karadeniz Technical University, 61080, Trabzon, Türkiye
- Zerrin Işık
- Department of Computer Engineering, Dokuz Eylul University, 35390, İzmir, Türkiye
5
Chen L, Yu YN, Liu J, Chen YY, Wang B, Qi YF, Guan S, Liu X, Li B, Zhang YY, Hu Y, Wang Z. Modular networks and genomic variation during progression from stable angina pectoris through ischemic cardiomyopathy to chronic heart failure. Mol Med 2022; 28:140. [DOI: 10.1186/s10020-022-00569-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Accepted: 11/04/2022] [Indexed: 11/28/2022] Open
Abstract
Background
Analyzing disease–disease relationships plays an important role for understanding etiology, disease classification, and drug repositioning. However, as cardiovascular diseases with causative links, the molecular relationship among stable angina pectoris (SAP), ischemic cardiomyopathy (ICM) and chronic heart failure (CHF) is not clear.
Methods
In this study, by integrating data from multiple databases, we constructed paired disease progression modules (PDPMs) to identify relationships among SAP, ICM and CHF based on module reconstruction pairs (MRPs), using the K-value calculation method (a Euclidean distance optimization that integrates module topology parameters and their weights). Finally, enrichment analysis, literature validation and structural variation (SV) analysis were performed to verify the relationship between the three diseases in PDPMs.
Results
In total, 16 PDPMs with K > 0.3777 were found among SAP, ICM and CHF: 6 pairs for SAP–ICM and 5 pairs each for ICM–CHF and SAP–CHF. SAP–ICM was the most closely related, with the smallest average K-value (K = 0.3899), while SAP–CHF had the largest (K = 0.4006). According to the functions of the validation genes, the inflammatory response ran through each stage of SAP–ICM–CHF, while SAP–ICM was uniquely involved in fibrosis, with genes affecting the upstream of the PI3K–Akt signaling pathway. Four of the 11 genes (FLT1, KDR, ANGPT2 and PGF) in SAP–ICM–CHF were related to angiogenesis in the HIF-1 signaling pathway. Furthermore, 62.96% of SVs in SAP–ICM–CHF were protein deletions and 53.85% of SVs in SAP–ICM were defined as protein replications, while ICM–CHF genes were mainly affected by protein deletion.
Conclusion
The PDPMs analysis approach combined with genomic structural variation provides a new avenue for determining target associations contributing to disease progression and reveals that inflammation and angiogenesis may be important links among SAP, ICM and CHF progression.
6
Chen Y, Hu Y, Hu X, Feng C, Chen M. CoGO: a contrastive learning framework to predict disease similarity based on gene network and ontology structure. Bioinformatics 2022; 38:4380-4386. [PMID: 35900147 DOI: 10.1093/bioinformatics/btac520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 06/16/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Quantifying the similarity of human diseases provides guiding insights to the discovery of micro-scope mechanisms from a macro scale. Previous work demonstrated that better performance can be gained by integrating multiview data sources or applying machine learning techniques. However, designing an efficient framework to extract and incorporate information from different biological data using deep learning models remains unexplored. RESULTS We present CoGO, a Contrastive learning framework to predict disease similarity based on Gene network and Ontology structure, which incorporates the gene interaction network and gene ontology (GO) domain knowledge using graph deep learning models. First, graph deep learning models are applied to encode the features of genes and GO terms from separate graph structure data. Next, gene and GO features are projected to a common embedding space via a nonlinear projection. Then cross-view contrastive loss is applied to maximize the agreement of corresponding gene-GO associations and lead to meaningful gene representation. Finally, CoGO infers the similarity between diseases by the cosine similarity of disease representation vectors derived from related gene embedding. In our experiments, CoGO outperforms the most competitive baseline method on both AUROC and AUPRC, especially improves 19.57% in AUPRC (0.7733). The prediction results are significantly comparable with other disease similarity studies and thus highly credible. Furthermore, we conduct a detailed case study of top similar disease pairs which is demonstrated by other studies. Empirical results show that CoGO achieves powerful performance in disease similarity problem. AVAILABILITY AND IMPLEMENTATION https://github.com/yhchen1123/CoGO.
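The final inference step described above, cosine similarity between disease vectors derived from related gene embeddings, can be sketched as follows. The abstract does not state how gene embeddings are aggregated into a disease vector, so mean pooling is an assumption here, and the two-dimensional embeddings and disease-gene lists are hypothetical toy values rather than output of the CoGO model.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (assumed aggregation)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical learned gene embeddings and disease-gene associations.
gene_embedding = {"G1": [1.0, 0.0], "G2": [0.0, 1.0], "G3": [1.0, 1.0]}
disease_genes = {"D1": ["G1", "G3"], "D2": ["G3"], "D3": ["G2"]}

disease_vector = {
    disease: mean_vector([gene_embedding[g] for g in genes])
    for disease, genes in disease_genes.items()
}

sim_d1_d2 = cosine(disease_vector["D1"], disease_vector["D2"])
sim_d1_d3 = cosine(disease_vector["D1"], disease_vector["D3"])
print(sim_d1_d2, sim_d1_d3)
```

Diseases D1 and D2 share gene G3 and therefore end up closer in the embedding space than D1 and D3, which share no genes.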
Affiliation(s)
- Yuhao Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Yanshi Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Xiaotian Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Cong Feng
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China; Biomedical Big Data Center, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China; Institute of Hematology, Zhejiang University, Hangzhou, 310058, China
7
Kim HJ, Shin SY, Jeong SH. Nature and Extent of Physical Comorbidities Among Korean Patients With Mental Illnesses: Pairwise and Network Analysis Based on Health Insurance Claims Data. Psychiatry Investig 2022; 19:488-499. [PMID: 35753688 PMCID: PMC9233950 DOI: 10.30773/pi.2022.0068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Accepted: 04/29/2022] [Indexed: 11/27/2022] Open
Abstract
OBJECTIVE The nature of physical comorbidities in patients with mental illness may differ according to diagnosis and personal characteristics. We investigated this complexity by conventional logistic regression and network analysis. METHODS Health insurance claims data from Korea were analyzed. For every combination of psychiatric and physical diagnoses, odds ratios were calculated, adjusting for age and sex. From the patient-diagnosis data, a network of diagnoses was constructed using the Jaccard coefficient as the index of comorbidity. RESULTS Of 1,017,024 individuals, 77,447 (7.6%) were diagnosed with mental illnesses. The number of physical diagnoses among them was 11.2, which was 1.6 times higher than in non-psychiatric groups. The most noticeable associations were 1) neurotic illnesses with gastrointestinal/pain disorders and 2) dementia with fracture, Parkinson's disease, and cerebrovascular accidents. Unexpectedly, the diagnosis of metabolic syndrome was only scarcely found in patients with severe mental illnesses (SMIs). However, implicit associations between metabolic syndrome and SMIs were suggested in comorbidity networks. CONCLUSION Physical comorbidities in patients with mental illnesses were more extensive than those with other disease categories. However, the result raised questions as to whether medical resources were being diverted to less serious conditions rather than more urgent conditions in patients with SMIs.
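The network construction in the METHODS can be sketched as below: each diagnosis is represented by the set of patients who carry it, and the Jaccard coefficient of two patient sets becomes the edge weight. This is a minimal illustration only; the patient identifiers, the diagnosis names chosen, and the `threshold` parameter are hypothetical, and the study's own preprocessing is not described in the abstract.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def comorbidity_edges(patients_by_diagnosis, threshold=0.0):
    """Weighted edges between diagnoses whose Jaccard index exceeds threshold."""
    edges = {}
    for d1, d2 in combinations(sorted(patients_by_diagnosis), 2):
        weight = jaccard(patients_by_diagnosis[d1], patients_by_diagnosis[d2])
        if weight > threshold:
            edges[(d1, d2)] = weight
    return edges


# Hypothetical patient IDs per diagnosis.
records = {
    "dementia": {1, 2, 3, 4},
    "fracture": {3, 4, 5},
    "gastritis": {6, 7},
}
edges = comorbidity_edges(records)
print(edges)
```

Here dementia and fracture share two of five distinct patients, giving an edge weight of 0.4, while gastritis shares no patients with either and stays disconnected.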
Affiliation(s)
- Ho Joon Kim
- Department of Psychiatry, Daejeon Eulji Medical Center, Eulji University School of Medicine, Daejeon, Republic of Korea
- Sam Yi Shin
- Department of Psychiatry, The Healer's Hospital, Busan, Republic of Korea
- Seong Hoon Jeong
- Department of Psychiatry, Daejeon Eulji Medical Center, Eulji University School of Medicine, Daejeon, Republic of Korea
8
Zhang Y, Lei X, Pan Y, Wu FX. Drug Repositioning with GraphSAGE and Clustering Constraints Based on Drug and Disease Networks. Front Pharmacol 2022; 13:872785. [PMID: 35620297 PMCID: PMC9127467 DOI: 10.3389/fphar.2022.872785] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/10/2022] [Accepted: 04/11/2022] [Indexed: 11/29/2022] Open
Abstract
The understanding of therapeutic properties is important in drug repositioning and drug discovery. However, chemical or clinical trials are expensive and inefficient to characterize the therapeutic properties of drugs. Recently, artificial intelligence (AI)-assisted algorithms have received extensive attention for discovering the potential therapeutic properties of drugs and speeding up drug development. In this study, we propose a new method based on GraphSAGE and clustering constraints (DRGCC) to investigate the potential therapeutic properties of drugs for drug repositioning. First, the drug structure features and disease symptom features are extracted. Second, the drug–drug interaction network and disease similarity network are constructed according to the drug–gene and disease–gene relationships. Matrix factorization is adopted to extract the clustering features of networks. Then, all the features are fed to the GraphSAGE to predict new associations between existing drugs and diseases. Benchmark comparisons on two different datasets show that our method has reliable predictive performance and outperforms six other competing methods. We have also conducted case studies on existing drugs and diseases and aimed to predict drugs that may be effective for the novel coronavirus disease 2019 (COVID-19). Among the predicted anti-COVID-19 drug candidates, some drugs are being clinically studied by pharmacologists, and their binding sites to COVID-19-related protein receptors have been found via the molecular docking technology.
Affiliation(s)
- Yuchen Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
9
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, a lot of computational methods and tools/platforms have been developed to reveal intrinsic relationship between diseases, which can provide useful insights to the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatment of diseases. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups in terms of the usages of biomedical data, including disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we give a discussion and summary on the research of disease-disease associations. This review provides a systematic overview for current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
- Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
10
Prieto Santamaría L, García Del Valle EP, Zanin M, Hernández Chan GS, Pérez Gallardo Y, Rodríguez-González A. Classifying diseases by using biological features to identify potential nosological models. Sci Rep 2021; 11:21096. [PMID: 34702888 PMCID: PMC8548311 DOI: 10.1038/s41598-021-00554-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/02/2021] [Accepted: 10/14/2021] [Indexed: 11/25/2022] Open
Abstract
Established nosological models have provided physicians an adequate enough classification of diseases so far. Such systems are important to correctly identify diseases and treat them successfully. However, these taxonomies tend to be based on phenotypical observations, lacking a molecular or biological foundation. Therefore, there is an urgent need to modernize them in order to include the heterogeneous information that is produced in the present, as could be genomic, proteomic, transcriptomic and metabolic data, leading this way to more comprehensive and robust structures. For that purpose, we have developed an extensive methodology to analyse the possibilities when it comes to generate new nosological models from biological features. Different datasets of diseases have been considered, and distinct features related to diseases, namely genes, proteins, metabolic pathways and genetical variants, have been represented as binary and numerical vectors. From those vectors, diseases distances have been computed on the basis of several metrics. Clustering algorithms have been implemented to group diseases, generating different models, each of them corresponding to the distinct combinations of the previous parameters. They have been evaluated by means of intrinsic metrics, proving that some of them are highly suitable to cover new nosologies. One of the clustering configurations has been deeply analysed, demonstrating its quality and validity in the research context, and further biological interpretations have been made. Such model was particularly generated by OPTICS clustering algorithm, by studying the distance between diseases based on gene sharedness and following cosine index metric. 729 clusters were formed in this model, which obtained a Silhouette coefficient of 0.43.
Affiliation(s)
- Lucía Prieto Santamaría
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Ezeris Networks Global Services S.L., 28028, Madrid, Spain.
- Massimiliano Zanin
- Instituto de Física Interdisciplinar y Sistemas Complejos, CSIC-UIB, 07122, Palma de Mallorca, Spain
11
A New Framework for Discovering Protein Complex and Disease Association via Mining Multiple Databases. Interdiscip Sci 2021; 13:683-692. [PMID: 33905111 DOI: 10.1007/s12539-021-00432-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2021] [Revised: 03/31/2021] [Accepted: 04/09/2021] [Indexed: 10/21/2022]
Abstract
One important challenge in the post-genomic era is to explore disease mechanisms by efficiently integrating different types of biological data. In fact, a single disease is usually caused by multiple gene products, such as protein complexes, rather than a single gene. Therefore, it is meaningful to discover protein communities from the protein-protein interaction network and use them for inferring disease-disease associations. In this article, we propose a new framework including protein-protein networks, disease-gene associations and disease-complex pairs to cluster protein complexes and infer disease associations. Complexes discovered by our approach are superior in quality (Sn, PPV and ACC) and clustering quantity to those of four other popular methods on three PPI networks. A systematic analysis shows that disease pairs sharing more protein complexes (such as Glucose and Lipid Metabolic Disorders) are more similar and that overlapping proteins may have different roles in different diseases. These findings can provide clinical scholars and medical practitioners with new ideas on disease identification and treatment.
12
Lei X, Mudiyanselage TB, Zhang Y, Bian C, Lan W, Yu N, Pan Y. A comprehensive survey on computational methods of non-coding RNA and disease association prediction. Brief Bioinform 2020; 22:6042241. [PMID: 33341893 DOI: 10.1093/bib/bbaa350] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 08/28/2020] [Revised: 10/20/2020] [Accepted: 11/01/2020] [Indexed: 02/06/2023] Open
Abstract
Studies on the relationships between non-coding RNAs and diseases have been widely carried out in recent years. A large number of experimental methods and technologies for producing biological data have also been developed. However, due to their high labor cost and production time, calculation-based methods, especially machine learning and deep learning methods, have received a lot of attention and are now commonly used to solve these problems. From a computational point of view, this survey mainly introduces three common non-coding RNAs, i.e. miRNAs, lncRNAs and circRNAs, and the related computational methods for predicting their association with diseases. First, the mainstream databases of the above three non-coding RNAs are introduced in detail. Then, we present several methods for RNA similarity and disease similarity calculations. Later, we investigate ncRNA-disease prediction methods in detail and classify these methods into five types: network propagation, recommender systems, matrix completion, machine learning and deep learning. Furthermore, we provide a summary of the applications of these five types of computational methods in predicting the associations between diseases and miRNAs, lncRNAs and circRNAs, respectively. Finally, the advantages and limitations of various methods are identified, and future research directions and challenges are also discussed.
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yuchen Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Chen Bian
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Wei Lan
- School of Computer, Electronics and Information at Guangxi University, Nanning, China
- Ning Yu
- Department of Computing Sciences at the College at Brockport, State University of New York, Rochester, NY, USA
- Yi Pan
- Computer Science Department at Georgia State University, Atlanta, GA, USA
13
Dai W, Tang T, Dai Z, Shi D, Mo L, Zhang Y. Probing the Mechanism of Hepatotoxicity of Hexabromocyclododecanes through Toxicological Network Analysis. Environmental Science & Technology 2020; 54:15235-15245. [PMID: 33190479 DOI: 10.1021/acs.est.0c03998] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/11/2023]
Abstract
The prediction and mechanism analysis of hepatotoxicity of contaminants, because of their various phenotypes and complex mechanisms, is still a key problem in environmental research. We applied a toxicological network analysis method to predict the hepatotoxicity of three hexabromocyclododecane (HBCD) diastereoisomers (α-HBCD, β-HBCD, and γ-HBCD) and explore their potential mechanisms. First, we collected the hepatotoxicity related genes and found that those genes were significantly localized in the human interactome. Therefore, these genes form a disease module of hepatotoxicity. We also collected targets of α-, β-, and γ-HBCD and found that their targets overlap with the hepatotoxicity disease module. Then, we trained a model to predict hepatotoxicity of three HBCD diastereoisomers based on the relationship between the hepatotoxicity disease module and targets of compounds. We found that 593 genes were significantly located in the hepatotoxicity disease module (Z = 11.9, p < 0.001) involved in oxidative stress, cellular immunity, and proliferation, and the accuracy of hepatotoxicity prediction of HBCD was 0.7095 ± 0.0193 and the recall score was 0.8355 ± 0.0352. HBCD mainly affects the core disease module genes to mediate the adenosine monophosphate-activated kinase, p38MAPK, PI3K/Akt, and TNFα pathways to regulate the immune reaction and inflammation. HBCD also induces the secretion of IL6 and STAT3 to lead hepatotoxicity by regulating NR3C1. This approach is transferable to other toxicity research studies of environmental pollutants.
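The abstract reports that the hepatotoxicity genes are significantly localized in the interactome (Z = 11.9) without naming the statistic. A common choice in disease-module analyses, assumed here, is the size of the largest connected component (LCC) formed by the gene set, compared against randomly drawn gene sets of the same size. The sketch below uses uniform random sampling for the null model, which is a simplification, and all function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def largest_component_size(graph, nodes):
    """Size of the largest connected component induced by nodes in graph."""
    nodes = set(nodes)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in graph.get(node, ()):
                if neighbour in nodes and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def z_score(observed, null_values):
    """Standard score of observed against a null distribution."""
    spread = pstdev(null_values)
    if spread == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean(null_values)) / spread


def localization_z(graph, gene_set, n_random=1000, seed=0):
    """Z-score of the gene set's LCC size versus same-size random gene sets."""
    rng = random.Random(seed)
    all_nodes = sorted(graph)
    null = [
        largest_component_size(graph, rng.sample(all_nodes, len(gene_set)))
        for _ in range(n_random)
    ]
    return z_score(largest_component_size(graph, gene_set), null)


# Toy interactome: path A-B-C and a separate edge D-E.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": ["E"], "E": ["D"]}
```

A large positive Z means the gene set forms a bigger connected module than expected by chance; a real analysis would typically also control for node degree when drawing the random sets.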
Collapse
Affiliation(s)
- Weina Dai
- Chongqing Research Center for Pharmaceutical Engineering, College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
| | - Tiantian Tang
- Chongqing Research Center for Pharmaceutical Engineering, College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
| | - Zhenghua Dai
- Chongqing Research Center for Pharmaceutical Engineering, College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
- Chongqing Academy of Metrology and Quality Inspection, Chongqing 401123, China
| | - Da Shi
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Lingyun Mo
- The Guangxi Key Laboratory of Theory and Technology for Environmental Pollution Control, College of Environmental Science and Engineering, Guilin University of Technology, Guilin 541004, China
- Technical Innovation Center for Mine Geological Environment Restoration Engineering in Shishan Area of South China, Ministry of Natural Resources, Nanning 530028, China
| | - Yonghong Zhang
- Chongqing Research Center for Pharmaceutical Engineering, College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
|
14
|
Zhang J, Zhang Y, Li Y, Guo S, Yang G. Identification of Cancer Biomarkers in Human Body Fluids by Using Enhanced Physicochemical-incorporated Evolutionary Conservation Scheme. Curr Top Med Chem 2020; 20:1888-1897. [PMID: 32648847 DOI: 10.2174/1568026620666200710100743] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/15/2019] [Revised: 03/01/2020] [Accepted: 03/02/2020] [Indexed: 02/07/2023]
Abstract
OBJECTIVE Cancer is one of the most serious diseases affecting human health. Among all current cancer treatments, early diagnosis and control significantly increase the chances of cure. Detecting cancer biomarkers in body fluids is attracting growing attention among oncologists. In silico prediction of body fluid-related proteins, which can serve as cancer biomarkers, offers a starting point for otherwise labor-intensive and time-consuming biochemical experiments. METHODS In this work, we propose a novel method for high-throughput identification of cancer biomarkers in human body fluids. We incorporate physicochemical properties into the weighted observed percentages (WOP) and position-specific scoring matrix (PSSM) profiles to enhance the attributes that reflect the evolutionary conservation of body fluid-related proteins. The least absolute shrinkage and selection operator (LASSO) feature selection strategy is introduced to generate the optimal feature subset. RESULTS The ten-fold cross-validation results on the training datasets demonstrate the accuracy of the proposed model. We also test the proposed method on independent testing datasets and apply it to the identification of potential cancer biomarkers in human body fluids. CONCLUSION The testing results indicate that our approach generalizes well.
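To illustrate how LASSO yields a feature subset, the following minimal sketch fits an L1-penalized least-squares model by coordinate descent and keeps the features whose coefficients stay non-zero. It is a generic illustration, not the authors' pipeline: the toy matrix, the penalty `alpha`, the absence of an intercept, and the function names are assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Hypothetical toy data: feature 0 drives the response, feature 1 is noise.
X = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.1], [4.0, -0.1]]
y = [2.0, 4.0, 6.0, 8.0]

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, coef in enumerate(w) if coef != 0.0]  # indices of kept features
```

The L1 penalty drives the coefficient of the uninformative feature exactly to zero, which is what makes LASSO usable as a selection step rather than only a regularizer.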
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
| | - Yu Zhang
- Information Engineering College, Huanghuai University, Zhumadian, China
| | - Yanlin Li
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
| | - Song Guo
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
| | - Guifu Yang
- College of Information Science and Technology, Northeast Normal University, Changchun, China
|
15
|
Gao J, Tian L, Wang J, Chen Y, Song B, Hu X. Similar Disease Prediction With Heterogeneous Disease Information Networks. IEEE Trans Nanobioscience 2020; 19:571-578. [PMID: 32603299 DOI: 10.1109/tnb.2020.2994983] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
Studying the similarity of diseases can help us explore the pathological characteristics of complex diseases and provide reliable reference information for inferring the relationships between new and known diseases, so as to develop effective treatment plans. To obtain disease similarity, most previous methods either use a single similarity metric, such as a semantic score or a functional score from a single data source, or use weighting coefficients to simply combine multiple metrics with different dimensions. In this paper, we propose a method to predict the similarity of diseases by node representation learning. We first integrate the semantic score and topological score between diseases by combining multiple data sources. Then, for each disease, its integrated scores with all other diseases are used to map it into a vector of the same spatial dimension, and the vectors are used to measure and comprehensively analyze the similarity between diseases. Lastly, we conduct comparative experiments on a benchmark set and on other disease nodes outside the benchmark set. Evaluating multiple methods with statistics such as the average, variance, and coefficient of variation on the benchmark set demonstrates the effectiveness of our approach in predicting similar diseases.
|
16
|
Luo H, Wang J, Li M, Luo J, Ni P, Zhao K, Wu FX, Pan Y. Computational Drug Repositioning with Random Walk on a Heterogeneous Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1890-1900. [PMID: 29994051 DOI: 10.1109/tcbb.2018.2832078] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 05/23/2023]
Abstract
Drug repositioning is an efficient and promising strategy to identify new indications for existing drugs, which can improve the productivity of traditional drug discovery and development. Rapid advances in high-throughput technologies have generated various types of biomedical data over the past decades, which lay the foundations for further development of computational drug repositioning approaches. Although many studies have tried to improve repositioning accuracy by integrating information from multiple sources and different levels, how to efficiently exploit valuable data for drug repositioning still merits further investigation. In this study, we propose an efficient approach, Random Walk on a Heterogeneous Network for Drug Repositioning (RWHNDR), to prioritize candidate drugs for diseases. First, an integrated heterogeneous network is constructed by combining multiple sources, including drug, drug target, disease, and disease gene data. Then, a random walk model is developed to capture the global information of the heterogeneous network. RWHNDR takes advantage of drug target and disease gene data more comprehensively for drug repositioning. The experimental results show that our approach achieves better performance than other state-of-the-art approaches that prioritize candidate drugs based on multi-source data.
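The general idea of ranking drugs by a random walk over a combined drug, target, and disease network can be sketched with a plain random walk with restart. This is a generic illustration and not the RWHNDR algorithm itself, whose transition rules are not given in the abstract; the toy network, node names, restart probability, and function name are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seeds`."""
    nodes = sorted(adj)
    seeds = set(seeds)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            nbrs = adj[u]
            if not nbrs:
                # Isolated node: send its probability mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - restart) * p[u] / len(seeds)
                continue
            share = (1 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: target t1 is a gene of the disease; t2 only
# interacts with t1. Drug d1 binds t1 and drug d2 binds t2.
toy_adj = {
    "disease": ["t1"],
    "t1": ["disease", "d1", "t2"],
    "t2": ["t1", "d2"],
    "d1": ["t1"],
    "d2": ["t2"],
}

scores = random_walk_with_restart(toy_adj, seeds=["disease"])
ranked_drugs = sorted(["d1", "d2"], key=scores.get, reverse=True)
```

Because the walker repeatedly returns to the disease node, drugs whose targets lie closer to the disease genes accumulate more probability and rank higher.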
|
17
|
Su S, Zhang L, Liu J. An Effective Method to Measure Disease Similarity Using Gene and Phenotype Associations. Front Genet 2019; 10:466. [PMID: 31164903 PMCID: PMC6536643 DOI: 10.3389/fgene.2019.00466] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/15/2019] [Accepted: 04/30/2019] [Indexed: 12/12/2022] Open
Abstract
Motivation: To create controlled vocabularies for shared use across biomedical domains, the bioinformatics community has built a large number of biomedical ontologies, such as the Disease Ontology (DO) and the Human Phenotype Ontology (HPO). Quantitative measures of the associations among diseases can help researchers gain deep insight into human diseases, since similar diseases usually have similar molecular origins or similar phenotypes; this helps reveal the common attributes of diseases and improve the corresponding diagnoses and treatment plans. Several approaches have been proposed in the past few years to measure disease similarity using a particular biomedical ontology, but for a newly discovered disease or a disease with little related genetic information in the Disease Ontology (i.e., a disease with few disease-gene associations), these approaches usually ignore the joint computation of disease similarity by integrating gene and phenotype associations. Results: In this paper, we propose a novel method called GPSim to effectively deduce the semantic similarity of diseases. In particular, GPSim calculates the similarity by jointly utilizing gene, disease, and phenotype associations extracted from multiple biomedical ontologies and databases. We also explore phenotypic factors that affect the evaluation performance, such as the depth of HPO terms and the number of phenotypic associations. A final experimental evaluation assesses the performance of GPSim and shows its advantages over previous approaches.
Affiliation(s)
- Shuhui Su
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Lei Zhang
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
| | - Jian Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
18
|
Wei H, Liu B. iCircDA-MF: identification of circRNA-disease associations based on matrix factorization. Brief Bioinform 2019; 21:1356-1367. [DOI: 10.1093/bib/bbz057] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Received: 01/31/2019] [Revised: 03/13/2019] [Accepted: 04/17/2019] [Indexed: 12/19/2022] Open
Abstract
Circular RNAs (circRNAs) are a group of newly discovered non-coding RNAs with a closed-loop structure that play critical roles in various biological processes. Identifying associations between circRNAs and diseases is critical for exploring complex disease mechanisms and facilitating disease-targeted therapy. Although several computational predictors have been proposed, their performance is still limited. In this study, a novel computational method called iCircDA-MF is proposed. Because experimentally validated circRNA-disease associations are very limited, the potential circRNA-disease associations are calculated based on the circRNA similarity and disease similarity extracted from the disease semantic information and the known circRNA-gene, gene-disease, and circRNA-disease associations. The circRNA-disease interaction profiles are then updated by the neighbour interaction profiles so as to correct false negative associations. Finally, matrix factorization is performed on the updated circRNA-disease interaction profiles to predict circRNA-disease associations. The experimental results on a widely used benchmark dataset show that iCircDA-MF outperforms other state-of-the-art predictors and can identify new circRNA-disease associations effectively.
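The profile-update step described above can be illustrated with a small sketch. The abstract does not give the update rule, so this example assumes one plausible choice: each circRNA's profile entry is replaced by the larger of its original value and a similarity-weighted average over its k most similar circRNAs. The matrices, the value of `k`, and the function name `update_profiles` are hypothetical, and the subsequent matrix factorization step is not shown.

```python
def update_profiles(Y, S, k=2):
    """Fill likely false negatives in Y using the k most similar rows.

    Y: association matrix (rows = circRNAs, columns = diseases), values in [0, 1].
    S: circRNA-circRNA similarity matrix (same row order as Y).
    """
    n = len(Y)
    updated = []
    for i in range(n):
        # The k nearest neighbours of circRNA i, excluding itself.
        nbrs = sorted(
            (j for j in range(n) if j != i), key=lambda j: S[i][j], reverse=True
        )[:k]
        total = sum(S[i][j] for j in nbrs)
        row = []
        for c in range(len(Y[i])):
            est = sum(S[i][j] * Y[j][c] for j in nbrs) / total if total > 0 else 0.0
            row.append(max(Y[i][c], est))  # never lower a known association
        updated.append(row)
    return updated


# Hypothetical toy data: three circRNAs, two diseases.
Y = [[1, 0], [1, 1], [0, 0]]
S = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]

Y_new = update_profiles(Y, S, k=2)
```

Taking the maximum keeps every known association intact while giving unobserved pairs a soft score, so a later factorization is not forced to treat every zero as a confirmed negative.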
Affiliation(s)
- Hang Wei
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
|
19
|
Luo P, Li Y, Tian LP, Wu FX. Enhancing the prediction of disease–gene associations with multimodal deep learning. Bioinformatics 2019; 35:3735-3742. [DOI: 10.1093/bioinformatics/btz155] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 10/29/2018] [Revised: 02/11/2019] [Accepted: 02/27/2019] [Indexed: 12/20/2022] Open
Abstract
Motivation
Computationally predicting disease genes helps scientists optimize the in-depth experimental validation and accelerates the identification of real disease-associated genes. Modern high-throughput technologies have generated a vast amount of omics data, and integrating them is expected to improve the accuracy of computational prediction. As an integrative model, multimodal deep belief net (DBN) can capture cross-modality features from heterogeneous datasets to model a complex system. Studies have shown its power in image classification and tumor subtype prediction. However, multimodal DBN has not been used in predicting disease–gene associations.
Results
In this study, we propose a method to predict disease–gene associations by multimodal DBN (dgMDL). Specifically, latent representations of protein-protein interaction networks and gene ontology terms are first learned by two DBNs independently. Then, a joint DBN is used to learn cross-modality representations from the two sub-models by taking the concatenation of their obtained latent representations as the multimodal input. Finally, disease–gene associations are predicted with the learned cross-modality representations. The proposed method is compared with two state-of-the-art algorithms in terms of 5-fold cross-validation on a set of curated disease–gene associations. dgMDL achieves an AUC of 0.969 which is superior to the competing algorithms. Further analysis of the top-10 unknown disease–gene pairs also demonstrates the ability of dgMDL in predicting new disease–gene associations.
Availability and implementation
Prediction results and a reference implementation of dgMDL in Python are available at https://github.com/luoping1004/dgMDL.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ping Luo
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada
| | - Yuanyuan Li
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada
- School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, China
| | - Li-Ping Tian
- School of Information, Beijing Wuzi University, Beijing, China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, Canada
|
20
|
Automatic ICD code assignment of Chinese clinical notes based on multilayer attention BiRNN. J Biomed Inform 2019; 91:103114. [DOI: 10.1016/j.jbi.2019.103114] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/18/2022]
|
21
|
Abstract
BACKGROUND Much evidence has demonstrated that circular RNAs (circRNAs) play important roles in controlling gene expression in human, mouse, and nematode. More importantly, circRNAs are also involved in many diseases through fine-tuning of post-transcriptional gene expression by sequestering disease-associated miRNAs. Identifying circRNA-disease associations is therefore very appealing for comprehensively understanding the mechanism, treatment, and diagnosis of diseases, yet it is challenging. Because of the complex mechanisms linking circRNAs and diseases, wet-lab experiments to discover novel circRNA-disease associations are expensive and time-consuming, so computational methods for discovering novel circRNA-disease associations are urgently needed. RESULT In this study, we develop a method (DWNN-RLS) to predict circRNA-disease associations based on regularized least squares with a Kronecker product kernel. The similarity of circRNAs is computed from the Gaussian interaction profile (GIP) based on known circRNA-disease associations. The similarity of diseases is obtained as the mean of the GIP similarity and the semantic similarity, the latter computed from the directed acyclic graph (DAG) representation of diseases. The kernels of circRNA-disease pairs are constructed from the Kronecker product of the kernels of circRNAs and diseases. The DWNN (decreasing weight k-nearest neighbor) method is adopted to calculate the initial relational score for new circRNAs and diseases. The Kronecker product kernel-based regularized least squares approach is then used to predict new circRNA-disease associations. We adopt 5-fold cross validation (5CV), 10-fold cross validation (10CV), and leave-one-out cross validation (LOOCV) to assess the prediction performance of our method, and compare it with six competing methods (RLS-avg, RLS-Kron, NetLapRLS, KATZ, NBI, WP).
CONCLUSION The experimental results show that DWNN-RLS reaches AUC values of 0.8854, 0.9205, and 0.9701 in 5CV, 10CV, and LOOCV, respectively, which illustrates that DWNN-RLS is superior to the competing methods RLS-avg, RLS-Kron, NetLapRLS, KATZ, NBI, and WP. In addition, case studies show that DWNN-RLS is an effective method for predicting new circRNA-disease associations.
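The Gaussian interaction profile (GIP) similarity mentioned above can be sketched as follows. The kernel is exp(-γ‖y_i − y_j‖²) over binary association profiles; setting γ to a constant divided by the mean squared profile norm is a widely used convention and is assumed here, since the abstract does not state the bandwidth. The toy profiles and the function name `gip_kernel` are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP similarity between rows of a binary association matrix."""
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalized by profile size
    return [
        [
            math.exp(
                -gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy data: three circRNAs described by associations with three diseases.
profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]

K = gip_kernel(profiles)
```

circRNAs with identical known associations get similarity 1, and the similarity decays as their profiles diverge; applying the same function to the transposed matrix gives the disease-side GIP similarity.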
Affiliation(s)
- Cheng Yan
- School of Information Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
- School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, DuYun, 558000 China
| | - Jianxin Wang
- School of Information Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
| | - Fang-Xiang Wu
- Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9 Canada
|
22
|
Liu L, Yu Y, Fei Z, Li M, Wu FX, Li HD, Pan Y, Wang J. An interpretable boosting model to predict side effects of analgesics for osteoarthritis. BMC Systems Biology 2018; 12:105. [PMID: 30463545 PMCID: PMC6249730 DOI: 10.1186/s12918-018-0624-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 01/07/2023]
Abstract
BACKGROUND Osteoarthritis (OA) is the most common form of arthritis. Analgesics are widely used in the treatment of arthritis, which may increase the risk of cardiovascular diseases by 20% to 50% overall. There are few studies on the side effects of OA medication, especially risk prediction models for the side effects of analgesics. In addition, most prediction models do not provide clinically useful, interpretable rules to explain the reasoning behind their predictions. To assist OA patients, we use the eXtreme Gradient Boosting (XGBoost) method to balance the accuracy and interpretability of the prediction model. RESULTS In this study, we used XGBoost, a supervised machine learning method, as a classifier to predict side effects of analgesics for OA patients and to identify high-risk features (RFs) of cardiovascular diseases caused by analgesics. Electronic Medical Records (EMRs) derived from public knee OA studies were used to train the model. The XGBoost model outperforms four well-known machine learning algorithms and identifies risk features reported in the biomedical literature. In addition, the model can provide decision support for the use of analgesics in OA patients. CONCLUSION We used the XGBoost method to predict side effects of analgesics for OA patients from EMRs, compared it with other machine learning methods, and selected the individual informative RFs. The model has good predictability and interpretability, which is valuable for both medical researchers and patients.
Affiliation(s)
- Liangliang Liu
- School of Information Science and Engineering, Central South University, Changsha, China
- Department of Network Center, Pingdingshan University, Pingdingshan, 467000 China
| | - Ying Yu
- School of Information Science and Engineering, Central South University, Changsha, China
| | - Zhihui Fei
- School of Information Science and Engineering, Central South University, Changsha, China
| | - Min Li
- School of Information Science and Engineering, Central South University, Changsha, China
| | - Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9 Canada
| | - Hong-Dong Li
- School of Information Science and Engineering, Central South University, Changsha, China
| | - Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA 30302 USA
| | - Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, China
|