1
|
Sánchez-Valle J, Valencia A. Molecular bases of comorbidities: present and future perspectives. Trends Genet 2023; 39:773-786. [PMID: 37482451 DOI: 10.1016/j.tig.2023.06.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2023] [Revised: 06/12/2023] [Accepted: 06/12/2023] [Indexed: 07/25/2023]
Abstract
Co-occurrence of diseases decreases patient quality of life, complicates treatment choices, and increases mortality. Analyses of electronic health records present a complex scenario of comorbidity relationships that vary by age, sex, and cohort under study. The study of similarities between diseases using 'omics data, such as genes altered in diseases, gene expression, proteome, and microbiome, are fundamental to uncovering the origin of, and potential treatment for, comorbidities. Recent studies have produced a first generation of genetic interpretations for as much as 46% of the comorbidities described in large cohorts. Integrating different sources of molecular information and using artificial intelligence (AI) methods are promising approaches for the study of comorbidities. They may help to improve the treatment of comorbidities, including the potential repositioning of drugs.
Collapse
Affiliation(s)
- Jon Sánchez-Valle
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, 08034, Spain.
| | - Alfonso Valencia
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, 08034, Spain; ICREA, Barcelona, 08010, Spain.
| |
Collapse
|
2
|
Fu M, Yan Y, Olde Loohuis LM, Chang TS. Defining the distance between diseases using SNOMED CT embeddings. J Biomed Inform 2023; 139:104307. [PMID: 36738869 DOI: 10.1016/j.jbi.2023.104307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2022] [Revised: 12/10/2022] [Accepted: 01/29/2023] [Indexed: 02/05/2023]
Abstract
Characterizing disease relationships is essential to biomedical research to understand disease etiology and improve clinical decision-making. Measurements of distance between disease pairs enable valuable research tasks, such as subgrouping patients and identifying common time courses of disease onset. Distance metrics developed in prior work focused on smaller, targeted disease sets. Distance metrics covering all diseases have not yet been defined, which limits the applications to a broader disease spectrum. Our current study defines disease distances for all disease pairs within the International Classification of Diseases, version 10 (ICD-10), the diagnostic classification system universally used in electronic health records. Our proposed distance is computed based on a biomedical ontology, SNOMED CT (Systemized Nomenclature of Medicine, Clinical Terms), which can also be viewed as a structured knowledge graph. We compared the knowledge graph-based metric to three other distance metrics based on the hierarchical structure of ICD, clinical comorbidity, and genetic correlation, to evaluate how each may capture similar or unique aspects of disease relationships. We show that our knowledge graph-based distance metric captures known phenotypic, clinical, and molecular characteristics at a finer granularity than the other three. With the continued growth of using electronic health records data for research, we believe that our distance metric will play an important role in subgrouping patients for precision health, and enabling individualized disease prevention and treatments.
Collapse
Affiliation(s)
- Mingzhou Fu
- Movement Disorders Program, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA; Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA, USA
| | - Yu Yan
- Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA, USA
| | - Loes M Olde Loohuis
- Center for Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
| | - Timothy S Chang
- Movement Disorders Program, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA.
| |
Collapse
|
3
|
Yang X, Xu W, Leng D, Wen Y, Wu L, Li R, Huang J, Bo X, He S. Exploring novel disease-disease associations based on multi-view fusion network. Comput Struct Biotechnol J 2023; 21:1807-1819. [PMID: 36923471 PMCID: PMC10009443 DOI: 10.1016/j.csbj.2023.02.038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2022] [Revised: 02/02/2023] [Accepted: 02/22/2023] [Indexed: 03/06/2023] Open
Abstract
UNLABELLED Established taxonomy system based on disease symptom and tissue characteristics have provided an important basis for physicians to correctly identify diseases and treat them successfully. However, these classifications tend to be based on phenotypic observations, lacking a molecular biological foundation. Therefore, there is an urgent to integrate multi-dimensional molecular biological information or multi-omics data to redefine disease classification in order to provide a powerful perspective for understanding the molecular structure of diseases. Therefore, we offer a flexible disease classification that integrates the biological process, gene expression, and symptom phenotype of diseases, and propose a disease-disease association network based on multi-view fusion. We applied the fusion approach to 223 diseases and divided them into 24 disease clusters. The contribution of internal and external edges of disease clusters were analyzed. The results of the fusion model were compared with Medical Subject Headings, a traditional and commonly used disease taxonomy. Then, experimental results of model performance comparison show that our approach performs better than other integration methods. As it was observed, the obtained clusters provided more interesting and novel disease-disease associations. This multi-view human disease association network describes relationships between diseases based on multiple molecular levels, thus breaking through the limitation of the disease classification system based on tissues and organs. This approach which motivates clinicians and researchers to reposition the understanding of diseases and explore diagnosis and therapy strategies, extends the existing disease taxonomy. AVAILABILITY OF DATA AND MATERIALS The preprocessed dataset and source code supporting the conclusions of this article are available at GitHub repository https://github.com/yangxiaoxi89/mvHDN.
Collapse
Affiliation(s)
- Xiaoxi Yang
- Clinical Medicine Institute, Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Wenjian Xu
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Rare Disease Center, Beijing Children’s Hospital, Capital Medical University, National Center for Children’s Health, Beijing 100045, China
- MOE Key Laboratory of Major Diseases in Children, Beijing 100045, China
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute, Beijing 100045, China
| | - Dongjin Leng
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Yuqi Wen
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Lianlian Wu
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Ruijiang Li
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Jian Huang
- Clinical Medicine Institute, Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China
| | - Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| |
Collapse
|
4
|
Chen Y, Hu Y, Hu X, Feng C, Chen M. CoGO: a contrastive learning framework to predict disease similarity based on gene network and ontology structure. Bioinformatics 2022; 38:4380-4386. [PMID: 35900147 DOI: 10.1093/bioinformatics/btac520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2022] [Revised: 06/16/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Quantifying the similarity of human diseases provides guiding insights to the discovery of micro-scope mechanisms from a macro scale. Previous work demonstrated that better performance can be gained by integrating multiview data sources or applying machine learning techniques. However, designing an efficient framework to extract and incorporate information from different biological data using deep learning models remains unexplored. RESULTS We present CoGO, a Contrastive learning framework to predict disease similarity based on Gene network and Ontology structure, which incorporates the gene interaction network and gene ontology (GO) domain knowledge using graph deep learning models. First, graph deep learning models are applied to encode the features of genes and GO terms from separate graph structure data. Next, gene and GO features are projected to a common embedding space via a nonlinear projection. Then cross-view contrastive loss is applied to maximize the agreement of corresponding gene-GO associations and lead to meaningful gene representation. Finally, CoGO infers the similarity between diseases by the cosine similarity of disease representation vectors derived from related gene embedding. In our experiments, CoGO outperforms the most competitive baseline method on both AUROC and AUPRC, especially improves 19.57% in AUPRC (0.7733). The prediction results are significantly comparable with other disease similarity studies and thus highly credible. Furthermore, we conduct a detailed case study of top similar disease pairs which is demonstrated by other studies. Empirical results show that CoGO achieves powerful performance in disease similarity problem. AVAILABILITY AND IMPLEMENTATION https://github.com/yhchen1123/CoGO.
Collapse
Affiliation(s)
- Yuhao Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Yanshi Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Xiaotian Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Cong Feng
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China.,Biomedical Big Data Center, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China.,Institute of Hematology, Zhejiang University, Hangzhou, 310058, China
| |
Collapse
|
5
|
Dong G, Zhang ZC, Feng J, Zhao XM. MorbidGCN: prediction of multimorbidity with a graph convolutional network based on integration of population phenotypes and disease network. Brief Bioinform 2022; 23:6627601. [PMID: 35780382 DOI: 10.1093/bib/bbac255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Revised: 05/17/2022] [Accepted: 06/01/2022] [Indexed: 02/06/2023] Open
Abstract
Exploring multimorbidity relationships among diseases is of great importance for understanding their shared mechanisms, precise diagnosis and treatment. However, the landscape of multimorbidities is still far from complete due to the complex nature of multimorbidity. Although various types of biological data, such as biomolecules and clinical symptoms, have been used to identify multimorbidities, the population phenotype information (e.g. physical activity and diet) remains less explored for multimorbidity. Here, we present a graph convolutional network (GCN) model, named MorbidGCN, for multimorbidity prediction by integrating population phenotypes and disease network. Specifically, MorbidGCN treats the multimorbidity prediction as a missing link prediction problem in the disease network, where a novel feature selection method is embedded to select important phenotypes. Benchmarking results on two large-scale multimorbidity data sets, i.e. the UK Biobank (UKB) and Human Disease Network (HuDiNe) data sets, demonstrate that MorbidGCN outperforms other competitive methods. With MorbidGCN, 9742 and 14 010 novel multimorbidities are identified in the UKB and HuDiNe data sets, respectively. Moreover, we notice that the selected phenotypes that are generally differentially distributed between multimorbidity patients and single-disease patients can help interpret multimorbidities and show potential for prognosis of multimorbidities.
Collapse
Affiliation(s)
- Guiying Dong
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China.,MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Zi-Chao Zhang
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China.,MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Jianfeng Feng
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China.,MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China.,Zhangjiang Fudan International Innovation Center, Shanghai, 200433, China
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China.,MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China.,Zhangjiang Fudan International Innovation Center, Shanghai, 200433, China
| |
Collapse
|
6
|
Allahgholi M, Rahmani H, Javdani D, Sadeghi-Adl Z, Bender A, Módos D, Weiss G. DDREL: From drug-drug relationships to drug repurposing. INTELL DATA ANAL 2022. [DOI: 10.3233/ida-215745] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Analyzing the relationships among various drugs is an essential issue in the field of computational biology. Different kinds of informative knowledge, such as drug repurposing, can be extracted from drug-drug relationships. Scientific literature represents a rich source for the retrieval of knowledge about the relationships between biological concepts, mainly drug-drug, disease-disease, and drug-disease relationships. In this paper, we propose DDREL as a general-purpose method that applies deep learning on scientific literature to automatically extract the graph of syntactic and semantic relationships among drugs. DDREL remarkably outperforms the existing human drug network method and a random network respected to average similarities of drugs’ anatomical therapeutic chemical (ATC) codes. DDREL is able to shed light on the existing deficiency of the ATC codes in various drug groups. From the DDREL graph, the history of drug discovery became visible. In addition, drugs that had repurposing score 1 (diflunisal, pargyline, fenofibrate, guanfacine, chlorzoxazone, doxazosin, oxymetholone, azathioprine, drotaverine, demecarium, omifensine, yohimbine) were already used in additional indication. The proposed DDREL method justifies the predictive power of textual data in PubMed abstracts. DDREL shows that such data can be used to 1- Predict repurposing drugs with high accuracy, and 2- Reveal existing deficiencies of the ATC codes in various drug groups.
Collapse
Affiliation(s)
- Milad Allahgholi
- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
| | - Hossein Rahmani
- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
| | - Delaram Javdani
- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
| | - Zahra Sadeghi-Adl
- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Dezsö Módos
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Earlham Institute, Norwich Research Park, Norwich, Norfolk, UK
| | - Gerhard Weiss
- Department of Data Science and Knowledge Engineering (DKE), Maastricht University, Maastricht, The Netherlands
| |
Collapse
|
7
|
Li Y, Wang K, Wang G. Evaluating Disease Similarity Based on Gene Network Reconstruction and Representation. Bioinformatics 2021; 37:3579-3587. [PMID: 33978702 DOI: 10.1093/bioinformatics/btab252] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2020] [Revised: 03/01/2021] [Accepted: 04/28/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Quantifying the associations between diseases is of great significance in increasing our understanding of disease biology, improving disease diagnosis, re-positioning, and developing drugs. Therefore, in recent years, the research of disease similarity has received a lot of attention in the field of bioinformatics. Previous work has shown that the combination of the ontology (such as disease ontology and gene ontology) and disease-gene interactions are worthy to be regarded to elucidate diseases and disease associations. However, most of them are either based on the overlap between disease-related gene sets or distance within the ontology's hierarchy. The diseases in these methods are represented by discrete or sparse feature vectors, which cannot grasp the deep semantic information of diseases. Recently, deep representation learning has been widely studied and gradually applied to various fields of bioinformatics. Based on the hypothesis that disease representation depends on its related gene representations, we propose a disease representation model using two most representative gene resources HumanNet and Gene Ontology to construct a new gene network and learn gene (disease) representations. The similarity between two diseases is computed by the cosine similarity of their corresponding representations. RESULTS We propose a novel approach to compute disease similarity, which integrates two important factors disease-related genes and gene ontology hierarchy to learn disease representation based on deep representation learning. Under the same experimental settings, the AUC value of our method is 0.8074, which improves the most competitive baseline method by 10.1%. The quantitative and qualitative experimental results show that our model can learn effective disease representations and improve the accuracy of disease similarity computation significantly. AVAILABILITY The research shows that this method has certain applicability in the prediction of gene-related diseases, the migration of disease treatment methods, drug development, and so on. SUPPLEMENTARY INFORMATION Supplementary data are available at https://github.com/catly/disease_similarity.
Collapse
Affiliation(s)
- Yang Li
- College of information and Computer Engineering, Northeast Forestry University, Harbin, 150004, China
| | - Keqi Wang
- College of information and Computer Engineering, Northeast Forestry University, Harbin, 150004, China
| | - Guohua Wang
- College of information and Computer Engineering, Northeast Forestry University, Harbin, 150004, China
| |
Collapse
|
8
|
Chen H, Zhang Z, Zhang J. In silico drug repositioning based on the integration of chemical, genomic and pharmacological spaces. BMC Bioinformatics 2021; 22:52. [PMID: 33557749 PMCID: PMC7868667 DOI: 10.1186/s12859-021-03988-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2020] [Accepted: 01/25/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Drug repositioning refers to the identification of new indications for existing drugs. Drug-based inference methods for drug repositioning apply some unique features of drugs for new indication prediction. Complementary information is provided by these different features. It is therefore necessary to integrate these features for more accurate in silico drug repositioning. RESULTS In this study, we collect 3 different types of drug features (i.e., chemical, genomic and pharmacological spaces) from public databases. Similarities between drugs are separately calculated based on each of the features. We further develop a fusion method to combine the 3 similarity measurements. We test the inference abilities of the 4 similarity datasets in drug repositioning under the guilt-by-association principle. Leave-one-out cross-validations show the integrated similarity measurement IntegratedSim receives the best prediction performance, with the highest AUC value of 0.8451 and the highest AUPR value of 0.2201. Case studies demonstrate IntegratedSim produces the largest numbers of confirmed predictions in most cases. Moreover, we compare our integration method with 3 other similarity-fusion methods using the datasets in our study. Cross-validation results suggest our method improves the prediction accuracy in terms of AUC and AUPR values. CONCLUSIONS Our study suggests that the 3 drug features used in our manuscript are valuable information for drug repositioning. The comparative results indicate that integration of the 3 drug features would improve drug-disease association prediction. Our study provides a strategy for the fusion of different drug features for in silico drug repositioning.
Collapse
Affiliation(s)
- Hailin Chen
- School of Software, East China Jiaotong University, Nanchang, 330013 China
| | - Zuping Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083 China
| | - Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan, 467000 China
| |
Collapse
|
9
|
Chen S, Ghandikota S, Gautam Y, Mersha TB. AllergyGenDB: A literature and functional annotation-based omics database for allergic diseases. Allergy 2020; 75:1789-1793. [PMID: 32034783 DOI: 10.1111/all.14219] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2019] [Revised: 01/16/2020] [Accepted: 02/04/2020] [Indexed: 11/30/2022]
Affiliation(s)
- Siqi Chen
- Division of Asthma Research Cincinnati Children's Hospital Medical Center Cincinnati OH USA
- Department of Electrical Engineering and Computer Science University of Cincinnati Cincinnati OH USA
| | - Sudhir Ghandikota
- Division of Asthma Research Cincinnati Children's Hospital Medical Center Cincinnati OH USA
- Department of Electrical Engineering and Computer Science University of Cincinnati Cincinnati OH USA
| | - Yadu Gautam
- Division of Asthma Research Cincinnati Children's Hospital Medical Center Cincinnati OH USA
| | - Tesfaye B. Mersha
- Division of Asthma Research Cincinnati Children's Hospital Medical Center Cincinnati OH USA
- Department of Pediatrics University of Cincinnati Cincinnati, Cincinnati OH USA
| |
Collapse
|