1
Dutta D, Sen A, Satagopan JM. Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis-An application in renal clear cell carcinoma. Genet Epidemiol 2024; 48:414-432. [PMID: 38751238 PMCID: PMC11589067 DOI: 10.1002/gepi.22566] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/20/2023] [Revised: 04/04/2024] [Accepted: 04/22/2024] [Indexed: 11/27/2024]
Abstract
Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer, by regulating gene expressions, that drive critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact, through a joint integrative analysis. Here, we propose a novel analysis pipeline, the joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing "gene component scores" and test its interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from the TCGA-KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of ASAH1 gene trans-regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis to quantify the overall effect of gene sets on tumor stage, revealed that two of the eight gene components have significant interaction with smoking in relation to tumor stage. 
These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia-regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data especially in cancer genomics.
Affiliation(s)
- Diptavo Dutta
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, USA
- Ananda Sen
- Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Department of Family Medicine, University of Michigan, Ann Arbor, USA
- Jaya M. Satagopan
- Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, USA
2
Pham DT, Tran TD. Drivergene.net: A Cytoscape app for the identification of driver nodes of large-scale complex networks and case studies in discovery of drug target genes. Comput Biol Med 2024; 179:108888. [PMID: 39047507 DOI: 10.1016/j.compbiomed.2024.108888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 06/15/2024] [Accepted: 07/11/2024] [Indexed: 07/27/2024]
Abstract
No tools exist to identify driver nodes of large-scale networks using the competition-based controllability approach. This study proposes a novel method for this computation on large-scale networks and implements it in a new Cytoscape plug-in app called Drivergene.net. Experiments with the software on large-scale biomolecular networks showed outstanding speed and computing power. Interestingly, 86.67% of the top 10 driver nodes found on these networks are anticancer drug target genes that reside mostly at the innermost K-cores of the networks. Finally, the method was compared with those of five other researchers, confirming that the proposed method outperforms the other methods in identifying anticancer drug target genes. Taken together, Drivergene.net is a reliable tool that efficiently detects not only drug target genes from biomolecular networks but also driver nodes of large-scale complex networks. Drivergene.net, with a user manual and example datasets, is available at https://github.com/tinhpd/Drivergene.git.
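The abstract refers to the "innermost K-cores" of a network without defining them. As a rough illustration only, the sketch below computes core numbers by repeatedly peeling off the lowest-degree node, which is the standard k-core decomposition and not the Drivergene.net algorithm itself. The function names and the toy edge list are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as edge pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)

    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: (degree[n], str(n)))
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


def innermost_core(edges):
    """Return the set of nodes that share the highest core number."""
    core = core_numbers(edges)
    if not core:
        return set()
    top = max(core.values())
    return {node for node, value in core.items() if value == top}


# Hypothetical toy network: a 4-clique (A-D) plus two peripheral nodes (E, F).
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"), ("C", "D"),
    ("E", "A"), ("E", "B"), ("F", "E"),
]
print(core_numbers(toy_edges))
print(sorted(innermost_core(toy_edges)))
```

This simple peeling loop is quadratic in the number of nodes, so it suits small examples; large biomolecular networks would need a bucket-based linear-time variant.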
Affiliation(s)
- Duc-Tinh Pham
- Complex Systems and Bioinformatics Lab, Hanoi University of Industry, 298 Cau Dien Street, Bac Tu Liem District, Hanoi, Viet Nam; Graduate University of Science and Technology, Academy of Science and Technology Viet Nam, 18 Hoang Quoc Viet Street, Cau Giay District, Hanoi, Viet Nam
- Tien-Dzung Tran
- Complex Systems and Bioinformatics Lab, Hanoi University of Industry, 298 Cau Dien Street, Bac Tu Liem District, Hanoi, Viet Nam; Faculty of Information and Communication Technology, Hanoi University of Industry, 298 Cau Dien Street, Bac Tu Liem District, Hanoi, Viet Nam.
3
Sajid S, Mashkoor M, Jørgensen MG, Christensen LP, Hansen PR, Franzyk H, Mirza O, Prabhala BK. The Y-ome Conundrum: Insights into Uncharacterized Genes and Approaches for Functional Annotation. Mol Cell Biochem 2024; 479:1957-1968. [PMID: 37610616 DOI: 10.1007/s11010-023-04827-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 08/09/2023] [Indexed: 08/24/2023]
Abstract
The ever-increasing availability of genome sequencing data has revealed a substantial number of uncharacterized genes without known functions across various organisms. The first comprehensive genome sequencing of E. coli K12 revealed that more than 50% of its open reading frames corresponded to transcripts with no known functions. The group of protein-coding genes without a functional description and/or a recognized pathway, beginning with the letter "Y", is classified as the "y-ome". Several efforts have been made to elucidate the functions of these genes and to recognize their role in biological processes. This review provides a brief update on various strategies employed when studying the y-ome, such as high-throughput experimental approaches, comparative omics, metabolic engineering, gene expression analysis, and data integration techniques. Additionally, we highlight recent advancements in functional annotation methods, including the use of machine learning, network analysis, and functional genomics approaches. Novel approaches are required to produce more precise functional annotations across the genome to reduce the number of genes with unknown functions.
Affiliation(s)
- Salvia Sajid
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
- Maliha Mashkoor
- Department of Surgery, Center for Surgical Sciences, Zealand University Hospital, Lykkebækvej 1, 4600, Køge, Denmark
- Mikkel Girke Jørgensen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
- Lars Porskjær Christensen
- Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
- Paul Robert Hansen
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Henrik Franzyk
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Osman Mirza
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Bala Krishna Prabhala
- Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark.
4
Caniza H, Cáceres JJ, Torres M, Paccanaro A. LanDis: the disease landscape explorer. Eur J Hum Genet 2024; 32:461-465. [PMID: 38200084 PMCID: PMC10999415 DOI: 10.1038/s41431-023-01511-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/13/2023] [Revised: 11/01/2023] [Accepted: 11/23/2023] [Indexed: 01/12/2024] Open
Abstract
From a network medicine perspective, a disease is the consequence of perturbations on the interactome. These perturbations tend to appear in a specific neighbourhood on the interactome, the disease module, and modules related to phenotypically similar diseases tend to be located in close-by regions. We present LanDis, a freely available web-based interactive tool (https://paccanarolab.org/landis) that allows domain experts, medical doctors and the larger scientific community to graphically navigate the interactome distances between the modules of over 44 million pairs of heritable diseases. The map-like interface provides detailed comparisons between pairs of diseases together with supporting evidence. Every disease in LanDis is linked to relevant entries in OMIM and UniProt, providing a starting point for in-depth analysis and an opportunity for novel insight into the aetiology of diseases as well as differential diagnosis.
Affiliation(s)
- Horacio Caniza
- Universidad Paraguayo Alemana de Ciencias Aplicadas, Facultad de Ciencias de la Ingeniería, San Lorenzo, Paraguay
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
- Juan J Cáceres
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
- Mateo Torres
- Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- Alberto Paccanaro
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK.
- Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil.
5
Lucena-Padros H, Bravo-Gil N, Tous C, Rojano E, Seoane-Zonjic P, Fernández RM, Ranea JAG, Antiñolo G, Borrego S. Bioinformatics Prediction for Network-Based Integrative Multi-Omics Expression Data Analysis in Hirschsprung Disease. Biomolecules 2024; 14:164. [PMID: 38397401 PMCID: PMC10886964 DOI: 10.3390/biom14020164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 01/15/2024] [Accepted: 01/27/2024] [Indexed: 02/25/2024] Open
Abstract
Hirschsprung's disease (HSCR) is a rare developmental disorder in which enteric ganglia are missing along a portion of the intestine. HSCR has a complex inheritance, with RET as the major disease-causing gene. However, the pathogenesis of HSCR is still not completely understood. Therefore, we applied a computational approach based on multi-omics network characterization and clustering analysis for HSCR-related gene/miRNA identification and biomarker discovery. Protein-protein interaction (PPI) and miRNA-target interaction (MTI) networks were analyzed by DPClusO and BiClusO, respectively, and finally, the biomarker potential of miRNAs was computationally screened by miRNA-BD. In this study, a total of 55 significant gene-disease modules were identified, allowing us to propose 178 new HSCR candidate genes and two biological pathways. Moreover, we identified 12 key miRNAs with biomarker potential among 137 predicted HSCR-associated miRNAs. Functional analysis of new candidates showed that enrichment terms related to gene ontology (GO) and pathways were associated with HSCR. In conclusion, this approach has allowed us to decipher new clues of the etiopathogenesis of HSCR, although molecular experiments are further needed for clinical validations.
Affiliation(s)
- Helena Lucena-Padros
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013 Seville, Spain
- Nereida Bravo-Gil
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013 Seville, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), 41013 Seville, Spain
- Cristina Tous
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013 Seville, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), 41013 Seville, Spain
- Elena Rojano
- Department of Molecular Biology and Biochemistry, University of Malaga, 29010 Malaga, Spain
- Biomedical Research Institute of Malaga, IBIMA, 29010 Malaga, Spain
- Pedro Seoane-Zonjic
- Department of Molecular Biology and Biochemistry, University of Malaga, 29010 Malaga, Spain
- Biomedical Research Institute of Malaga, IBIMA, 29010 Malaga, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), 29071 Malaga, Spain
- Raquel María Fernández
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013 Seville, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), 41013 Seville, Spain
- Juan A. G. Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, 29010 Malaga, Spain
- Biomedical Research Institute of Malaga, IBIMA, 29010 Malaga, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), 29071 Malaga, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), Instituto de Salud Carlos III (ISCIII), 28029 Madrid, Spain
- Guillermo Antiñolo
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013 Seville, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), 41013 Seville, Spain
- Salud Borrego
- Department of Maternofetal Medicine, Genetics and Reproduction, Institute of Biomedicine of Seville, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013 Seville, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), 41013 Seville, Spain
6
Chatterjee S, Sanjeev BS. Over-representation analysis of angiogenic factors in immunosuppressive mechanisms in neoplasms and neurological conditions during COVID-19. Microb Pathog 2023; 185:106386. [PMID: 37865274 DOI: 10.1016/j.micpath.2023.106386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 09/27/2023] [Accepted: 10/09/2023] [Indexed: 10/23/2023]
Abstract
BACKGROUND Recent studies emphasized the necessity to identify key (human) biological processes and pathways targeted by the Coronaviridae family of viruses, especially severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Coronavirus disease (COVID-19) caused death rates of up to 33-55% in COVID-19 patients with malignant neoplasms and Alzheimer's disease. Given this scenario, we identified biological processes and pathways involved in various diseases which are most likely affected by COVID-19. METHODS The COVID-19 DisGeNET data set (v4.0) contains the associations between various diseases and human genes known to interact with viruses from the Coronaviridae family and was obtained from the IntAct Coronavirus data set annotated with DisGeNET data. We constructed the disease-gene network to identify genes that are involved in various comorbid disease states. Communities from the disease-gene network were identified using the Louvain method, and functional enrichment through over-representation analysis was used to discover significant biological processes and pathways shared between COVID-19 and other diseases. RESULTS The COVID-19 DisGeNET data set (v4.0) comprised 828 human genes and 10,473 diseases (including various phenotypes) that together constituted the nodes of the disease-gene network. Each of the 70,210 edges connects a human gene with an associated disease. The top 10 genes linked to the largest number of diseases were VEGFA, BCL2, CTNNB1, ALB, COX2, AGT, HLA-A, HMOX1, FGF2 and COMT. The most vulnerable group of patients thus discovered had comorbid conditions such as carcinomas, malignant neoplasms and Alzheimer's disease. Finally, we identified 15 potentially useful biological processes and pathways for improved therapies. Vascular endothelial growth factor (VEGF) is the key mediator of angiogenesis in cancer. It is widely distributed in the brain and plays a crucial role in brain inflammation, regulating the level of angiopoietins.
With a degree of 1899, VEGFA was associated with the maximum number of diseases in the disease-gene network. Previous studies have indicated that increased levels of VEGFA in the blood result in dyspnea, pulmonary edema (PE), acute lung injury (ALI) and acute respiratory distress syndrome (ARDS). In the case of COVID-19 patients with neoplasms and other neurological symptoms, our results indicate VEGFA as a therapeutic target for inflammation suppression. As VEGFs are known to disproportionately affect cancer patients, improving endothelial permeability and vasodilation with anti-VEGF therapy could lead to suppression of inflammation and also improve oxygenation. As an outcome of our study, we make a case for clinical investigations into anti-VEGF therapies for such comorbid conditions affected by COVID-19 for better therapeutic outcomes.
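The abstract names over-representation analysis but does not say which test statistic was used. A common choice is the upper tail of the hypergeometric distribution (equivalent to a one-sided Fisher's exact test), and the minimal sketch below assumes that choice. The function name `ora_p_value` and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def ora_p_value(universe_size, term_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of background genes
    term_size:     background genes annotated to the pathway or process
    list_size:     genes in the list being tested (e.g. one network community)
    overlap:       genes in the list that carry the annotation
    """
    max_overlap = min(term_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 4 annotated to a pathway,
# a community of 3 genes of which 2 carry the annotation.
p = ora_p_value(universe_size=10, term_size=4, list_size=3, overlap=2)
print(round(p, 4))
```

In practice one such p-value is computed per pathway or process, so a multiple-testing correction would normally follow; that step is omitted here.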
Affiliation(s)
- S Chatterjee
- Department of Applied Sciences, Indian Institute of Information Technology, Allahabad, India.
- B S Sanjeev
- Department of Applied Sciences, Indian Institute of Information Technology, Allahabad, India.
7
Shi W, Feng H, Li J, Liu T, Liu Z. DapBCH: a disease association prediction model Based on Cross-species and Heterogeneous graph embedding. Front Genet 2023; 14:1222346. [PMID: 37811150 PMCID: PMC10556742 DOI: 10.3389/fgene.2023.1222346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 09/11/2023] [Indexed: 10/10/2023] Open
Abstract
The study of comorbidity can provide new insights into the pathogenesis of the disease and has important economic significance in the clinical evaluation of treatment difficulty, medical expenses, length of stay, and prognosis of the disease. In this paper, we propose a disease association prediction model DapBCH, which constructs a cross-species biological network and applies heterogeneous graph embedding to predict disease association. First, we combine the human disease-gene network, mouse gene-phenotype network, human-mouse homologous gene network, and human protein-protein interaction network to reconstruct a heterogeneous biological network. Second, we apply heterogeneous graph embedding based on meta-path aggregation to generate the feature vector of disease nodes. Finally, we employ link prediction to obtain the similarity of disease pairs. The experimental results indicate that our model is highly competitive in predicting the disease association and is promising for finding potential disease associations.
Affiliation(s)
- Wanqi Shi
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Hailin Feng
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Jian Li
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Tongcun Liu
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Zhe Liu
- College of Media Engineering, Zhejiang University of Media and Communications, Hangzhou, Zhejiang, China
8
Wimalagunasekara SS, Weeraman JWJK, Tirimanne S, Fernando PC. Protein-protein interaction (PPI) network analysis reveals important hub proteins and sub-network modules for root development in rice (Oryza sativa). J Genet Eng Biotechnol 2023; 21:69. [PMID: 37246172 DOI: 10.1186/s43141-023-00515-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/12/2022] [Accepted: 05/06/2023] [Indexed: 05/30/2023]
Abstract
BACKGROUND The root system is vital to plant growth and survival. Therefore, genetic improvement of the root system is beneficial for developing stress-tolerant and improved plant varieties. This requires the identification of proteins that significantly contribute to root development. Analyzing protein-protein interaction (PPI) networks is vastly beneficial in studying developmental phenotypes, such as root development, because a phenotype is an outcome of several interacting proteins. PPI networks can be analyzed to identify modules and get a global understanding of important proteins governing the phenotypes. PPI network analysis for root development in rice has not been performed before and has the potential to yield new findings to improve stress tolerance. RESULTS Here, the network module for root development was extracted from the global Oryza sativa PPI network retrieved from the STRING database. Novel protein candidates were predicted, and hub proteins and sub-modules were identified from the extracted module. The validation of the predictions yielded 75 novel candidate proteins, 6 sub-modules, 20 intramodular hubs, and 2 intermodular hubs. CONCLUSIONS These results show how the PPI network module is organized for root development and can be used for future wet-lab studies for producing improved rice varieties.
Affiliation(s)
- Janith W J K Weeraman
- Department of Plant Sciences, Faculty of Science, University of Colombo, Colombo, Sri Lanka.
- Shamala Tirimanne
- Department of Plant Sciences, Faculty of Science, University of Colombo, Colombo, Sri Lanka
- Pasan C Fernando
- Department of Plant Sciences, Faculty of Science, University of Colombo, Colombo, Sri Lanka
9
Hasan M, Kumar N, Majeed A, Ahmad A, Mukhtar S. Protein-Protein Interaction Network Analysis Using NetworkX. Methods Mol Biol 2023; 2690:457-467. [PMID: 37450166 DOI: 10.1007/978-1-0716-3327-4_35] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2023]
Abstract
In recent years, extracting information from biological data has become a particularly valuable way of gaining knowledge. Molecular interaction networks provide a framework for visualizing cellular processes, but their complexity frequently makes their interpretation difficult. Proteins are one of the primary determinants of biological function. Indeed, most biological activities in the living cells are functionally regulated by protein-protein interactions (PPIs). Thus, studying protein interactions is critical for understanding their roles within the cell. Exploring the PPI networks can open new avenues for future experimental studies and offer interspecies predictions for effective interaction mapping. In this chapter we will demonstrate how to construct, visualize, and analyze a protein-protein interaction network using NetworkX.
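The chapter's own NetworkX code is not reproduced in this listing. To give a feel for the kind of analysis the abstract describes, the sketch below builds a small undirected interaction network and ranks proteins by degree centrality using only the Python standard library; NetworkX offers equivalents such as `networkx.Graph` and `networkx.degree_centrality`. The protein identifiers (`P1` to `P5`) and the helper names are hypothetical.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency dictionary from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a == b:  # skip self-interactions in this simple sketch
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {protein: 0.0 for protein in adjacency}
    return {p: len(partners) / (n - 1) for p, partners in adjacency.items()}


def top_hubs(adjacency, k):
    """Return the k highest-degree proteins, ties broken alphabetically."""
    return sorted(adjacency, key=lambda p: (-len(adjacency[p]), p))[:k]


# Hypothetical interaction list with placeholder protein identifiers.
interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"),
]
network = build_network(interactions)
centrality = degree_centrality(network)
print(centrality)
print(top_hubs(network, 2))
```

Degree is only one of several centrality measures; which measure best identifies biologically important proteins depends on the network and the question being asked.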
Affiliation(s)
- Mehadi Hasan
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Nilesh Kumar
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Aqsa Majeed
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Aftab Ahmad
- Department of Anesthesiology and Perioperative Medicine, Birmingham, AL, USA
- Shahid Mukhtar
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA.
10
Sa ZY, Xu JS, Pan XH, Zheng SX, Huang QR, Wan L, Zhu XX, Lan CL, Ye XR. Effects of electroacupuncture on rats with cognitive impairment: An iTRAQ-based proteomics analysis. Journal of Integrative Medicine 2023; 21:89-98. [PMID: 36424268 DOI: 10.1016/j.joim.2022.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Accepted: 09/06/2022] [Indexed: 11/11/2022]
Abstract
OBJECTIVE The study explores the effects of electroacupuncture (EA) at the governing vessel (GV) on proteomic changes in the hippocampus of rats with cognitive impairment. METHODS Healthy male rats were randomly divided into 3 groups: sham, model and EA. Cognitive impairment was induced by left middle cerebral artery occlusion in the model and EA groups. Rats in the EA group were treated with EA at Shenting (GV24) and Baihui (GV20) for 7 d. Neurological deficit was scored using the Longa scale, the learning and memory ability was detected using the Morris water maze (MWM) test, and the proteomic profiling in the hippocampus was analyzed using protein-labeling technology based on the isobaric tag for relative and absolute quantitation (iTRAQ). The Western blot (WB) analysis was used to detect the proteins and validate the results of iTRAQ. RESULTS Compared with the model group, the neurological deficit score was significantly reduced, and the escape latency in the MWM test was significantly shortened, while the number of platform crossings increased in the EA group. A total of 2872 proteins were identified by iTRAQ. Differentially expressed proteins (DEPs) were identified between different groups: 92 proteins were upregulated and 103 were downregulated in the model group compared with the sham group, while 142 proteins were upregulated and 126 were downregulated in the EA group compared with the model group. Most of the DEPs were involved in oxidative phosphorylation, glycolipid metabolism and synaptic transmission. Furthermore, we also verified 4 DEPs using WB technology. Although the WB results were not exactly the same as the iTRAQ results, the expression trends of the DEPs were consistent. The upregulation of heat-shock protein β1 (Hspb1) was the highest in the EA group compared to the model group. CONCLUSION EA can effect proteomic changes in the hippocampus of rats with cognitive impairment. 
Hspb1 may be involved in the molecular mechanism by which acupuncture improves cognitive impairment.
Affiliation(s)
- Zhe-Yan Sa
- Department of Meridian Research, Fujian Academy of Chinese Medical Sciences, Fuzhou 350003, Fujian Province, China; Key Laboratory of Propagated Sensation along Meridian of Fujian Province, Fuzhou 350003, Fujian Province, China
- Jin-Sen Xu
- Department of Meridian Research, Fujian Academy of Chinese Medical Sciences, Fuzhou 350003, Fujian Province, China; Key Laboratory of Propagated Sensation along Meridian of Fujian Province, Fuzhou 350003, Fujian Province, China.
- Xiao-Hua Pan
- Department of Meridian Research, Fujian Academy of Chinese Medical Sciences, Fuzhou 350003, Fujian Province, China; Key Laboratory of Propagated Sensation along Meridian of Fujian Province, Fuzhou 350003, Fujian Province, China.
- Shu-Xia Zheng
- Department of Meridian Research, Fujian Academy of Chinese Medical Sciences, Fuzhou 350003, Fujian Province, China; Key Laboratory of Propagated Sensation along Meridian of Fujian Province, Fuzhou 350003, Fujian Province, China
- Qian-Ru Huang
- Department of Meridian Research, Fujian Academy of Chinese Medical Sciences, Fuzhou 350003, Fujian Province, China; Key Laboratory of Propagated Sensation along Meridian of Fujian Province, Fuzhou 350003, Fujian Province, China
- Long Wan
- Department of Meridian Research, Fujian Academy of Chinese Medical Sciences, Fuzhou 350003, Fujian Province, China; Key Laboratory of Propagated Sensation along Meridian of Fujian Province, Fuzhou 350003, Fujian Province, China
- Xiao-Xiang Zhu
- Department of Meridian Research, Fujian Academy of Chinese Medical Sciences, Fuzhou 350003, Fujian Province, China; Key Laboratory of Propagated Sensation along Meridian of Fujian Province, Fuzhou 350003, Fujian Province, China
- Cai-Lian Lan
- Department of Meridian Research, Fujian Academy of Chinese Medical Sciences, Fuzhou 350003, Fujian Province, China; Key Laboratory of Propagated Sensation along Meridian of Fujian Province, Fuzhou 350003, Fujian Province, China
- Xiao-Ran Ye
- Department of Meridian Research, Fujian Academy of Chinese Medical Sciences, Fuzhou 350003, Fujian Province, China; Key Laboratory of Propagated Sensation along Meridian of Fujian Province, Fuzhou 350003, Fujian Province, China
11
Dutta D, Sen A, Satagopan J. Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations. PLoS One 2022; 17:e0276886. [PMID: 36584096 PMCID: PMC9803132 DOI: 10.1371/journal.pone.0276886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 10/16/2022] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Copy number aberrations (CNAs) in cancer affect disease outcomes by regulating molecular phenotypes, such as gene expressions, that drive important biological processes. To gain comprehensive insights into molecular biomarkers for cancer, it is critical to identify key groups of CNAs, the associated gene modules, regulatory modules, and their downstream effect on outcomes. METHODS In this paper, we demonstrate an innovative use of sparse canonical correlation analysis (sCCA) to effectively identify the ensemble of CNAs, and gene modules in the context of binary and censored disease endpoints. Our approach detects potentially orthogonal gene expression modules which are highly correlated with sets of CNA and then identifies the genes within these modules that are associated with the outcome. RESULTS Analyzing clinical and genomic data on 1,904 breast cancer patients from the METABRIC study, we found 14 gene modules to be regulated by groups of proximally located CNA sites. We validated this finding using an independent set of 1,077 breast invasive carcinoma samples from The Cancer Genome Atlas (TCGA). Our analysis of 7 clinical endpoints identified several novel and interpretable regulatory associations, highlighting the role of CNAs in key biological pathways and processes for breast cancer. Genes significantly associated with the outcomes were enriched for early estrogen response pathway, DNA repair pathways as well as targets of transcription factors such as E2F4, MYC, and ETS1 that have recognized roles in tumor characteristics and survival. Subsequent meta-analysis across the endpoints further identified several genes through the aggregation of weaker associations. CONCLUSIONS Our findings suggest that sCCA analysis can aggregate weaker associations to identify interpretable and important genes, modules, and clinically consequential pathways.
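The abstract does not state which sparse CCA formulation the authors use. For orientation, one widely used formulation is the penalized matrix decomposition of Witten, Tibshirani and Hastie (2009), written out below as an assumption rather than as the paper's method. Here $X$ is an $n \times p$ matrix of CNA measurements and $Y$ an $n \times q$ matrix of gene expressions, both with standardized columns; $u$ and $v$ are the weight vectors, and $c_1$, $c_2$ are tuning parameters that control sparsity. All of these symbols are introduced here for illustration.

```latex
\max_{u \in \mathbb{R}^{p},\; v \in \mathbb{R}^{q}} \; u^{\top} X^{\top} Y v
\quad \text{subject to} \quad
\|u\|_2^2 \le 1, \;\; \|v\|_2^2 \le 1, \;\; \|u\|_1 \le c_1, \;\; \|v\|_1 \le c_2 .
```

The $\ell_1$ bounds drive many entries of $u$ and $v$ to exactly zero, so each canonical pair involves only a small set of CNA sites and genes. This is what makes the resulting modules interpretable.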
Affiliation(s)
- Diptavo Dutta
  - Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Ananda Sen
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States of America
- Jaya Satagopan
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ, United States of America
|
12
|
Qumsiyeh E, Showe L, Yousef M. GediNET for discovering gene associations across diseases using knowledge based machine learning approach. Sci Rep 2022; 12:19955. [PMID: 36402891 PMCID: PMC9675776 DOI: 10.1038/s41598-022-24421-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 11/15/2022] [Indexed: 11/21/2022] Open
Abstract
The most common approaches to discovering genes associated with specific diseases are based on machine learning and use a variety of feature selection techniques to identify significant genes that can serve as biomarkers for a given disease. More recently, the integration in this process of prior knowledge-based approaches has shown significant promise in the discovery of new biomarkers with potential translational applications. In this study, we developed a novel approach, GediNET, that integrates prior biological knowledge to gene Groups that are shown to be associated with a specific disease such as a cancer. The novelty of GediNET is that it then also allows the discovery of significant associations between that specific disease and other diseases. The initial step in this process involves the identification of gene Groups. The Groups are then subjected to a Scoring component to identify the top performing classification Groups. The top-ranked gene Groups are then used to train a Machine Learning Model. The process of Grouping, Scoring and Modelling (G-S-M) is used by GediNET to identify other diseases that are similarly associated with this signature. GediNET identifies these relationships through Disease-Disease Association (DDA) based machine learning. DDA explores novel associations between diseases and identifies relationships which could be used to further improve approaches to diagnosis, prognosis, and treatment. The GediNET KNIME workflow can be downloaded from: https://github.com/malikyousef/GediNET.git or https://kni.me/w/3kH1SQV_mMUsMTS .
Affiliation(s)
- Emma Qumsiyeh
  - Information Technology Engineering, Al-Quds University, Abu Dis, Palestine
- Louise Showe
  - The Wistar Institute, Philadelphia, PA, 19104, USA
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, 13206, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
|
13
|
Chen Y, Ma Y, Zhai Y, Yang H, Zhang C, Lu Y, Wei W, Cai Q, Ding X, Lu S, Fang Z. Persistent dysregulation of genes in the development of endometriosis. ANNALS OF TRANSLATIONAL MEDICINE 2022; 10:1175. [PMID: 36467354 PMCID: PMC9708481 DOI: 10.21037/atm-22-4806] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/08/2022] [Accepted: 11/07/2022] [Indexed: 09/29/2023]
Abstract
BACKGROUND Endometriosis is a chronic condition that affects women of child-bearing age. Since the etiology and pathogenesis of endometriosis have not been fully elucidated, it is important to investigate the mechanisms that lead to the deterioration of endometriosis. METHODS In this study, the transcriptome data of patients with normal, mild, and severe endometriosis were examined using the GSE51981 dataset obtained from the Gene Expression Omnibus database. Short Time Series Expression Miner (STEM) was used to screen the genes with continuous expression disorder in the development process, and the core genes were identified by constructing a protein-protein interaction network. The molecular mechanisms of endometriosis were examined using enrichment analysis. Finally, the transcription factors that regulate the core genes were predicted and the comprehensive mechanisms involved in the development of endometriosis were examined. RESULTS A total of 3,472 differentially expressed genes were identified from the normal, mild, and severe endometriosis samples. These were allocated into 12 modules and HRAS, HSP90AA1, TGFB1, TP53, and UBC were selected as the core genes. Enrichment analysis showed that the genes in modules 6, 7, and 9 were significantly related to oxygen levels, metallic processes, and hormone levels, respectively. Transcription factor prediction analysis showed that TP53 regulates HRAS to participate in immune related signaling pathways. Drug prediction analysis identified 792 drugs that interact with the targeted core genes. CONCLUSIONS This study explored the molecular mechanisms involved in the development of endometriosis and identified potential biomarkers of endometriosis. This data may provide novel targets and research directions for the diagnosis and treatment of endometriosis.
Affiliation(s)
- Yanli Chen
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
- Yanqun Ma
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
- Yanzhi Zhai
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
- Haiyan Yang
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
- Chunlan Zhang
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
- Yingxin Lu
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
- Wei Wei
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
- Qing Cai
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
- Xuewen Ding
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
- Shan Lu
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
- Ziyu Fang
  - Department of Obstetrics and Gynecology, The Fourth Affiliated Hospital of Guangxi Medical University, Liuzhou, China
|
14
|
IoMT-Based Mitochondrial and Multifactorial Genetic Inheritance Disorder Prediction Using Machine Learning. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2650742. [PMID: 35909844 PMCID: PMC9334098 DOI: 10.1155/2022/2650742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2022] [Accepted: 07/04/2022] [Indexed: 11/18/2022]
Abstract
A genetic disorder is a serious disease that affects a large number of individuals around the world. There are various types of genetic illnesses, however, we focus on mitochondrial and multifactorial genetic disorders for prediction. Genetic illness is caused by a number of factors, including a defective maternal or paternal gene, excessive abortions, a lack of blood cells, and low white blood cell count. For premature or teenage life development, early detection of genetic diseases is crucial. Although it is difficult to forecast genetic disorders ahead of time, this prediction is very critical since a person's life progress depends on it. Machine learning algorithms are used to diagnose genetic disorders with high accuracy utilizing datasets collected and constructed from a large number of patient medical reports. A lot of studies have been conducted recently employing genome sequencing for illness detection, but fewer studies have been presented using patient medical history. The accuracy of existing studies that use a patient's history is restricted. The internet of medical things (IoMT) based proposed model for genetic disease prediction in this article uses two separate machine learning algorithms: support vector machine (SVM) and K-Nearest Neighbor (KNN). Experimental results show that SVM has outperformed the KNN and existing prediction methods in terms of accuracy. SVM achieved an accuracy of 94.99% and 86.6% for training and testing, respectively.
|
15
|
Network-Based Approaches for Disease-Gene Association Prediction Using Protein-Protein Interaction Networks. Int J Mol Sci 2022; 23:ijms23137411. [PMID: 35806415 PMCID: PMC9266751 DOI: 10.3390/ijms23137411] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/09/2022] [Revised: 06/25/2022] [Accepted: 06/30/2022] [Indexed: 01/02/2023] Open
Abstract
Genome-wide association studies (GWAS) can be used to infer genome intervals that are involved in genetic diseases. However, investigating a large number of putative mutations for GWAS is resource- and time-intensive. Network-based computational approaches are being used for efficient disease-gene association prediction. Network-based methods are based on the underlying assumption that the genes causing the same diseases are located close to each other in a molecular network, such as a protein-protein interaction (PPI) network. In this survey, we provide an overview of network-based disease-gene association prediction methods based on three categories: graph-theoretic algorithms, machine learning algorithms, and an integration of these two. We experimented with six selected methods to compare their prediction performance using a heterogeneous network constructed by combining a genome-wide weighted PPI network, an ontology-based disease network, and disease-gene associations. The experiment was conducted in two different settings according to the presence and absence of known disease-associated genes. The results revealed that HerGePred, an integrative method, outperformed in the presence of known disease-associated genes, whereas PRINCE, which adopted a network propagation algorithm, was the most competitive in the absence of known disease-associated genes. Overall, the results demonstrated that the integrative methods performed better than the methods using graph-theory only, and the methods using a heterogeneous network performed better than those using a homogeneous PPI network only.
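The assumption that disease genes sit close together in a molecular network can be made concrete with a small network propagation example. The sketch below implements a generic random walk with restart from known disease genes; it is not the implementation of PRINCE, HerGePred, or any other surveyed method, which may differ in normalization and in how prior scores are set. The function name, the `restart` value, and the five-node toy graph are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its network proximity to the seed genes.

    adjacency: dict mapping node -> dict of neighbor -> positive edge weight
               (undirected, so each edge appears in both directions).
    seeds:     known disease genes; must be nodes of the graph.
    restart:   probability of jumping back to a seed at each step.
    Note: a node with no edges would leak probability mass; the toy graph has none.
    """
    nodes = sorted(adjacency)
    seeds = set(seeds)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    degree = {n: sum(adjacency[n].values()) for n in nodes}
    scores = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for u in nodes:
            if degree[u] == 0:
                continue
            for v, weight in adjacency[u].items():
                # Each node passes its score to neighbors in proportion to edge weight.
                updated[v] += (1.0 - restart) * scores[u] * weight / degree[u]
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Toy chain-shaped interaction network A - B - C - D - E with a single seed gene A.
toy_network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0, "E": 1.0},
    "E": {"D": 1.0},
}
scores = random_walk_with_restart(toy_network, seeds=["A"])
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

On this chain the scores fall off with distance from the seed, which is the ranking behavior that propagation-based gene prioritization relies on.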
|
16
|
Network-Based Methods for Approaching Human Pathologies from a Phenotypic Point of View. Genes (Basel) 2022; 13:genes13061081. [PMID: 35741843 PMCID: PMC9222217 DOI: 10.3390/genes13061081] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 01/27/2023] Open
Abstract
Network and systemic approaches to studying human pathologies are helping us to gain insight into the molecular mechanisms of and potential therapeutic interventions for human diseases, especially for complex diseases where large numbers of genes are involved. The complex human pathological landscape is traditionally partitioned into discrete “diseases”; however, that partition is sometimes problematic, as diseases are highly heterogeneous and can differ greatly from one patient to another. Moreover, for many pathological states, the set of symptoms (phenotypes) manifested by the patient is not enough to diagnose a particular disease. On the contrary, phenotypes, by definition, are directly observable and can be closer to the molecular basis of the pathology. These clinical phenotypes are also important for personalised medicine, as they can help stratify patients and design personalised interventions. For these reasons, network and systemic approaches to pathologies are gradually incorporating phenotypic information. This review covers the current landscape of phenotype-centred network approaches to study different aspects of human diseases.
|
17
|
Mancuso CA, Bills PS, Krum D, Newsted J, Liu R, Krishnan A. GenePlexus: a web-server for gene discovery using network-based machine learning. Nucleic Acids Res 2022; 50:W358-W366. [PMID: 35580053 PMCID: PMC9252732 DOI: 10.1093/nar/gkac335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/08/2022] [Revised: 04/13/2022] [Accepted: 04/30/2022] [Indexed: 11/28/2022] Open
Abstract
Biomedical researchers take advantage of high-throughput, high-coverage technologies to routinely generate sets of genes of interest across a wide range of biological conditions. Although these technologies have directly shed light on the molecular underpinnings of various biological processes and diseases, the list of genes from any individual experiment is often noisy and incomplete. Additionally, interpreting these lists of genes can be challenging in terms of how they are related to each other and to other genes in the genome. In this work, we present GenePlexus (https://www.geneplexus.net/), a web-server that allows a researcher to utilize a powerful, network-based machine learning method to gain insights into their gene set of interest and additional functionally similar genes. Once a user uploads their own set of human genes and chooses between a number of different human network representations, GenePlexus provides predictions of how associated every gene in the network is to the input set. The web-server also provides interpretability through network visualization and comparison to other machine learning models trained on thousands of known process/pathway and disease gene sets. GenePlexus is free and open to all users without the need for registration.
Affiliation(s)
- Christopher A Mancuso
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Patrick S Bills
  - Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
- Douglas Krum
  - Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
- Jacob Newsted
  - Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
- Renming Liu
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Arjun Krishnan
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
|
18
|
Noor F, Tahir ul Qamar M, Ashfaq UA, Albutti A, Alwashmi ASS, Aljasir MA. Network Pharmacology Approach for Medicinal Plants: Review and Assessment. Pharmaceuticals (Basel) 2022; 15:572. [PMID: 35631398 PMCID: PMC9143318 DOI: 10.3390/ph15050572] [Citation(s) in RCA: 156] [Impact Index Per Article: 52.0] [Received: 04/02/2022] [Revised: 04/27/2022] [Accepted: 04/27/2022] [Indexed: 12/13/2022] Open
Abstract
Natural products have played a critical role in medicine due to their ability to bind and modulate cellular targets involved in disease. Medicinal plants hold a variety of bioactive scaffolds for the treatment of multiple disorders. The less adverse effects, affordability, and easy accessibility highlight their potential in traditional remedies. Identifying pharmacological targets from active ingredients of medicinal plants has become a hot topic for biomedical research to generate innovative therapies. By developing an unprecedented opportunity for the systematic investigation of traditional medicines, network pharmacology is evolving as a systematic paradigm and becoming a frontier research field of drug discovery and development. The advancement of network pharmacology has opened up new avenues for understanding the complex bioactive components found in various medicinal plants. This study is attributed to a comprehensive summary of network pharmacology based on current research, highlighting various active ingredients, related techniques/tools/databases, and drug discovery and development applications. Moreover, this study would serve as a protocol for discovering novel compounds to explore the full range of biological potential of traditionally used plants. We have attempted to cover this vast topic in the review form. We hope it will serve as a significant pioneer for researchers working with medicinal plants by employing network pharmacology approaches.
Affiliation(s)
- Fatima Noor
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad 38000, Pakistan
- Muhammad Tahir ul Qamar
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad 38000, Pakistan
- Usman Ali Ashfaq
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad 38000, Pakistan
- Aqel Albutti
  - Department of Medical Biotechnology, College of Applied Medical Sciences, Qassim University, Buraydah 51452, Saudi Arabia
- Ameen S. S. Alwashmi
  - Department of Medical Laboratories, College of Applied Medical Sciences, Qassim University, Buraydah 51452, Saudi Arabia
- Mohammad Abdullah Aljasir
  - Department of Medical Laboratories, College of Applied Medical Sciences, Qassim University, Buraydah 51452, Saudi Arabia
|
19
|
Koçoğlu C, Van Broeckhoven C, van der Zee J. How network-based approaches can complement gene identification studies in frontotemporal dementia. Trends Genet 2022; 38:944-955. [DOI: 10.1016/j.tig.2022.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2021] [Revised: 05/04/2022] [Accepted: 05/04/2022] [Indexed: 11/17/2022]
|
20
|
Functional stratification of cancer drugs through integrated network similarity. NPJ Syst Biol Appl 2022; 8:11. [PMID: 35440787 PMCID: PMC9018743 DOI: 10.1038/s41540-022-00219-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Accepted: 01/21/2022] [Indexed: 11/30/2022] Open
Abstract
Drugs not only perturb their immediate protein targets but also modulate multiple signaling pathways. In this study, we explored networks modulated by several drugs across multiple cancer cell lines by integrating their targets with transcriptomic and phosphoproteomic data. As a result, we obtained 236 reconstructed networks covering five cell lines and 70 drugs. A rigorous topological and pathway analysis showed that chemically and functionally different drugs may modulate overlapping networks. Additionally, we revealed a set of tumor-specific hidden pathways with the help of drug network models that are not detectable from the initial data. The difference in the target selectivity of the drugs leads to disjoint networks despite sharing a similar mechanism of action, e.g., HDAC inhibitors. We also used the reconstructed network models to study potential drug combinations based on the topological separation and found literature evidence for a set of drug pairs. Overall, network-level exploration of drug-modulated pathways and their deep comparison may potentially help optimize treatment strategies and suggest new drug combinations.
|
21
|
Koçoğlu C, Ferrari R, Roes M, Vandeweyer G, Kooy RF, van Broeckhoven C, Manzoni C, van der Zee J. Protein interaction network analysis reveals genetic enrichment of immune system genes in frontotemporal dementia. Neurobiol Aging 2022; 116:67-79. [DOI: 10.1016/j.neurobiolaging.2022.03.018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/03/2022] [Revised: 03/09/2022] [Accepted: 03/31/2022] [Indexed: 12/12/2022]
|
22
|
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, the mining and effective utilization of the module structure is still challenging in such issues as a disease gene prediction. RESULTS We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to more effectively predict disease-related genes. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integration of gene rankings is designed in order to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. By a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-arts and its further performance improvement derived from the parameter estimation. CONCLUSIONS The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of the multiscale module structure and its application in such issues as a disease-gene prediction.
Affiliation(s)
- Ju Xiang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
- Xiangmao Meng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
23
|
Ahmed MM, Tazyeen S, Haque S, Alsulimani A, Ali R, Sajad M, Alam A, Ali S, Bagabir HA, Bagabir RA, Ishrat R. Network-Based Approach and IVI Methodologies, a Combined Data Investigation Identified Probable Key Genes in Cardiovascular Disease and Chronic Kidney Disease. Front Cardiovasc Med 2022; 8:755321. [PMID: 35071341 PMCID: PMC8767007 DOI: 10.3389/fcvm.2021.755321] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/08/2021] [Accepted: 11/17/2021] [Indexed: 01/28/2023] Open
Abstract
In fact, the risk of dying from CVD is significant when compared to the risk of developing end-stage renal disease (ESRD). Moreover, patients with severe CKD are often excluded from randomized controlled trials, making evidence-based therapy of comorbidities like CVD complicated. Thus, the goal of this study was to use an integrated bioinformatics approach to not only uncover Differentially Expressed Genes (DEGs), their associated functions, and pathways but also give a glimpse of how these two conditions are related at the molecular level. We started with GEO2R/R program (version 3.6.3, 64 bit) to get DEGs by comparing gene expression microarray data from CVD and CKD. Thereafter, the online STRING version 11.1 program was used to look for any correlations between all these common and/or overlapping DEGs, and the results were visualized using Cytoscape (version 3.8.0). Further, we used MCODE, a cytoscape plugin, and identified a total of 15 modules/clusters of the primary network. Interestingly, 10 of these modules contained our genes of interest (key genes). Out of these 10 modules that consist of 19 key genes (11 downregulated and 8 up-regulated), Module 1 (RPL13, RPLP0, RPS24, and RPS2) and module 5 (MYC, COX7B, and SOCS3) had the highest number of these genes. Then we used ClueGO to add a layer of GO terms with pathways to get a functionally ordered network. Finally, to identify the most influential nodes, we employed a novel technique called Integrated Value of Influence (IVI) by combining the network's most critical topological attributes. This method suggests that the nodes with many connections (calculated by hubness score) and high spreading potential (the spreader nodes are intended to have the most impact on the information flow in the network) are the most influential or essential nodes in a network. 
Thus, based on IVI values, hubness score, and spreading score, top 20 nodes were extracted, in which RPS27A non-seed gene and RPS2, a seed gene, came out to be the important node in the network.
Affiliation(s)
- Mohd Murshad Ahmed
  - Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
- Safia Tazyeen
  - Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
- Shafiul Haque
  - Research and Scientific Unit, College of Nursing and Allied Health Science, Jazan University, Jazan, Saudi Arabia
- Ahmad Alsulimani
  - Department of Medical Laboratory Technology, College of Applied Medical Sciences, Jazan University, Jazan, Saudi Arabia
- Rafat Ali
  - Department of Bioscience, Jamia Millia Islamia, New Delhi, India
- Mohd Sajad
  - Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
- Aftab Alam
  - Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
- Shahnawaz Ali
  - Centre for Stem Cell & Regenerative Medicine, King's College London, Guy's Hospital, London, United Kingdom
- Hala Abubaker Bagabir
  - Department of Medical Physiology, Faculty of Medicine, King Abdulaziz University, Rabigh, Saudi Arabia
- Rania Abubaker Bagabir
  - Department of Hematology and Immunology, College of Medicine, Umm-Al-Qura University, Mecca, Saudi Arabia
- Romana Ishrat
  - Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
  - Correspondence: Romana Ishrat; orcid.org/0000-0001-9744-9047
|
24
|
Li Q, Li Y, Sun X, Zhang X, Zhang M. Genomic Analysis of Abnormal DNAM Methylation in Parathyroid Tumors. Int J Endocrinol 2022; 2022:4995196. [PMID: 35879975 PMCID: PMC9308548 DOI: 10.1155/2022/4995196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/09/2021] [Revised: 05/20/2022] [Accepted: 06/17/2022] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Parathyroid tumors are common endocrine neoplasias associated with primary hyperparathyroidism. Although numerous studies have studied the subject, the predictive value of gene biomarkers nevertheless remains low. METHODS In this study, we performed genomic analysis of abnormal DNA methylation in parathyroid tumors. After data preprocessing, differentially methylated genes were extracted from patients with parathyroid tumors by using t-tests. RESULTS After refinement of the basic differential methylation, 28241 unique CpGs (634 genes) were identified to be methylated. The methylated genes were primarily involved in 7 GO terms, and the top 3 terms were associated with cyst morphogenesis, ion transport, and GTPase signal. Following pathway enrichment analyses, a total of 10 significant pathways were enriched; notably, the top 3 pathways were cholinergic synapses, glutamatergic synapses, and oxytocin signaling pathways. Based on PPIN and ego-net analysis, 67 ego genes were found which could completely separate the diseased group from the normal group. The 10 most prominent genes included POLA1, FAM155B, AMMECR1, THOC2, CCND1, CLDN11, IDS, TST, RBPJ, and GNA11. SVM analysis confirmed that this grouping approach was precise. CONCLUSIONS This research provides useful data to further explore novel genes and pathways as therapeutic targets for parathyroid tumors.
Affiliation(s)
- Qing Li
  - Department of General Surgery, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, No 16766 Jingshi Road, Jinan, Shandong, China
- Yonghao Li
  - Department of General Surgery, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, No 16766 Jingshi Road, Jinan, Shandong, China
- Ximei Sun
  - Department of General Surgery, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, No 16766 Jingshi Road, Jinan, Shandong, China
- Xinlei Zhang
  - Department of General Surgery, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, No 16766 Jingshi Road, Jinan, Shandong, China
- Mei Zhang
  - Department of General Surgery, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, No 16766 Jingshi Road, Jinan, Shandong, China
|
25
|
Meng X, Li W, Peng X, Li Y, Li M. Protein interaction networks: centrality, modularity, dynamics, and applications. FRONTIERS OF COMPUTER SCIENCE 2021; 15:156902. [DOI: 10.1007/s11704-020-8179-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/07/2018] [Accepted: 08/12/2020] [Indexed: 01/03/2025]
|
26
|
Liu J, Zhu H, Qiu J. Locally Adjust Networks Based on Connectivity and Semantic Similarities for Disease Module Detection. Front Genet 2021; 12:726596. [PMID: 34759955 PMCID: PMC8575408 DOI: 10.3389/fgene.2021.726596] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/17/2021] [Accepted: 09/22/2021] [Indexed: 11/13/2022] Open
Abstract
For studying the pathogenesis of complex diseases, it is important to identify disease modules at the system level. Since protein-protein interaction (PPI) networks contain many incomplete and incorrect interactions, most existing methods leave many disease proteins isolated from disease modules. In this paper, we propose an effective disease module identification method, IDMCSS, in which the human PPI networks used are obtained by adding some potentially missing interactions to existing PPI networks and removing some potentially incorrect interactions. In IDMCSS, a network adjustment strategy is developed to add or remove links around disease proteins based on both topological and semantic information. Next, neighboring proteins of disease proteins are prioritized according to a suggested similarity between each of them and disease proteins, and the protein with the largest similarity with disease proteins is added into a candidate disease protein set one by one. The stopping criterion is set to the boundary of the disease proteins. Finally, the connected subnetwork having the largest number of disease proteins is selected as a disease module. Experimental results on asthma demonstrate the effectiveness of the method in comparison to existing algorithms for disease module identification. It is also shown that the proposed IDMCSS can obtain disease modules containing crucial biological processes of asthma, and that 12 targets for drug intervention can be predicted.
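The greedy expansion and final selection described in this abstract can be illustrated with a minimal Python sketch. This is not the IDMCSS implementation: the similarity used here (the share of a candidate's neighbours already in the module), the fixed addition budget used as a stopping rule, the function names, and the toy network are all assumptions made for illustration, and the network adjustment step is omitted.

```python
from collections import deque


def expand_module(graph, disease, max_added):
    """Greedily add the most similar neighbouring protein, one at a time."""
    module = set(disease)
    for _ in range(max_added):  # assumed stopping rule: fixed budget of additions
        candidates = {v for u in module for v in graph[u]} - module
        if not candidates:
            break
        # Assumed similarity: share of a candidate's neighbours already in the module.
        best = max(sorted(candidates),
                   key=lambda v: len(graph[v] & module) / len(graph[v]))
        module.add(best)
    return module


def largest_disease_component(graph, module, disease):
    """Return the connected part of `module` holding the most disease proteins."""
    seen, best = set(), set()
    for start in sorted(module):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search restricted to module members
            u = queue.popleft()
            for v in graph[u]:
                if v in module and v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        if len(component & disease) > len(best & disease):
            best = component
    return best


# Toy undirected network (hypothetical protein names).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "G"), ("E", "F"), ("F", "H")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

disease = {"A", "C", "E"}
expanded = expand_module(graph, disease, max_added=1)
module = largest_disease_component(graph, expanded, disease)
print(sorted(module))
```

In this toy network, protein B links the otherwise disconnected disease proteins A and C, so it is the first candidate added, and the component containing A and C is then selected over the isolated protein E.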
Affiliation(s)
- Jia Liu
- State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, China
| | - Huole Zhu
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, China
| | - Jianfeng Qiu
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, China
|
27
|
Yaoxing H, Danchun Y, Xiaojuan S, Shuman J, Qingqing Y, Lin J. Identification of Novel Susceptible Genes of Gastric Cancer Based on Integrated Omics Data. Front Cell Dev Biol 2021; 9:712020. [PMID: 34354996 PMCID: PMC8329722 DOI: 10.3389/fcell.2021.712020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/19/2021] [Accepted: 06/23/2021] [Indexed: 12/24/2022] Open
Abstract
Gastric cancer (GC) is one of the most common causes of cancer-related deaths in the world. This cancer has been regarded as a biologically and genetically heterogeneous disease with a poorly understood carcinogenesis at the molecular level. Thousands of biomarkers and susceptible loci have been explored via experimental and computational methods, but their effects on disease outcome are still unknown. Genome-wide association studies (GWAS) have identified multiple susceptible loci for GC, but due to linkage disequilibrium (LD), single-nucleotide polymorphisms (SNPs) may fall within non-coding regions and exert their biological function by modulating the gene expression level. In this study, we collected 1,091 cases and 410,350 controls from the GWAS catalog database. Integrating these with gene expression level data obtained from stomach tissue, we applied a machine learning-based method to predict GC-susceptible genes. As a result, we identified 787 novel susceptible genes related to GC, which will provide new insight into the genetic and biological basis for the mechanism and pathology of GC development.
Affiliation(s)
- Huang Yaoxing
- Department of Gastroenterology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, China
| | - Yu Danchun
- Department of Gastroenterology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, China
| | - Sun Xiaojuan
- Department of Gastroenterology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, China
| | - Jiang Shuman
- Department of Gastroenterology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, China
| | - Yan Qingqing
- Department of Gastroenterology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, China
| | - Jia Lin
- Department of Gastroenterology, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, China
|
28
|
Park Y, Heider D, Hauschild AC. Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence. Cancers (Basel) 2021; 13:3148. [PMID: 34202427 PMCID: PMC8269018 DOI: 10.3390/cancers13133148] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/20/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
Affiliation(s)
- Youngjun Park
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
| | - Dominik Heider
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
| | - Anne-Christin Hauschild
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Department of Medical Informatics, University Medical Center Göttingen, 37075 Göttingen, Germany
|
29
|
Tolani P, Gupta S, Yadav K, Aggarwal S, Yadav AK. Big data, integrative omics and network biology. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2021; 127:127-160. [PMID: 34340766 DOI: 10.1016/bs.apcsb.2021.03.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/12/2022]
Abstract
A cell integrates various signals through a network of biomolecules that crosstalk to synergistically regulate the replication, transcription, translation and other metabolic activities of a cell. These networks regulate signal perception and processing that drives biological functions. The biological complexity cannot be fully captured by a single -omics discipline. The holistic study of an organism, in health, perturbation, exposure to environment and disease, falls under systems biology. The bottom-up molecular approaches (genes, mRNA, protein, metabolite, etc.) have laid the foundation of current biological knowledge covering the horizon from viruses, bacteria, fungi, plants and animals. Yet, these techniques provide a rather myopic view of biology at the molecular level. To understand how the interconnected molecular components are formed and rewired in disease or exposure to environmental stimuli is the holy grail of modern biology. The omics era was heralded by the genomics revolution but advanced sequencing techniques are now also ubiquitous in transcriptomics, proteomics, metabolomics and lipidomics. Multi-omics data analysis and integration techniques are driving the quest for deeper insights into how the different layers of biomolecules talk to each other in diverse contexts.
Affiliation(s)
- Priya Tolani
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India
| | - Srishti Gupta
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
| | - Kirti Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; Department of Pharmaceutical Biotechnology, Delhi Pharmaceutical Sciences and Research University, New Delhi, India
| | - Suruchi Aggarwal
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; Department of Molecular Biology and Biotechnology, Cotton University, Guwahati, Assam, India
| | - Amit Kumar Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India.
|
30
|
Huang Q, Wang J, Zhang X, Guo M, Yu G. IsoDA: Isoform-Disease Association Prediction by Multiomics Data Fusion. J Comput Biol 2021; 28:804-819. [PMID: 33826865 DOI: 10.1089/cmb.2020.0626] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022] Open
Abstract
A gene can be spliced into different isoforms by alternative splicing, which contributes to the functional diversity of protein species. Computational prediction of gene-disease associations (GDAs) has been studied for decades. However, the process of identifying the isoform-disease associations (IDAs) at a large scale is rarely explored, which can decipher the pathology at a more granular level. The main bottleneck is the lack of IDAs in current databases and the multilevel omics data fusion. To bridge this gap, we propose a computational approach called Isoform-Disease Association prediction by multiomics data fusion (IsoDA) to predict IDAs. Based on the relationship between a gene and its spliced isoforms, IsoDA first introduces a dispatch and aggregation term to dispatch gene-disease associations to individual isoforms, and reversely aggregate these dispatched associations to their hosting genes. At the same time, it fuses the genome, transcriptome, and proteome data by joint matrix factorization to improve the prediction of IDAs. Experimental results show that IsoDA significantly outperforms the related state-of-the-art methods at both the gene level and isoform level. A case study further shows that IsoDA credibly identifies three isoforms spliced from apolipoprotein E, which have individual associations with Alzheimer's disease, and two isoforms spliced from vascular endothelial growth factor A, which have different associations with coronary heart disease. The codes of IsoDA are available at http://mlda.swu.edu.cn/codes.php?name=IsoDA.
Affiliation(s)
- Qiuyue Huang
- College of Computer and Information Science, Southwest University, Chongqing, China; School of Software, Shandong University, Jinan, China
| | - Jun Wang
- School of Software, Shandong University, Jinan, China
| | - Xiangliang Zhang
- Department of Computer Science, Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Maozu Guo
- Department of Computer Science, College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
| | - Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing, China; School of Software, Shandong University, Jinan, China; Department of Computer Science, Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
|
31
|
He M, Huang C, Liu B, Wang Y, Li J. Factor graph-aggregated heterogeneous network embedding for disease-gene association prediction. BMC Bioinformatics 2021; 22:165. [PMID: 33781206 PMCID: PMC8006390 DOI: 10.1186/s12859-021-04099-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/09/2020] [Accepted: 03/23/2021] [Indexed: 11/18/2022] Open
Abstract
Background Exploring the relationship between diseases and genes is of great significance for understanding the pathogenesis of disease and developing corresponding therapeutic measures. The prediction of disease-gene associations by computational methods accelerates the process. Results Many existing methods cannot fully utilize multi-dimensional biological entity relationships to predict disease-gene associations because the data are multi-source and heterogeneous. This paper proposes FactorHNE, a factor graph-aggregated heterogeneous network embedding method for disease-gene association prediction, which captures a variety of semantic relationships between the heterogeneous nodes by factorization. It produces different semantic factor graphs and effectively aggregates a variety of semantic relationships, using an end-to-end multi-perspective loss function to optimize the model. It then produces good node embeddings to predict disease-gene associations. Conclusions Experimental verification and analysis show FactorHNE has better performance and scalability than the existing models. It also has good interpretability and can be extended to large-scale biomedical network data analysis.
Affiliation(s)
- Ming He
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, Guangdong, China
| | - Chen Huang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, Guangdong, China
| | - Bo Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, Guangdong, China; Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, Guangdong, China.
|
32
|
Ogris C, Hu Y, Arloth J, Müller NS. Versatile knowledge guided network inference method for prioritizing key regulatory factors in multi-omics data. Sci Rep 2021; 11:6806. [PMID: 33762588 PMCID: PMC7990936 DOI: 10.1038/s41598-021-85544-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/27/2020] [Accepted: 01/04/2021] [Indexed: 12/28/2022] Open
Abstract
Constantly decreasing costs of high-throughput profiling on many molecular levels generate vast amounts of multi-omics data. Studying one biomedical question on two or more omic levels provides deeper insights into underlying molecular processes or disease pathophysiology. For the majority of multi-omics data projects, the data analysis is performed level-wise, followed by a combined interpretation of results. Hence the full potential of integrated data analysis is not leveraged yet, presumably due to the complexity of the data and the lacking toolsets. We propose a versatile approach, to perform a multi-level fully integrated analysis: The Knowledge guIded Multi-Omics Network inference approach, KiMONo (https://github.com/cellmapslab/kimono). KiMONo performs network inference by using statistical models for combining omics measurements coupled to a powerful knowledge-guided strategy exploiting prior information from existing biological sources. Within the resulting multimodal network, nodes represent features of all input types e.g. variants and genes while edges refer to knowledge-supported and statistically derived associations. In a comprehensive evaluation, we show that our method is robust to noise and exemplify the general applicability to the full spectrum of multi-omics data, demonstrating that KiMONo is a powerful approach towards leveraging the full potential of data sets for detecting biomarker candidates.
Affiliation(s)
- Christoph Ogris
- Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764, Neuherberg, Germany.
| | - Yue Hu
- Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764, Neuherberg, Germany
| | - Janine Arloth
- Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764, Neuherberg, Germany; Department of Translational Psychiatry, Max Planck Institute of Psychiatry, 80804, Munich, Germany
| | - Nikola S Müller
- Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764, Neuherberg, Germany.
|
33
|
Sun X, Zhang J, Nie Q. Inferring latent temporal progression and regulatory networks from cross-sectional transcriptomic data of cancer samples. PLoS Comput Biol 2021; 17:e1008379. [PMID: 33667222 PMCID: PMC7968745 DOI: 10.1371/journal.pcbi.1008379] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/03/2020] [Revised: 03/17/2021] [Accepted: 02/15/2021] [Indexed: 12/19/2022] Open
Abstract
Unraveling molecular regulatory networks underlying disease progression is critically important for understanding disease mechanisms and identifying drug targets. The existing methods for inferring gene regulatory networks (GRNs) rely mainly on time-course gene expression data. However, most available omics data from cross-sectional studies of cancer patients lack sufficient temporal information, which poses a key challenge for GRN inference. By quantifying the latent progression using a random walk-based manifold distance, we propose a latent-temporal progression-based Bayesian method, PROB, for inferring GRNs from the cross-sectional transcriptomic data of tumor samples. The robustness of PROB to the measurement variabilities in the data is mathematically proved and numerically verified. Performance evaluation on real data indicates that PROB outperforms other methods in both pseudotime inference and GRN inference. Applications to bladder cancer and breast cancer demonstrate that our method is effective in identifying key regulators of cancer progression or drug targets. The identified ACSS1 is experimentally validated to promote epithelial-to-mesenchymal transition of bladder cancer cells, and the predicted FOXM1-target interactions are verified and are predictive of relapse in breast cancer. Our study suggests new effective ways of modeling clinical transcriptomic data for characterizing cancer progression and facilitates the translation of regulatory network-based approaches into precision medicine.
Affiliation(s)
- Xiaoqiang Sun
- Key Laboratory of Tropical Disease Control, Chinese Ministry of Education; Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- School of Mathematics, Sun Yat-sen University, Guangzhou, China
| | - Ji Zhang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong, China
| | - Qing Nie
- Department of Mathematics and Department of Developmental & Cell Biology, NSF-Simons Center for Multiscale Cell Fate Research, University of California Irvine, Irvine, California, United States of America
|
34
|
Joodaki M, Ghadiri N, Maleki Z, Lotfi Shahreza M. A scalable random walk with restart on heterogeneous networks with Apache Spark for ranking disease-related genes through type-II fuzzy data fusion. J Biomed Inform 2021; 115:103688. [PMID: 33545331 DOI: 10.1016/j.jbi.2021.103688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/02/2020] [Revised: 01/10/2021] [Accepted: 01/23/2021] [Indexed: 12/11/2022]
Abstract
One of the central missions of biology and medical science is to find disease-related genes. Recent research uses gene/protein networks to find such genes. Due to false-positive interactions in these networks, the results are often not accurate or reliable. Integrating multiple gene/protein networks could overcome this drawback, yielding a network with fewer false-positive interactions. The integration method plays a crucial role in the quality of the constructed network. In this paper, we integrate several sources to build a reliable heterogeneous network, i.e., a network that includes nodes of different types. Due to the different gene/protein sources, four gene-gene similarity networks are constructed first and integrated by applying the type-II fuzzy voter scheme. The resulting gene-gene network is linked to a disease-disease similarity network (as the outcome of integrating four sources) through a two-part disease-gene network. We propose a novel algorithm, namely random walk with restart on the heterogeneous network method with fuzzy fusion (RWRHN-FF). By running RWRHN-FF over the heterogeneous network, disease-related genes are determined. Experimental results using leave-one-out cross-validation indicate that RWRHN-FF outperforms existing methods. The proposed algorithm can be applied to find new genes for prostate, breast, gastric, and colon cancers. Since the RWRHN-FF algorithm converges slowly on large heterogeneous networks, we propose a parallel implementation of the RWRHN-FF algorithm on the Apache Spark platform for high-throughput and reliable network inference. Experiments run on heterogeneous networks of different sizes indicate faster convergence compared to other non-distributed modes of implementation.
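The core iteration behind random walk with restart can be sketched in a few lines of plain Python. This is a generic, single-network version for illustration only: it does not include the heterogeneous gene-disease network, the type-II fuzzy fusion, or the Spark parallelisation described in the abstract, and the function name, restart probability, and toy graph are assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol.

    adj   : symmetric adjacency matrix as a list of lists
    seeds : indices of known disease genes (the restart set)
    """
    n = len(adj)
    # Column sums used to column-normalise adj into a transition matrix W.
    col_sums = [sum(adj[i][j] for i in range(n)) or 1.0 for j in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(adj[i][j] / col_sums[j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed gene.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seeds=[0])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

Genes are then ranked by their steady-state score, so nodes closer to the seed set in the network receive higher priority. Each iteration is a matrix-vector product, which is the step a distributed implementation would parallelise.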
Affiliation(s)
- Mehdi Joodaki
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
| | - Nasser Ghadiri
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran.
| | - Zeinab Maleki
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
|
35
|
Ata SK, Wu M, Fang Y, Ou-Yang L, Kwoh CK, Li XL. Recent advances in network-based methods for disease gene prediction. Brief Bioinform 2020; 22:6023077. [PMID: 33276376 DOI: 10.1093/bib/bbaa303] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 07/20/2020] [Revised: 09/29/2020] [Accepted: 10/10/2020] [Indexed: 01/28/2023] Open
Abstract
Disease-gene association through genome-wide association study (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms that correlate with specific diseases needs statistical analysis of associations. Considering the huge number of possible mutations, in addition to its high cost, another important drawback of GWAS analysis is the large number of false positives. Thus, researchers search for more evidence to cross-check their results through different sources. To provide the researchers with alternative and complementary low-cost disease-gene association evidence, computational approaches come into play. Since molecular networks are able to capture complex interplay among molecules in diseases, they become one of the most extensively used data for disease-gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction. We also conduct an empirical analysis on 14 state-of-the-art methods. To summarize, we first elucidate the task definition for disease gene prediction. Secondly, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features and graph representation learning methods. Thirdly, an empirical analysis is conducted to evaluate the performance of the selected methods across seven diseases. We also provide distinguishing findings about the discussed methods based on our empirical analysis. Finally, we highlight potential research directions for future studies on disease gene prediction.
Affiliation(s)
- Sezin Kircali Ata
- School of Computer Science and Engineering, Nanyang Technological University (NTU)
| | - Min Wu
- Institute for Infocomm Research (I2R), A*STAR, Singapore
| | - Yuan Fang
- School of Information Systems, Singapore Management University, Singapore
| | - Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | | | - Xiao-Li Li
- Department head and principal scientist at I2R, A*STAR, Singapore
|
36
|
Zanin M, Santos BFR, Antony PMA, Berenguer-Escuder C, Larsen SB, Hanss Z, Barbuti PA, Baumuratov AS, Grossmann D, Capelle CM, Weber J, Balling R, Ollert M, Krüger R, Diederich NJ, He FQ. Mitochondria interaction networks show altered topological patterns in Parkinson's disease. NPJ Syst Biol Appl 2020; 6:38. [PMID: 33173039 PMCID: PMC7655803 DOI: 10.1038/s41540-020-00156-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/05/2020] [Accepted: 10/02/2020] [Indexed: 02/07/2023] Open
Abstract
Mitochondrial dysfunction is linked to pathogenesis of Parkinson's disease (PD). However, individual mitochondria-based analyses do not show a uniform feature in PD patients. Since mitochondria interact with each other, we hypothesize that PD-related features might exist in topological patterns of mitochondria interaction networks (MINs). Here we show that MINs formed nonclassical scale-free supernetworks in colonic ganglia both from healthy controls and PD patients; however, altered network topological patterns were observed in PD patients. These patterns were highly correlated with PD clinical scores and a machine-learning approach based on the MIN features alone accurately distinguished between patients and controls with an area-under-curve value of 0.989. The MINs of midbrain dopaminergic neurons (mDANs) derived from several genetic PD patients also displayed specific changes. CRISPR/CAS9-based genome correction of alpha-synuclein point mutations reversed the changes in MINs of mDANs. Our organelle-interaction network analysis opens another critical dimension for a deeper characterization of various complex diseases with mitochondrial dysregulation.
Affiliation(s)
- Massimiliano Zanin
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (UIB-CSIC), E-07122, Palma de Mallorca, Spain
- Center for Biomedical Technology, Universidad Politécnica de Madrid, Campus of Montegancedo, E-28223, Pozuelo de Alarcón, Madrid, Spain
| | - Bruno F R Santos
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Transversal Translational Medicine, Luxembourg Institute of Health (LIH), 1A-B, rue Thomas Edison, L-1445, Strassen, Luxembourg
- Disease Modeling and Screening Platform (DMSP), Luxembourg Institute of Systems Biomedicine, University of Luxembourg & Luxembourg Institute of Health, 6 avenue du Swing, L-4367, Belvaux, Luxembourg
| | - Paul M A Antony
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Disease Modeling and Screening Platform (DMSP), Luxembourg Institute of Systems Biomedicine, University of Luxembourg & Luxembourg Institute of Health, 6 avenue du Swing, L-4367, Belvaux, Luxembourg
| | - Clara Berenguer-Escuder
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
| | - Simone B Larsen
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
| | - Zoé Hanss
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
| | - Peter A Barbuti
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Transversal Translational Medicine, Luxembourg Institute of Health (LIH), 1A-B, rue Thomas Edison, L-1445, Strassen, Luxembourg
| | - Aidos S Baumuratov
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
| | - Dajana Grossmann
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
| | - Christophe M Capelle
- Department of Infection and Immunity, Luxembourg Institute of Health (LIH), 29, rue Henri Koch, L-4354, Esch-sur-Alzette, Luxembourg
| | - Joseph Weber
- Centre Hospitalier de Luxembourg (CHL) 4, Rue Nicolas Ernest Barblé, L-1210, Luxembourg, Luxembourg
| | - Rudi Balling
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
| | - Markus Ollert
- Department of Infection and Immunity, Luxembourg Institute of Health (LIH), 29, rue Henri Koch, L-4354, Esch-sur-Alzette, Luxembourg
- Department of Dermatology and Allergy Center, Odense Research Center for Anaphylaxis (ORCA), University of Southern Denmark, 5000C, Odense, Denmark
| | - Rejko Krüger
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Transversal Translational Medicine, Luxembourg Institute of Health (LIH), 1A-B, rue Thomas Edison, L-1445, Strassen, Luxembourg
- Centre Hospitalier de Luxembourg (CHL) 4, Rue Nicolas Ernest Barblé, L-1210, Luxembourg, Luxembourg
| | - Nico J Diederich
- Centre Hospitalier de Luxembourg (CHL) 4, Rue Nicolas Ernest Barblé, L-1210, Luxembourg, Luxembourg
| | - Feng Q He
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg.
- Department of Infection and Immunity, Luxembourg Institute of Health (LIH), 29, rue Henri Koch, L-4354, Esch-sur-Alzette, Luxembourg.
- Institute of Medical Microbiology, University Hospital Essen, University Duisburg-Essen, D-45122, Essen, Germany.
|
37
|
Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics 2020; 36:3457-3465. [PMID: 32129827 PMCID: PMC7267831 DOI: 10.1093/bioinformatics/btaa150] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 12/22/2022]
Abstract
Background Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. Results In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene’s full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation’s appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. Availability and implementation The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available.
Contact arjun@msu.edu. Supplementary information Supplementary data are available at Bioinformatics online.
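To make the label propagation baseline named in this abstract concrete, here is a minimal sketch of one common form of it, a random walk with restart over a gene network. It is not taken from the paper: the toy network, the gene names, the `restart` value and the function name `propagate_labels` are all assumptions chosen for illustration, and the variant benchmarked by the authors may differ.

```python
# Toy undirected gene network as an adjacency dict (hypothetical gene names).
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}


def propagate_labels(network, seeds, restart=0.5, iterations=50):
    """Random walk with restart from the seed genes; returns a score per gene.

    Illustrative only: `restart` and `iterations` are assumed values.
    """
    # The walker restarts uniformly over the known (seed) genes.
    start = {g: (1.0 / len(seeds) if g in seeds else 0.0) for g in network}
    scores = dict(start)
    for _ in range(iterations):
        updated = {}
        for gene in network:
            # Each neighbor passes on an equal share of its current score.
            incoming = sum(scores[n] / len(network[n]) for n in network[gene])
            updated[gene] = restart * start[gene] + (1 - restart) * incoming
        scores = updated
    return scores


# Genes A and B are the known positives; the rest are ranked by proximity.
scores = propagate_labels(network, seeds={"A", "B"})
ranking = sorted(network, key=lambda g: scores[g], reverse=True)
```

In this toy network, unlabeled genes closer to the seeds receive higher scores, which is the network-topology intuition the abstract attributes to label propagation. The supervised alternative described in the abstract would instead treat each gene's row of network connections as a feature vector for a classifier.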
Affiliation(s)
- Renming Liu
- Department of Computational Mathematics, Science and Engineering
| | | | | | - Kayla A Johnson
- Department of Computational Mathematics, Science and Engineering
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
| | - Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- To whom correspondence should be addressed.
|
38
|
Dozmorov MG, Cresswell KG, Bacanu SA, Craver C, Reimers M, Kendler KS. A method for estimating coherence of molecular mechanisms in major human disease and traits. BMC Bioinformatics 2020; 21:473. [PMID: 33087046 PMCID: PMC7579960 DOI: 10.1186/s12859-020-03821-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2020] [Accepted: 10/15/2020] [Indexed: 11/29/2022]
Abstract
BACKGROUND Phenotypes such as height and intelligence are thought to be a product of the collective effects of multiple phenotype-associated genes and interactions among their protein products. High/low degree of interactions is suggestive of coherent/random molecular mechanisms, respectively. Comparing the degree of interactions may help to better understand the coherence of phenotype-specific molecular mechanisms and the potential for therapeutic intervention. However, direct comparison of the degree of interactions is difficult due to different sizes and configurations of phenotype-associated gene networks. METHODS We introduce a metric for measuring coherence of molecular-interaction networks as a slope of internal versus external distributions of the degree of interactions. The internal degree distribution is defined by interaction counts within a phenotype-specific gene network, while the external degree distribution counts interactions with other genes in the whole protein-protein interaction (PPI) network. We present a novel method for normalizing the coherence estimates, making them directly comparable. RESULTS Using STRING and BioGrid PPI databases, we compared the coherence of 116 phenotype-associated gene sets from GWAScatalog against size-matched KEGG pathways (the reference for high coherence) and random networks (the lower limit of coherence). We observed a range of coherence estimates for each category of phenotypes. Metabolic traits and diseases were the most coherent, while psychiatric disorders and intelligence-related traits were the least coherent. We demonstrate that coherence and modularity measures capture distinct network properties. CONCLUSIONS We present a general-purpose method for estimating and comparing the coherence of molecular-interaction gene networks that accounts for the network size and shape differences.
Our results highlight gaps in our current knowledge of genetics and molecular mechanisms of complex phenotypes and suggest priorities for future GWASs.
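The internal and external degree counts that the METHODS section builds on can be sketched as follows. This is only an illustration of the counting step under assumed inputs: the toy edge list, the gene names and the function name `degree_split` are hypothetical, and the slope-based coherence estimate and its normalization are not reproduced here.

```python
# Toy PPI network as undirected edges (hypothetical gene names).
edges = {
    ("G1", "G2"), ("G1", "G3"), ("G2", "G3"),  # interactions inside the set
    ("G3", "X1"), ("G2", "X2"),                # interactions leaving the set
    ("X1", "X2"),                              # interaction outside the set
}
gene_set = {"G1", "G2", "G3"}  # assumed phenotype-associated genes


def degree_split(edges, gene_set):
    """Count, per gene in the set, partners inside and outside the set."""
    internal = {g: 0 for g in gene_set}
    external = {g: 0 for g in gene_set}
    for u, v in edges:
        # Visit each undirected edge from both endpoints.
        for gene, partner in ((u, v), (v, u)):
            if gene in gene_set:
                if partner in gene_set:
                    internal[gene] += 1
                else:
                    external[gene] += 1
    return internal, external


internal, external = degree_split(edges, gene_set)
```

A gene set whose members interact mostly with each other yields high internal and low external counts, which is the intuition behind calling such a set coherent.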
Affiliation(s)
- Mikhail G. Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA USA
- Department of Pathology, Virginia Commonwealth University, Richmond, VA USA
| | - Kellen G. Cresswell
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA USA
| | - Silviu-Alin Bacanu
- Virginia Institute for Psychiatric and Behavior Genetics and the Department of Psychiatry, Virginia Commonwealth University, Richmond, VA USA
| | - Carl Craver
- Philosophy-Neuroscience-Psychology Program, Washington University in St. Louis, St. Louis, MO USA
| | - Mark Reimers
- Department Physiology, Michigan State University, East Lansing, MI USA
- Department Biomedical Engineering, Michigan State University, East Lansing, MI USA
| | - Kenneth S. Kendler
- Virginia Institute for Psychiatric and Behavior Genetics and the Department of Psychiatry, Virginia Commonwealth University, Richmond, VA USA
|
39
|
Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics 2020; 21:442. [PMID: 33028186 PMCID: PMC7542696 DOI: 10.1186/s12859-020-03773-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/22/2020] [Accepted: 09/22/2020] [Indexed: 01/04/2023]
Abstract
Background Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are high in resource and time consumption. Protein–protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can be a limitation for their usage in predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions and is better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed based on the anatomy ontology annotations that were obtained from the experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions. 
Results According to evaluations of candidate gene prediction performance tested under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks, which were semantically improved PPI networks, showed better performances by having higher area under the curve values for receiver operating characteristic and precision-recall curves than PPI networks for both zebrafish and mouse. Conclusion Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improved the candidate gene prediction accuracy and optimized them for predicting candidate genes for anatomical entities.
Affiliation(s)
- Pasan C Fernando
- Department of Biology, University of South Dakota, Vermillion, SD, USA.
| | - Paula M Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, USA
- National Ecological Observatory Network, Battelle Memorial Institute, 1685 38th St., Suite 100, Boulder, CO, 80301, USA
| | - Erliang Zeng
- Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA
- Department of Preventive and Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA
- Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA
- Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA
|
40
|
Ratnakumar A, Weinhold N, Mar JC, Riaz N. Protein-Protein interactions uncover candidate 'core genes' within omnigenic disease networks. PLoS Genet 2020; 16:e1008903. [PMID: 32678846 PMCID: PMC7390454 DOI: 10.1371/journal.pgen.1008903] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/24/2019] [Revised: 07/29/2020] [Accepted: 06/01/2020] [Indexed: 01/09/2023]
Abstract
Genome wide association studies (GWAS) of human diseases have generally identified many loci associated with risk with relatively small effect sizes. The omnigenic model attempts to explain this observation by suggesting that diseases can be thought of as networks, where genes with direct involvement in disease-relevant biological pathways are named ‘core genes’, while peripheral genes influence disease risk via their interactions or regulatory effects on core genes. Here, we demonstrate a method for identifying candidate core genes solely from genes in or near disease-associated SNPs (GWAS hits) in conjunction with protein-protein interaction network data. Applied to 1,381 GWAS studies from 5 ancestries, we identify a total of 1,865 candidate core genes in 343 GWAS studies. Our analysis identifies several well-known disease-related genes that are not identified by GWAS, including BRCA1 in Breast Cancer, Amyloid Precursor Protein (APP) in Alzheimer’s Disease, INS in A1C measurement and Type 2 Diabetes, and PCSK9 in LDL cholesterol, amongst others. Notably, candidate core genes are preferentially enriched for disease relevance over GWAS hits and are enriched for both Clinvar pathogenic variants and known drug targets, consistent with the predictions of the omnigenic model. We subsequently use parent term annotations provided by the GWAS catalog to merge related GWAS studies and identify candidate core genes in over-arching disease processes such as cancer, where we identify 109 candidate core genes.
A recent theory suggests that only a small number of genes underpin the biology of a disease; these genes are called ‘core genes’, and for most diseases they remain unknown. The suggested methods for finding them require complex and expensive experiments. We reasoned that if we merge currently available datasets in smart ways, we may be able to uncover these ‘core genes’. Our method finds “hub” proteins by merging lists of genes previously linked with disease with information on how proteins interact with each other. We found that many of these hub proteins have central roles in disease, such as insulin for both A1C measurement and Type 2 Diabetes, BRCA1 in Breast cancer, and Amyloid Precursor Protein in Alzheimer’s Disease. We think these ‘hub’ proteins are candidate ‘core genes’, and offer our method as a way to find ‘core genes’ by utilizing publicly available reference datasets.
Affiliation(s)
- Abhirami Ratnakumar
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
| | - Nils Weinhold
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
| | - Jessica C. Mar
- Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Australia
| | - Nadeem Riaz
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
|
41
|
Le DH. Machine learning-based approaches for disease gene prediction. Brief Funct Genomics 2020; 19:350-363. [PMID: 32567652 DOI: 10.1093/bfgp/elaa013] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/25/2020] [Revised: 04/30/2020] [Accepted: 05/09/2020] [Indexed: 12/20/2022]
Abstract
Disease gene prediction is an essential issue in biomedical research. In the early days, annotation-based approaches were proposed for this problem. With the development of high-throughput technologies, interaction data between genes/proteins have grown quickly and now cover almost the entire genome and proteome; thus, network-based methods for the problem have become prominent. In parallel, machine learning techniques, which formulate the problem as a classification task, have also been proposed. Here, we first present a roadmap of the machine learning-based methods for disease gene prediction. In the beginning, the problem was usually approached using binary classification, where the positive and negative training sample sets comprise disease genes and non-disease genes, respectively. The disease genes are those known to be associated with diseases, whereas non-disease genes were randomly selected from those not yet known to be associated with diseases. However, the latter may contain unknown disease genes. To overcome this uncertainty in defining the non-disease genes, more realistic approaches have been proposed for the problem, such as unary and semi-supervised classification. Recently, more advanced methods, including ensemble learning, matrix factorization and deep learning, have been proposed for the problem. Secondly, 12 representative machine learning-based methods for disease gene prediction were examined and compared in terms of prediction performance and running time. Finally, their advantages, disadvantages, interpretability and trust were also analyzed and discussed.
Affiliation(s)
- Duc-Hau Le
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
|
42
|
Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. BIOINFORMATICS (OXFORD, ENGLAND) 2020; 36:3457-3465. [PMID: 32129827 DOI: 10.1101/721423] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 05/26/2023]
Abstract
BACKGROUND Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. RESULTS In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation's appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. AVAILABILITY AND IMPLEMENTATION The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available. CONTACT arjun@msu.edu.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Renming Liu
- Department of Computational Mathematics, Science and Engineering
| | | | | | - Kayla A Johnson
- Department of Computational Mathematics, Science and Engineering
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
| | - Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
|
43
|
Peng J, Zhu L, Wang Y, Chen J. Mining Relationships among Multiple Entities in Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:769-776. [PMID: 30872239 DOI: 10.1109/tcbb.2019.2904965] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 06/09/2023]
Abstract
Identifying topological relationships among multiple entities in biological networks is critical to understanding the organizational principles of network functionality. Theoretically, this problem can be solved using minimum Steiner tree (MSTT) algorithms. However, due to large network size, it remains computationally challenging, and the predictive value of multi-entity topological relationships is still unclear. We present a novel solution called Cluster-based Steiner Tree Miner (CST-Miner) to instantly identify multi-entity topological relationships in biological networks. Given a list of user-specific entities, CST-Miner decomposes a biological network into nested cluster-based subgraphs, on which multiple minimum Steiner trees are identified. By merging all of them into a minimum cost tree, the optimal topological relationships among all the user-specific entities are revealed. Experimental results showed that CST-Miner can finish in nearly log-linear time and the tree constructed by CST-Miner is close to the global minimum.
|
44
|
Halu A, Liu S, Baek SH, Hobbs BD, Hunninghake GM, Cho MH, Silverman EK, Sharma A. Exploring the cross-phenotype network region of disease modules reveals concordant and discordant pathways between chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis. Hum Mol Genet 2020; 28:2352-2364. [PMID: 30997486 DOI: 10.1093/hmg/ddz069] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/02/2018] [Revised: 03/12/2019] [Accepted: 03/23/2019] [Indexed: 12/16/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) and idiopathic pulmonary fibrosis (IPF) are two pathologically distinct chronic lung diseases that are associated with cigarette smoking. Genetic studies have identified shared loci for COPD and IPF, including several loci with opposite directions of effect. The existence of additional shared genetic loci, as well as potential shared pathobiological mechanisms between the two diseases at the molecular level, remains to be explored. Taking a network-based approach, we built disease modules for COPD and IPF using genome-wide association studies-implicated genes. The two disease modules displayed strong disease signals in an independent gene expression data set of COPD and IPF lung tissue and showed statistically significant overlap and network proximity, sharing 19 genes, including ARHGAP12 and BCHE. To uncover pathways at the intersection of COPD and IPF, we developed a metric, NetPathScore, which prioritizes the pathways of a disease by their network overlap with another disease. Applying NetPathScore to the COPD and IPF disease modules enabled the determination of concordant and discordant pathways between these diseases. Concordant pathways between COPD and IPF included extracellular matrix remodeling, Mitogen-activated protein kinase (MAPK) signaling and ALK pathways, whereas discordant pathways included advanced glycosylation end product receptor signaling and telomere maintenance and extension pathways. Overall, our findings reveal shared molecular interaction regions between COPD and IPF and shed light on the congruent and incongruent biological processes lying at the intersection of these two complex diseases.
Affiliation(s)
- Arda Halu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Shikang Liu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
| | - Seung Han Baek
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Brian D Hobbs
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary and Critical Care, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Gary M Hunninghake
- Division of Pulmonary and Critical Care, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Michael H Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary and Critical Care, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary and Critical Care, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Amitabh Sharma
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
|
45
|
Ljubic B, Pavlovski M, Alshehri J, Roychoudhury S, Bajic V, Van Neste C, Obradovic Z. Comorbidity network analysis and genetics of colorectal cancer. INFORMATICS IN MEDICINE UNLOCKED 2020. [DOI: 10.1016/j.imu.2020.100492] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 02/08/2023]
|
46
|
Su Y, Zhu H, Zhang L, Zhang X. Identifying Disease Modules Based on Connectivity and Semantic Similarities. COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE 2020:26-40. [DOI: 10.1007/978-981-15-3415-7_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/03/2025]
|
47
|
Cowman T, Coşkun M, Grama A, Koyutürk M. Integrated querying and version control of context-specific biological networks. Database (Oxford) 2020; 2020:baaa018. [PMID: 32294194 PMCID: PMC7158887 DOI: 10.1093/database/baaa018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/19/2019] [Revised: 01/13/2020] [Accepted: 02/21/2020] [Indexed: 01/26/2023]
Abstract
MOTIVATION Biomolecular data stored in public databases is increasingly specialized to organisms, context/pathology and tissue type, potentially resulting in significant overhead for analyses. These networks are often specializations of generic interaction sets, presenting opportunities for reducing storage and computational cost. Therefore, it is desirable to develop effective compression and storage techniques, along with efficient algorithms and a flexible query interface capable of operating on compressed data structures. Current graph databases offer varying levels of support for network integration. However, these solutions do not provide efficient methods for the storage and querying of versioned networks. RESULTS We present VerTIoN, a framework consisting of novel data structures and associated query mechanisms for integrated querying of versioned context-specific biological networks. As a use case for our framework, we study network proximity queries in which the user can select and compose a combination of tissue-specific and generic networks. Using our compressed version tree data structure, in conjunction with state-of-the-art numerical techniques, we demonstrate real-time querying of large network databases. CONCLUSION Our results show that it is possible to support flexible queries defined on heterogeneous networks composed at query time while drastically reducing response time for multiple simultaneous queries. The flexibility offered by VerTIoN in composing integrated network versions opens significant new avenues for the utilization of the ever-increasing volume of context-specific network data in a broad range of biomedical applications. AVAILABILITY AND IMPLEMENTATION VerTIoN is implemented as a C++ library and is available at http://compbio.case.edu/omics/software/vertion and https://github.com/tjcowman/vertion. CONTACT tyler.cowman@case.edu.
Affiliation(s)
- Tyler Cowman
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
| | - Mustafa Coşkun
- Department of Computer Engineering, Abdullah Gül University, Kayseri 38080, Turkey
| | - Ananth Grama
- Department of Computer Science, Purdue University, West Lafayette, IN 47906, USA
| | - Mehmet Koyutürk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
- Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
|
48
|
Isoform-Disease Association Prediction by Data Fusion. BIOINFORMATICS RESEARCH AND APPLICATIONS 2020. [DOI: 10.1007/978-3-030-57821-3_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022]
|
49
|
Lu Y, Fang Z, Zeng T, Li M, Chen Q, Zhang H, Zhou Q, Hu Y, Chen L, Su S. Chronic hepatitis B: dynamic change in Traditional Chinese Medicine syndrome by dynamic network biomarkers. Chin Med 2019; 14:52. [PMID: 31768187 PMCID: PMC6873721 DOI: 10.1186/s13020-019-0275-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 09/05/2019] [Accepted: 11/04/2019] [Indexed: 02/06/2023]
Abstract
Background: In traditional Chinese medicine (TCM) clinical practice, TCM syndromes help to understand human homeostasis and guide individualized treatment. However, the TCM syndrome changes with disease progression, and the scientific basis and mechanism of this change remain unclear.
Methods: To investigate the mechanism underlying dynamic changes in the TCM syndrome, we applied a dynamic network biomarker (DNB) algorithm to obtain the DNBs of changes in the TCM syndrome. The analysis was based on transcriptomic data from healthy controls and from patients with chronic hepatitis B (CHB) and typical TCM syndromes: liver-gallbladder dampness-heat syndrome (LGDHS), liver-depression spleen-deficiency syndrome (LDSDS), and liver-kidney yin-deficiency syndrome (LKYDS). The DNB model exploits collective fluctuations and correlations of the observed genes to diagnose the critical state.
Results: The DNBs of TCM syndromes comprised 52 genes, and the tipping point occurred at the LDSDS stage. There were also numerous differentially expressed genes between LGDHS and LKYDS, which highlighted the drastic changes before and after the tipping point and implied that the 52 DNBs could serve as early-warning signals of the upcoming change in the TCM syndrome. Next, we validated the DNBs by cytokine profiling and isobaric tags for relative and absolute quantitation (iTRAQ). The results showed that plasminogen (PLG) and coagulation factor XII (F12) were significantly expressed during the progression of TCM syndrome from LGDHS to LKYDS.
Conclusions: This study provides a scientific understanding of changes in the TCM syndrome. The cytokine system was involved throughout this process. The DNBs PLG and F12 were confirmed to change significantly during TCM-syndrome progression, indicating the potential value of DNBs in the auxiliary diagnosis of TCM syndrome in CHB.
Trial registration: Identifier NCT03189992. Registered on June 4, 2017; retrospectively registered (http://www.clinicaltrials.gov).
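The abstract describes the DNB model only in words: a group of genes whose fluctuations and mutual correlations rise together near a critical state. As a rough illustration, the sketch below computes a composite index of the form (mean standard deviation inside the candidate group) × (mean absolute correlation inside the group) / (mean absolute correlation between the group and all other genes), which is a formulation commonly used in the DNB literature. The abstract does not state the exact score the authors used, so the function name `dnb_composite_index`, its arguments, and the formula itself should be read as assumptions rather than the paper's implementation.

```python
import numpy as np


def dnb_composite_index(expr, module_idx):
    """Composite DNB-style index for one disease stage (illustrative only).

    expr       : array of shape (n_samples, n_genes) for a single stage.
    module_idx : indices of the candidate DNB genes (at least two, and
                 at least one gene must remain outside the module).
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[1]

    # Boolean mask separating candidate module genes from all other genes.
    inside = np.zeros(n_genes, dtype=bool)
    inside[list(module_idx)] = True

    # Absolute Pearson correlations between all gene pairs.
    corr = np.abs(np.corrcoef(expr, rowvar=False))

    # Mean sample standard deviation of the module genes (fluctuation).
    sd_in = expr[:, inside].std(axis=0, ddof=1).mean()

    # Mean |PCC| among module genes, excluding the diagonal.
    sub = corr[np.ix_(inside, inside)]
    off_diag = ~np.eye(sub.shape[0], dtype=bool)
    pcc_in = sub[off_diag].mean()

    # Mean |PCC| between module genes and genes outside the module.
    pcc_out = corr[np.ix_(inside, ~inside)].mean()

    return sd_in * pcc_in / pcc_out
```

In use, the index would be computed once per stage (for example, per TCM syndrome group), and a stage with a markedly higher value than the others would be a candidate tipping point. Constant genes produce undefined correlations and would need to be filtered out first.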
Affiliation(s)
- Yiyu Lu: Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203 China
- Zhaoyuan Fang: Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institute of Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
- Tao Zeng: Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institute of Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
- Meiyi Li: Minhang Branch, Zhongshan Hospital, Fudan University/Institute of Fudan-Minhang Academic Health System, Minhang Hospital, Fudan University, Shanghai, 201199 China
- Qilong Chen: Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203 China
- Hui Zhang: Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203 China
- Qianmei Zhou: Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203 China
- Yiyang Hu: Institute of Liver Disease, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203 China
- Luonan Chen: Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institute of Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031 China; CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223 China
- Shibing Su: Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203 China
50
Alshahrani M, Hoehndorf R. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes. Bioinformatics 2019; 34:i901-i907. [PMID: 30423077 PMCID: PMC6129260 DOI: 10.1093/bioinformatics/bty559] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Indexed: 01/09/2023]
Abstract
Motivation: In recent years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization. These methods commonly compute the similarity between a disease’s (or patient’s) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about the phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse.
Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of the phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and a gene, or a disease and a patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprising multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and that it significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.
Availability and implementation: https://github.com/bio-ontology-research-group/SmuDGE
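To make the idea of prioritizing genes from vector representations concrete, here is a minimal sketch that ranks candidate genes by cosine similarity between a disease vector and each gene vector. The abstract does not specify which similarity function SmuDGE uses on its learned representations, so cosine similarity is a stand-in, and the function name `rank_genes_by_cosine` and the gene labels in the usage note are hypothetical.

```python
import numpy as np


def rank_genes_by_cosine(disease_vec, gene_vecs):
    """Rank genes by cosine similarity to a disease embedding.

    disease_vec : 1-D sequence, the (assumed) learned disease vector.
    gene_vecs   : dict mapping gene label -> 1-D vector of equal length.
    Returns a list of (gene, score) pairs, highest score first.
    """
    d = np.asarray(disease_vec, dtype=float)
    d = d / np.linalg.norm(d)  # unit-normalize the query once

    scores = {}
    for gene, vec in gene_vecs.items():
        v = np.asarray(vec, dtype=float)
        # Cosine similarity: dot product of the two unit vectors.
        scores[gene] = float(d @ v / np.linalg.norm(v))

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with a disease vector `[1, 0]` and hypothetical genes `GENE_A = [2, 0]`, `GENE_B = [0, 1]`, and `GENE_C = [1, 1]`, the ranking is `GENE_A`, `GENE_C`, `GENE_B`. Because the vectors can be produced for any node in a connected network, the same ranking applies to genes that have no phenotype annotations of their own.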
Affiliation(s)
- Mona Alshahrani: Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Robert Hoehndorf: Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia