1
Multiview hyperedge-aware hyper graph embedding learning for multisite, multiatlas fMRI based functional connectivity network analysis. Med Image Anal 2024; 94:103144. [PMID: 38518530 DOI: 10.1016/j.media.2024.103144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 03/17/2024] [Accepted: 03/18/2024] [Indexed: 03/24/2024]
Abstract
Recently, functional magnetic resonance imaging (fMRI) based functional connectivity network (FCN) analysis via graph convolutional networks (GCNs) has shown promise for automated diagnosis of brain diseases by regarding the FCNs as irregular graph-structured data. However, multiview information and site influences of the FCNs in a multisite, multiatlas fMRI scenario have been understudied. In this paper, we propose a Class-consistency and Site-independence Multiview Hyperedge-Aware HyperGraph Embedding Learning (CcSi-MHAHGEL) framework to integrate FCNs constructed on multiple brain atlases in a multisite fMRI study. Specifically, for each subject, we first model the brain network as a hypergraph for every brain atlas to characterize high-order relations among multiple vertices, and then introduce a multiview hyperedge-aware hypergraph convolutional network (HGCN) to extract a multiatlas-based FCN embedding in which hyperedge weights are adaptively learned rather than fixed to the precalculated values used in traditional HGCNs. In addition, we formulate two modules to jointly learn the multiatlas-based FCN embeddings by considering the between-subject associations across classes and sites, respectively: a class-consistency module that encourages both compactness within every class and separation between classes to promote discrimination in the embedding space, and a site-independence module that minimizes the site dependence of the embeddings to mitigate undesired site influences due to differences in scanning platforms and/or protocols at multiple sites. Finally, the multiatlas-based FCN embeddings are fed into a few fully connected layers followed by a softmax classifier for the diagnosis decision. Extensive experiments on the ABIDE dataset demonstrate the effectiveness of our method for autism spectrum disorder (ASD) identification. Furthermore, our method is interpretable, revealing ASD-relevant brain regions that are biologically significant.
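As a rough illustration of the hypergraph idea in this abstract, the following Python sketch performs one round of vertex-to-hyperedge-to-vertex averaging with per-hyperedge weights. The toy features, hyperedges and weights are made up, the function name `hypergraph_propagate` is hypothetical, and the simple mean-based normalization is an assumption rather than the CcSi-MHAHGEL formulation; in a hyperedge-aware model the weights would be learned rather than supplied by hand.

```python
def hypergraph_propagate(features, hyperedges, edge_weights):
    """One round of vertex -> hyperedge -> vertex averaging.

    features:     list of per-vertex feature vectors (lists of floats)
    hyperedges:   list of hyperedges, each a list of vertex indices
    edge_weights: one non-negative weight per hyperedge (fixed here by
                  assumption; a hyperedge-aware model would learn them)
    """
    dim = len(features[0])

    # Stage 1: each hyperedge takes the mean of its member vertices.
    edge_feats = []
    for members in hyperedges:
        edge_feats.append([
            sum(features[v][d] for v in members) / len(members)
            for d in range(dim)
        ])

    # Stage 2: each vertex takes the weighted mean of its incident hyperedges.
    out = []
    for v in range(len(features)):
        incident = [e for e, members in enumerate(hyperedges) if v in members]
        total_w = sum(edge_weights[e] for e in incident)
        if total_w == 0:
            out.append(list(features[v]))  # no usable hyperedge: keep as-is
            continue
        out.append([
            sum(edge_weights[e] * edge_feats[e][d] for e in incident) / total_w
            for d in range(dim)
        ])
    return out


# Toy example: 4 vertices (e.g. brain regions), 2 hyperedges sharing vertex 1.
toy_features = [[1.0], [2.0], [3.0], [10.0]]
toy_hyperedges = [[0, 1, 2], [1, 3]]
toy_weights = [1.0, 3.0]
smoothed = hypergraph_propagate(toy_features, toy_hyperedges, toy_weights)
```

Unlike an ordinary graph edge, a hyperedge here connects three vertices at once, and raising the weight of the second hyperedge pulls vertex 1 toward vertex 3.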
2
Tissue specific tumor-gene link prediction through sampling based GNN using a heterogeneous network. Med Biol Eng Comput 2024:10.1007/s11517-024-03087-y. [PMID: 38635004 DOI: 10.1007/s11517-024-03087-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 03/31/2024] [Indexed: 04/19/2024]
Abstract
A tissue sample is a valuable resource for understanding a patient's symptoms and health status in relation to tumor growth. Recent research seeks to establish a connection between tissue-specific tumor samples and genetic markers (genes). This breakthrough has paved the way for personalized cancer therapies. With this motivation, the proposed model constructs a heterogeneous network based on tumor sample-gene relation data and gene-gene interaction data. This network also incorporates tissue-specific gene expression and primary site-based gene counts as features, enabling tissue-specific predictions. Graph neural networks (GNNs) have proven effective in modeling complex interactions and predicting links within this network. The proposed model has successfully predicted tumor-gene associations by leveraging sampling-based GNNs and link layer embedding. The model's performance metrics, such as AUC-ROC scores, reached approximately 94%, demonstrating the potential of this heterogeneous network in predicting tissue-specific tumor sample-gene links. This paper's findings highlight the importance of tissue-specific associations in cancer research.
3
TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers. Neural Netw 2024; 172:106086. [PMID: 38159511 DOI: 10.1016/j.neunet.2023.12.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 12/18/2023] [Accepted: 12/22/2023] [Indexed: 01/03/2024]
Abstract
Dynamic graph embedding has emerged as a very effective technique for addressing diverse temporal graph analytic tasks (e.g., link prediction, node classification, recommender systems, anomaly detection, and graph generation) in various applications. Such temporal graphs exhibit heterogeneous transient dynamics, varying time intervals, and highly evolving node features throughout their evolution. Hence, incorporating long-range dependencies from the historical graph context plays a crucial role in accurately learning their temporal dynamics. In this paper, we develop a graph embedding model with uncertainty quantification, TransformerG2G, by exploiting the advanced transformer encoder to first learn intermediate node representations from each node's current state (t) and previous context (over timestamps [t-1, t-l], where l is the context length). Moreover, we employ two projection layers to generate lower-dimensional multivariate Gaussian distributions as each node's latent embedding at timestamp t. We consider diverse benchmarks with varying levels of "novelty" as measured by the TEA (Temporal Edge Appearance) plots. Our experiments demonstrate that the proposed TransformerG2G model outperforms conventional multi-step methods and our prior work (DynG2G) in terms of both link prediction accuracy and computational efficiency, especially at a high degree of novelty. Furthermore, the learned time-dependent attention weights across multiple graph snapshots reveal the development of an automatic adaptive time stepping enabled by the transformer. Importantly, by examining the attention weights, we can uncover temporal dependencies, identify influential elements, and gain insights into the complex interactions within the graph structure. For example, we identified a strong correlation between attention weights and node degree at the various stages of the graph topology evolution.
4
ConTIG: Continuous representation learning on temporal interaction graphs. Neural Netw 2024; 172:106151. [PMID: 38301339 DOI: 10.1016/j.neunet.2024.106151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Revised: 12/15/2023] [Accepted: 01/26/2024] [Indexed: 02/03/2024]
Abstract
Representation learning on temporal interaction graphs (TIG) aims to model complex networks with the dynamic evolution of interactions on a wide range of web and social graph applications. However, most existing works on TIG either (a) update node embeddings only at the discrete moments when an interaction occurs, and so fail to capture the continuous evolution of node embedding trajectories, or (b) overlook the rich temporal patterns hidden in the ever-changing graph data, which presumably leads to sub-optimal models. In this paper, we propose a two-module framework named ConTIG, a novel representation learning method on TIG that captures the continuous dynamic evolution of node embedding trajectories. With two essential modules, our model exploits three factors in dynamic networks: the latest interaction, neighbor features, and inherent characteristics. In the first module, the update module, we employ a continuous inference block to learn the nodes' state trajectories from time-adjacent interaction patterns using ordinary differential equations. In the second module, the transform module, we introduce a self-attention mechanism to predict future node embeddings by aggregating historical temporal interaction information. Experimental results demonstrate the superiority of ConTIG on temporal link prediction, temporal node recommendation, and dynamic node classification tasks on four datasets compared with a range of state-of-the-art baselines, especially for long-interval interaction prediction.
5
The Dynamical Biomarkers in Functional Connectivity of Autism Spectrum Disorder Based on Dynamic Graph Embedding. Interdiscip Sci 2024; 16:141-159. [PMID: 38060171 DOI: 10.1007/s12539-023-00592-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 11/02/2023] [Accepted: 11/02/2023] [Indexed: 12/08/2023]
Abstract
Autism spectrum disorder (ASD) is a neurological and developmental disorder, and its early diagnosis is a challenging task. The dynamic brain network (DBN) offers a wealth of information for the diagnosis and treatment of ASD. Mining the spatio-temporal characteristics of the DBN is critical for finding dynamic communication across brain regions and, ultimately, identifying ASD diagnostic biomarkers. We proposed the dgEmbed-KNN and the Aggregation-SVM diagnostic models, which use the spatio-temporal information from the DBN and interactive information among brain regions represented by dynamic graph embedding. The classification accuracies show that the dgEmbed-KNN model performs slightly better than traditional machine learning and deep learning methods, while the Aggregation-SVM model has a very good capacity to diagnose ASD using aggregation brain network connections as features. We discovered over- and under-connections in ASD at the level of dynamic connections, involving brain regions of the postcentral gyrus, the insula, the cerebellum, the caudate nucleus, and the temporal pole. We also found abnormal dynamic interactions associated with ASD within/between the functional subnetworks, including the default mode network, visual network, auditory network and saliency network. These can provide potential DBN biomarkers for ASD identification.
6
Graph embedding-based heterogeneous domain adaptation with domain-invariant feature learning and distributional order preserving. Neural Netw 2024; 170:427-440. [PMID: 38035485 DOI: 10.1016/j.neunet.2023.11.048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 09/04/2023] [Accepted: 11/22/2023] [Indexed: 12/02/2023]
Abstract
Heterogeneous domain adaptation (HDA) methods leverage prior knowledge from the source domain to train models for the target domain and address the differences in their feature spaces. However, in most existing methods, unlabeled target samples can cause incorrect category alignment and disrupt the distribution structure during the domain alignment process, resulting in negative transfer. Additionally, previous works rarely focus on the robustness and interpretability of the model. To address these issues, we propose a novel Graph embedding-based Heterogeneous domain-Invariant feature learning and Distributional order preserving framework (GHID). Specifically, a bidirectional robust cross-domain alignment graph embedding structure is proposed to globally align two domains, which learns the domain-invariant and discriminative features simultaneously. In addition, the interpretability of the proposed graph structures is demonstrated through two theoretical analyses, which can elucidate the correlation between important samples from a global perspective in heterogeneous domain alignment scenarios. Then, a heterogeneous discriminative distributional order preserving graph embedding structure is designed to preserve the original distribution relationship of each domain to prevent negative transfer. Moreover, the dynamic centroid strategy is incorporated into the graph structures to improve the robustness of the model. Comprehensive experimental results on four benchmarks demonstrate that the proposed method outperforms other state-of-the-art approaches in effectiveness.
7
Graph embedding on mass spectrometry- and sequencing-based biomedical data. BMC Bioinformatics 2024; 25:1. [PMID: 38166530 PMCID: PMC10763173 DOI: 10.1186/s12859-023-05612-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 12/11/2023] [Indexed: 01/04/2024] Open
Abstract
Graph embedding techniques use deep learning algorithms in data analysis to solve problems such as node classification, link prediction, community detection, and visualization. Although typically used in the context of guessing friendships in social media, several applications for graph embedding techniques in biomedical data analysis have emerged. While these approaches remain computationally demanding, several developments in recent years facilitate their application to biomedical data and thus may help advance biological discoveries. Therefore, in this review, we discuss the principles of graph embedding techniques and explore their usefulness for understanding biological network data derived from mass spectrometry and sequencing experiments, the current workhorses of systems biology studies. In particular, we focus on recent examples of characterizing protein-protein interaction networks and predicting novel drug functions.
8
Knowledge Graphs and Their Applications in Drug Discovery. Methods Mol Biol 2024; 2716:203-221. [PMID: 37702941 DOI: 10.1007/978-1-0716-3449-3_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
Knowledge graphs represent information in the form of entities and relationships between those entities. Such a representation has multiple potential applications in drug discovery, including democratizing access to biomedical data, contextualizing or visualizing that data, and generating novel insights through the application of machine learning approaches. Knowledge graphs put data into context and therefore offer the opportunity to generate explainable predictions, which is a key topic in contemporary artificial intelligence. In this chapter, we outline some of the factors that need to be considered when constructing biomedical knowledge graphs, examine recent advances in mining such systems to gain insights for drug discovery, and identify potential future areas for further development.
9
Semi-supervised enhanced discriminative local constraint preserving projection for dimensionality reduction of medical hyperspectral images. Comput Biol Med 2023; 167:107568. [PMID: 37890419 DOI: 10.1016/j.compbiomed.2023.107568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 08/27/2023] [Accepted: 10/10/2023] [Indexed: 10/29/2023]
Abstract
Microscopic hyperspectral images have the advantage of containing rich spatial and spectral information. However, although the large number of spectral bands provides a significant amount of spectral features, it also leads to data redundancy and noise, which seriously affect the recognition and classification performance of the images and increase the requirements for computation and storage. To address this issue, we propose a dimensionality reduction algorithm named enhanced discriminant local constraint preserving projection (EDLCPP). Specifically, the global spectral attention mechanism focuses on important bands, the high discriminability sample selection module measures the discriminability of samples using a modified average neighborhood margin, the graph construction module preserves the local geometric relationship and discriminant information, and the graph embedding module embeds the constructed graphs into a low-dimensional space to obtain the projection matrices. Experimental results on eight cholangiocarcinoma (CCA) hyperspectral images and on the Bloodcell1-3 and Bloodcell2-2 datasets demonstrate the effectiveness of the proposed method.
10
KSFinder-a knowledge graph model for link prediction of novel phosphorylated substrates of kinases. PeerJ 2023; 11:e16164. [PMID: 37818330 PMCID: PMC10561642 DOI: 10.7717/peerj.16164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 09/01/2023] [Indexed: 10/12/2023] Open
Abstract
Background Aberrant protein kinase regulation leading to abnormal substrate phosphorylation is associated with several human diseases. Despite the promise of therapies targeting kinases, many human kinases remain understudied. Most existing computational tools predicting phosphorylation cover less than 50% of known human kinases. They utilize local feature selection based on protein sequences, motifs, domains, structures, and/or functions, and do not consider the heterogeneous relationships of the proteins. In this work, we present KSFinder, a tool that predicts kinase-substrate links by capturing the inherent association of proteins in a network comprising 85% of the known human kinases. We also postulate the potential role of two understudied kinases based on their substrate predictions from KSFinder. Methods KSFinder learns the semantic relationships in a phosphoproteome knowledge graph using a knowledge graph embedding algorithm and represents the nodes in low-dimensional vectors. A multilayer perceptron (MLP) classifier is trained to discern kinase-substrate links using the embedded vectors. KSFinder uses a strategic negative generation approach that eliminates biases in entity representation and combines data from experimentally validated non-interacting protein pairs, proteins from different subcellular locations, and random sampling. We assess KSFinder's generalization capability on four different datasets and compare its performance with other state-of-the-art prediction models. We employ KSFinder to predict substrates of 68 "dark" kinases considered understudied by the Illuminating the Druggable Genome program and use our text-mining tool, RLIMS-P along with manual curation, to search for literature evidence for the predictions. In a case study, we performed functional enrichment analysis for two dark kinases - HIPK3 and CAMKK1 using their predicted substrates. 
Results KSFinder shows improved performance over other kinase-substrate prediction models and generalized prediction ability on different datasets. We identified literature evidence for 17 novel predictions involving an understudied kinase. All of these 17 predictions had a probability score ≥0.7 (nine at >0.9, six at 0.8-0.9, and two at 0.7-0.8). The evaluation of 93,593 negative predictions (probability ≤0.3) identified four false negatives. The top enriched biological processes of HIPK3 substrates relate to the regulation of extracellular matrix and epigenetic gene expression, while CAMKK1 substrates include lipid storage regulation and glucose homeostasis. Conclusions KSFinder outperforms the current kinase-substrate prediction tools with higher kinase coverage. The strategically developed negatives provide a superior generalization ability for KSFinder. We predicted substrates of 432 kinases, 68 of which are understudied, and hypothesized the potential functions of two dark kinases using their predicted substrates.
11
DrugRep-HeSiaGraph: when heterogenous siamese neural network meets knowledge graphs for drug repurposing. BMC Bioinformatics 2023; 24:374. [PMID: 37789314 PMCID: PMC10548718 DOI: 10.1186/s12859-023-05479-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 09/12/2023] [Indexed: 10/05/2023] Open
Abstract
BACKGROUND Drug repurposing is an approach that holds promise for identifying new therapeutic uses for existing drugs. Recently, knowledge graphs have emerged as significant tools for addressing the challenges of drug repurposing. However, there are still major issues with constructing and embedding knowledge graphs. RESULTS This study proposes a two-step method called DrugRep-HeSiaGraph to address these challenges. The method integrates the drug-disease knowledge graph with the application of a heterogeneous siamese neural network. In the first step, a drug-disease knowledge graph named DDKG-V1 is constructed by defining new relationship types, and then numerical vector representations for the nodes are created using the distributional learning method. In the second step, a heterogeneous siamese neural network called HeSiaNet is applied to enrich the embedding of drugs and diseases by bringing them closer in a new unified latent space. Then, it predicts potential drug candidates for diseases. DrugRep-HeSiaGraph achieves impressive performance metrics, including an AUC-ROC of 91.16%, an AUC-PR of 90.32%, an accuracy of 84.63%, a BS of 0.119, and an MCC of 69.31%. CONCLUSION We demonstrate the effectiveness of the proposed method in identifying potential drugs for COVID-19 as a case study. In addition, this study shows the role of dipeptidyl peptidase 4 (DPP-4) as a potential receptor for SARS-CoV-2 and the effectiveness of DPP-4 inhibitors in facing COVID-19. This highlights the practical application of the model in addressing real-world challenges in the field of drug repurposing. The code and data for DrugRep-HeSiaGraph are publicly available at https://github.com/CBRC-lab/DrugRep-HeSiaGraph .
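Two of the metrics reported in this abstract, the Brier score (BS) and the Matthews correlation coefficient (MCC), are less familiar than accuracy or AUC. The sketch below shows their standard textbook definitions in plain Python. The labels and probabilities are toy values, not data from the paper, and the 0.5 decision threshold is an assumption made only for the example.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """MCC from the binary confusion matrix; returns 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy drug-disease pairs: 1 = known association, 0 = no association.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.8, 0.2, 0.6]
toy_preds = [1 if p >= 0.5 else 0 for p in toy_probs]  # assumed threshold

bs = brier_score(toy_labels, toy_probs)
mcc = matthews_corrcoef(toy_labels, toy_preds)
```

A lower Brier score indicates better-calibrated probabilities, while MCC ranges from -1 to 1, so an MCC quoted as a percentage corresponds to the same value on a 0 to 1 scale.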
12
Unsupervised dimensionality reduction of medical hyperspectral imagery in tensor space. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 240:107724. [PMID: 37506600 DOI: 10.1016/j.cmpb.2023.107724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Revised: 07/08/2023] [Accepted: 07/16/2023] [Indexed: 07/30/2023]
Abstract
BACKGROUND AND OBJECTIVES Compared with traditional RGB images, medical hyperspectral imagery (HSI) has numerous continuous narrow spectral bands, which can provide rich information for cancer diagnosis. However, the abundant spectral bands also contain a large amount of redundant information and increase computational complexity. Thus, dimensionality reduction (DR) is essential in HSI analysis. Because they vectorize the data, all vector-based DR methods ignore the cubic nature of HSI. To overcome this disadvantage, tensor-based techniques have been developed by employing multi-linear algebra. METHODS To fully exploit the structural features of medical HSI and enhance computational efficiency, a novel method called unsupervised dimensionality reduction via tensor-based low-rank collaborative graph embedding (TLCGE) is proposed. TLCGE uses the entropy rate superpixel (ERS) segmentation algorithm to generate superpixels. Then, a low-rank collaborative graph weight matrix is constructed on each superpixel, greatly improving the efficiency and robustness of the proposed method. After that, TLCGE reduces dimensions in tensor space to preserve the intrinsic structure of HSI. RESULTS The proposed TLCGE is tested on cholangiocarcinoma microscopic hyperspectral data sets and compared with other machine learning DR methods. The experimental results validate the effectiveness of the proposed TLCGE. CONCLUSIONS The proposed TLCGE is a tensor-based DR method that maintains the intrinsic 3-D data structure of medical HSI. By imposing low-rank and sparse constraints on the objective function, TLCGE can fully explore the local and global structures within each superpixel. Its computational efficiency is better than that of other tensor-based DR methods, so it can be used as a preprocessing step in real medical HSI classification or segmentation.
13
Graph embedding-based link prediction for literature-based discovery in Alzheimer's Disease. J Biomed Inform 2023; 145:104464. [PMID: 37541406 DOI: 10.1016/j.jbi.2023.104464] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/22/2023] [Revised: 07/29/2023] [Accepted: 07/30/2023] [Indexed: 08/06/2023]
Abstract
OBJECTIVE We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology. METHODS We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed. RESULTS We constructed an AD corpus of over 16 k papers published in 1977-2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ∼11 k nodes and ∼394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. 
A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation. CONCLUSION Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases. AVAILABILITY Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd.
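The time-sliced evaluation described in this abstract can be pictured with a small Python sketch: edges first seen up to a cutoff year form the training graph, and edges that first appear within a prediction window after the cutoff are the links to be predicted. The function name `time_slice`, the node identifiers and the years are hypothetical toy values, and the paper's exact slicing protocol may differ.

```python
def time_slice(edges, cutoff_year, window):
    """Split (u, v, first_year) edges into training edges and future links.

    Training edges: first seen in or before cutoff_year.
    Test edges:     first seen within `window` years after the cutoff.
    """
    train = {(u, v) for u, v, year in edges if year <= cutoff_year}
    test = {(u, v) for u, v, year in edges
            if cutoff_year < year <= cutoff_year + window}
    return train, test


# Toy knowledge-graph edges with the year each relation first appeared.
toy_edges = [
    ("g1", "p1", 1995),
    ("g2", "p2", 2001),
    ("g1", "g2", 2013),
    ("p1", "p2", 2019),
]

train_edges, short_term = time_slice(toy_edges, cutoff_year=2000, window=1)
_, long_term = time_slice(toy_edges, cutoff_year=2000, window=20)
```

With the same training graph, the 1-year window leaves one positive link to recover while the 20-year window leaves three, which illustrates why the choice of window length can change the measured performance so strongly.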
14
Applying Joint Graph Embedding to Study Alzheimer's Neurodegeneration Patterns in Volumetric Data. Neuroinformatics 2023; 21:601-614. [PMID: 37314682 PMCID: PMC10406695 DOI: 10.1007/s12021-023-09634-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/20/2023] [Indexed: 06/15/2023]
Abstract
Neurodegeneration measured through volumetry in MRI is recognized as a potential Alzheimer's Disease (AD) biomarker, but its utility is limited by lack of specificity. Quantifying spatial patterns of neurodegeneration on a whole brain scale rather than locally may help address this. In this work, we turn to network based analyses and extend a graph embedding algorithm to study morphometric connectivity from volume-change correlations measured with structural MRI on the timescale of years. We model our data with the multiple random eigengraphs framework, and we modify and implement a previously proposed multigraph embedding algorithm to estimate a low dimensional embedding of the networks. Our version of the algorithm guarantees meaningful finite-sample results and estimates maximum likelihood edge probabilities from population-specific network modes and subject-specific loadings. Furthermore, we propose and implement a novel statistical testing procedure to analyze group differences after accounting for confounders and locate significant structures during AD neurodegeneration. Family-wise error rate is controlled at 5% using permutation testing on the maximum statistic. We show that results from our analysis reveal networks dominated by known structures associated with AD neurodegeneration, indicating the framework has promise for studying AD. Furthermore, we find network-structure tuples that are not found with traditional methods in the field.
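Permutation testing on the maximum statistic is a general way to control the family-wise error rate across many features at once. The sketch below shows the generic technique in plain Python. The absolute mean difference as test statistic, the function name, the toy data and the absence of any confounder adjustment are all assumptions for illustration, not the procedure used in the paper.

```python
import random


def max_stat_permutation_test(group_a, group_b, n_perm=1000, alpha=0.05, seed=0):
    """Flag features whose group difference survives FWER control.

    group_a, group_b: lists of subjects, each a list of feature values.
    Returns (observed_stats, threshold, significant_feature_indices).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = group_a + group_b
    n_feat = len(pooled[0])

    def abs_mean_diffs(a, b):
        return [abs(sum(s[j] for s in a) / len(a) -
                    sum(s[j] for s in b) / len(b))
                for j in range(n_feat)]

    observed = abs_mean_diffs(group_a, group_b)

    # Null distribution of the maximum statistic over all features,
    # obtained by shuffling the group labels.
    max_null = []
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        max_null.append(max(abs_mean_diffs(shuffled[:n_a], shuffled[n_a:])))

    max_null.sort()
    # The (1 - alpha) quantile of the maximum is the FWER threshold.
    threshold = max_null[min(n_perm - 1, int((1 - alpha) * n_perm))]
    significant = [j for j, s in enumerate(observed) if s > threshold]
    return observed, threshold, significant


# Toy data: feature 0 differs strongly between groups, feature 1 does not.
toy_a = [[10, 1], [11, 2], [12, 3], [10, 1], [11, 2], [12, 3]]
toy_b = [[0, 3], [1, 2], [2, 1], [0, 3], [1, 2], [2, 1]]
observed, threshold, significant = max_stat_permutation_test(toy_a, toy_b)
```

Because every feature is compared against the same threshold, derived from the largest statistic in each permutation, the chance of even one false positive across all features is held near `alpha`.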
15
A fuzzy-based framework for diagnosing esophageal motility disorder using high-resolution manometry. J Biomed Inform 2023; 141:104355. [PMID: 37023842 DOI: 10.1016/j.jbi.2023.104355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Revised: 03/05/2023] [Accepted: 03/30/2023] [Indexed: 04/08/2023]
Abstract
In recent years, the high-resolution manometry (HRM) technique has been increasingly used to study esophageal and colonic pressurization and has become a standard routine for discovering motility disorders. Despite evolving guidelines for the interpretation of HRM, such as the Chicago standard, some complexities still remain for medical professionals, such as the dependency of normative reference values on the recording device and other external variables. In this study, a decision support framework is developed to aid the diagnosis of esophageal motility disorders based on HRM data. To abstract HRM data, Spearman correlation is employed to model the spatio-temporal dependencies of pressure values of HRM components, and convolutional graph neural networks are then utilized to embed the relation graphs into feature vectors. In the decision-making stage, a novel Expert per Class Fuzzy Classifier (EPC-FC) is presented that employs an ensemble structure and contains specialized sub-classifiers, each recognizing a specific disorder. Training the sub-classifiers with the negative correlation learning method makes the EPC-FC highly generalizable. Meanwhile, separating the sub-classifiers of each class gives flexibility and interpretability to the structure. The suggested framework is evaluated on a dataset of 67 patients in 5 different classes recorded in Shariati Hospital. An average accuracy of 78.03% for a single swallow and 92.54% at the subject level is achieved for distinguishing motility disorders. Moreover, compared with other studies, the presented framework has outstanding performance considering that it imposes no limits on the type of classes or HRM data. In addition, the EPC-FC outperforms other comparative classifiers such as SVM and AdaBoost not only in HRM diagnosis but also on other benchmark classification problems.
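The step of turning pressure recordings into a relation graph via Spearman correlation can be sketched in plain Python as below. The channel values are toy numbers, the helper names are hypothetical, and keeping only edges with |rho| at or above 0.5 is an assumption made for the example; the abstract does not say how, or whether, the correlation matrix is thresholded.

```python
def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_graph(channels, threshold=0.5):
    """Edges (i, j, rho) between channels with |rho| >= threshold."""
    edges = []
    for i in range(len(channels)):
        for j in range(i + 1, len(channels)):
            rho = spearman(channels[i], channels[j])
            if abs(rho) >= threshold:
                edges.append((i, j, rho))
    return edges


# Toy pressure traces for three sensors over five time points.
toy_channels = [
    [1, 2, 3, 4, 5],
    [2, 4, 8, 16, 32],  # monotone in channel 0, so rho = 1
    [3, 1, 5, 2, 4],    # weakly related to channel 0, rho = 0.3
]
graph_edges = correlation_graph(toy_channels)
```

Because Spearman correlation works on ranks, channels 0 and 1 are perfectly correlated even though their relationship is nonlinear, which is one reason a rank-based measure can suit pressure signals.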
|
16
|
Knowledge Graphs: Opportunities and Challenges. Artif Intell Rev 2023; 56:1-32. [PMID: 37362886 PMCID: PMC10068207 DOI: 10.1007/s10462-023-10465-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/09/2023] [Indexed: 04/05/2023]
Abstract
With the explosive growth of artificial intelligence (AI) and big data, it has become vitally important to organize and represent the enormous volume of knowledge appropriately. As graph data, knowledge graphs accumulate and convey knowledge of the real world. It has been well recognized that knowledge graphs effectively represent complex information; hence, they have rapidly gained the attention of academia and industry in recent years. To develop a deeper understanding of knowledge graphs, this paper presents a systematic overview of this field. Specifically, we focus on the opportunities and challenges of knowledge graphs. We first review the opportunities of knowledge graphs in terms of two aspects: (1) AI systems built upon knowledge graphs; (2) potential application fields of knowledge graphs. Then, we thoroughly discuss severe technical challenges in this field, such as knowledge graph embeddings, knowledge acquisition, knowledge graph completion, knowledge fusion, and knowledge reasoning. We expect that this survey will shed new light on future research and the development of knowledge graphs.
|
17
|
Topological feature generation for link prediction in biological networks. PeerJ 2023; 11:e15313. [PMID: 37187525 PMCID: PMC10178302 DOI: 10.7717/peerj.15313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 04/06/2023] [Indexed: 05/17/2023] Open
Abstract
Graph or network embedding is a powerful method for extracting missing or potential information from interactions between nodes in biological networks. Graph embedding methods learn representations of nodes and interactions in a graph with low-dimensional vectors, which facilitates research to predict potential interactions in networks. However, most graph embedding methods suffer from high computational costs in the form of high computational complexity of the embedding methods and learning times of the classifier, as well as the high dimensionality of complex biological networks. To address these challenges, in this study, we use the Chopper algorithm as an alternative approach to graph embedding, which accelerates the iterative processes and thus reduces the running time of the iterative algorithms for three different (nervous system, blood, heart) undirected protein-protein interaction (PPI) networks. Due to the high dimensionality of the matrix obtained after the embedding process, the data are transformed into a smaller representation by applying feature regularization techniques. We evaluated the performance of the proposed method by comparing it with state-of-the-art methods. Extensive experiments demonstrate that the proposed approach reduces the learning time of the classifier and performs better in link prediction. We have also shown that the proposed embedding method is faster than state-of-the-art methods on three different PPI datasets.
|
18
|
A novel hybrid framework for metabolic pathways prediction based on the graph attention network. BMC Bioinformatics 2022; 23:329. [PMID: 36171550 PMCID: PMC9520805 DOI: 10.1186/s12859-022-04856-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Accepted: 07/25/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Clarifying which metabolic pathways a drug compound is involved in can help researchers understand how the drug is absorbed, distributed, metabolized, and excreted. The characteristics of a compound, such as its structure and composition, directly determine the metabolic pathways it participates in. METHODS We developed a novel hybrid framework based on the graph attention network (GAT), named HFGAT, to predict the metabolic pathway classes that a compound is involved in by making use of its global and local characteristics. The framework mainly consists of a two-branch feature extraction layer and a fully connected (FC) layer. In the two-branch feature extraction layer, one branch is responsible for extracting global features of the compound, and the other branch introduces a GAT consisting of two graph attention layers to extract local structural features of the compound. Both the global and the local features of the compound are then fed into the FC layer, which outputs the predicted metabolic pathway categories that the compound belongs to. RESULTS We compared the multi-class classification performance of HFGAT with six other representative methods, including five classic machine learning methods and one graph convolutional network (GCN) based deep learning method, on the benchmark dataset containing 6999 compounds belonging to 11 pathway categories. The results showed that the deep learning-based methods (HFGAT, GCN-based method) outperformed the traditional machine learning methods in the prediction of metabolic pathways and that our proposed HFGAT method performed better than the GCN-based method. Moreover, HFGAT achieved higher [Formula: see text] scores on 8 of 11 classes than the GCN-based method. CONCLUSIONS Our proposed HFGAT makes use of both the global and local information of the compounds to predict their metabolic pathway categories and achieves significant performance. Compared with the GCN model, the introduction of the GAT can help our model pay more attention to substructures of the compound that are useful for the prediction task. The study provides a potential method for drug discovery with all types of metabolic reactions that may be involved in the decomposition and synthesis of pharmaceutical compounds in the organism.
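To make the idea of attending to useful substructures concrete, here is a minimal sketch of how a single graph attention head weights a node's neighbours, following the commonly used GAT formulation (a shared linear map, a LeakyReLU-scored pairwise term, and a softmax over the neighbourhood). It is not the HFGAT implementation: the toy atom features, the weight matrix `W`, the attention vector `a`, and all function names are assumptions, and the output nonlinearity and multi-head concatenation are omitted.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_coefficients(features, neighbors, W, a, node):
    """Softmax-normalised attention of `node` over its neighbours and itself."""
    hi = matvec(W, features[node])
    scores = {}
    for j in neighbors[node] | {node}:
        hj = matvec(W, features[j])
        # Score the concatenation [W h_i || W h_j] with the attention vector a.
        scores[j] = leaky_relu(sum(w * x for w, x in zip(a, hi + hj)))
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {j: math.exp(s - peak) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


def aggregate(features, neighbors, W, a, node):
    """Attention-weighted sum of transformed neighbour features."""
    alpha = attention_coefficients(features, neighbors, W, a, node)
    out = [0.0] * len(W)
    for j, weight in alpha.items():
        hj = matvec(W, features[j])
        out = [o + weight * x for o, x in zip(out, hj)]
    return out


# Hypothetical three-atom fragment: atom 0 is bonded to atoms 1 and 2.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: {1, 2}, 1: {0}, 2: {0}}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity map, for readability only
a = [0.5, -0.5, 1.0, 0.25]     # arbitrary attention vector

alpha = attention_coefficients(features, neighbors, W, a, 0)
uniform = attention_coefficients(features, neighbors, W, [0.0] * 4, 0)
```

With an all-zero attention vector every neighbour receives the same weight, which reduces the layer to plain neighbourhood averaging; a learned `a` is what lets the model emphasise some neighbours over others.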
|
19
|
Safe medicine recommendation via star interactive enhanced-based transformer model. Comput Biol Med 2021; 141:105159. [PMID: 34971981 DOI: 10.1016/j.compbiomed.2021.105159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Revised: 12/18/2021] [Accepted: 12/18/2021] [Indexed: 11/16/2022]
Abstract
With the rapid development of electronic medical records (EMRs), most existing medicine recommendation systems based on EMRs explore knowledge from the diagnosis history to help doctors prescribe medication correctly. However, due to the limitations of the EMRs' content, recommendation systems cannot explicitly reflect relevant medical data, such as drug interactions. In recent years, medicine recommendation approaches based on medical knowledge graphs and graph neural networks have been proposed, and methods based on the Transformer model have been widely used in medicine recommendation systems. Transformer-based medicine recommendation approaches are readily applicable to inductive problems. Unfortunately, traditional Transformer-based medicine recommendation approaches require substantial computing power and suffer from information loss among the multiple heads of the Transformer model, which causes poor performance. At the same time, these approaches have rarely considered the side effects of drug interactions. To overcome the drawbacks of the current medicine recommendation approaches, we propose a Star Interactive Enhanced-based Transformer (SIET) model. It first constructs a high-quality heterogeneous graph by bridging EMR (MIMIC-III) and a medical knowledge graph (ICD-9 ontology and DrugBank). Then, based on the constructed heterogeneous graph, it extracts a disease homogeneous graph, a medicine homogeneous graph, and a negative factors homogeneous graph to obtain auxiliary information on diseases or drugs (named enhanced neighbors). These are fed into the SIET model in conjunction with the relevant information in the EMRs to obtain representations of diseases and drugs. It finally generates the recommended drug list by calculating the cosine similarity between disease combination representations and drug combination representations. Extensive experiments on the MIMIC-III, DrugBank, and ICD-9 ontology datasets demonstrate the outstanding performance of our proposed model. Meanwhile, we show that our SIET model outperforms strong baselines on an inductive medicine recommendation task.
|
20
|
Abuse detection in healthcare insurance with disease-treatment network embedding. J Biomed Inform 2021; 123:103936. [PMID: 34670175 DOI: 10.1016/j.jbi.2021.103936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Revised: 09/14/2021] [Accepted: 10/12/2021] [Indexed: 10/20/2022]
Abstract
Abuse in healthcare insurance refers to a medical service or practice inconsistent with the generally accepted sound fiscal practices, such as overtreatment or overcharging. These types of abuses may lead to prescriptions that do not meet the criteria for medical stability. On the other hand, abuse may incur unnecessary costs by deliberately executing gratuitous treatments. In efforts to detect and prevent abuse, insurance companies hire medical professionals to manually examine the legitimacy of claim filings. It is, however, very costly in terms of labor and time to review all of the claims given the exploding number of filings. In this light, there is growing interest in employing data mining techniques to automatically detect abusive claims or providers showing an abnormal billing pattern. Unfortunately, most of these models do not consider the disease-treatment information explicitly. For detection models to properly address the issues arising from individual drugs with similar efficacy, it is essential to account for the relationship between diseases and treatments during the learning process. In this paper, we propose a network-based approach that assesses the relationship between diseases and treatments when detecting abuse from claim filings. Our proposed model consists of three stages. During the first stage, a disease-treatment network is constructed based on information extracted from the claim filings. Since the association between diseases and treatments is not explicitly expressed in these filings, we infer the disease-treatment relationship by computing the relative risk (RR). The second stage involves selecting the best graph embedding method from several candidates by comparing their performance on link prediction. In the final stage, we solve a link prediction problem as a vehicle for detecting overtreatment. If our link prediction model predicts links to be nonexistent for all of the diseases and treatments listed in a given claim, then the claim is classified as an overtreatment case. We tested the proposed model on real-world claim data and showed that it correctly classifies treatments that do not explicitly exist in the training network.
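The first and last stages can be sketched as follows, under stated assumptions. The relative risk here uses the standard epidemiological ratio P(treatment | disease) / P(treatment | no disease); the abstract does not give the authors' exact estimator. The claim layout, the RR > 1 edge rule, and the use of the RR network itself in place of a trained link predictor are all simplifications introduced for this illustration.

```python
def relative_risk(claims, disease, treatment):
    """RR = P(treatment | disease) / P(treatment | no disease) over claims."""
    with_d = [c for c in claims if disease in c["diseases"]]
    without_d = [c for c in claims if disease not in c["diseases"]]
    if not with_d or not without_d:
        return None  # undefined without both groups
    p_exposed = sum(treatment in c["treatments"] for c in with_d) / len(with_d)
    p_unexposed = sum(treatment in c["treatments"] for c in without_d) / len(without_d)
    if p_unexposed == 0:
        return float("inf") if p_exposed > 0 else None
    return p_exposed / p_unexposed


def build_network(claims, min_rr=1.0):
    """Keep disease-treatment pairs whose RR exceeds min_rr (assumed cutoff)."""
    diseases = set().union(*(c["diseases"] for c in claims))
    treatments = set().union(*(c["treatments"] for c in claims))
    edges = set()
    for d in diseases:
        for t in treatments:
            rr = relative_risk(claims, d, t)
            if rr is not None and rr > min_rr:
                edges.add((d, t))
    return edges


def is_flagged(claim, edges):
    """Flag a claim when no listed disease is linked to any listed treatment."""
    return not any((d, t) in edges
                   for d in claim["diseases"] for t in claim["treatments"])


# Hypothetical claim filings.
claims = [
    {"diseases": {"flu"}, "treatments": {"antiviral"}},
    {"diseases": {"flu"}, "treatments": {"antiviral", "xray"}},
    {"diseases": {"fracture"}, "treatments": {"xray"}},
    {"diseases": {"fracture"}, "treatments": {"xray", "antiviral"}},
]
network = build_network(claims)
suspicious = is_flagged({"diseases": {"flu"}, "treatments": {"xray"}}, network)
```

In the described pipeline a learned link prediction model would replace the direct edge lookup in `is_flagged`, so that plausible but unseen disease-treatment pairs are not flagged.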
|
21
|
Spectral clustering on protein-protein interaction networks via constructing affinity matrix using attributed graph embedding. Comput Biol Med 2021; 138:104933. [PMID: 34655897 DOI: 10.1016/j.compbiomed.2021.104933] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 06/24/2021] [Revised: 09/20/2021] [Accepted: 10/07/2021] [Indexed: 02/02/2023]
Abstract
The identification of protein complexes in protein-protein interaction networks is the most fundamental and essential problem for revealing the underlying mechanism of biological processes. However, most existing protein complex identification methods only consider a network's topological structure, and in doing so, these methods miss the advantage of using nodes' feature information. In protein-protein interaction, both topological structure and node features are essential ingredients for protein complexes. The spectral clustering method utilizes the eigenvalues of the affinity matrix of the data to map to a low-dimensional space. It has attracted much attention in recent years as one of the most efficient algorithms in the subcategory of dimensionality reduction. In this paper, a new version of spectral clustering, named text-associated DeepWalk-Spectral Clustering (TADW-SC), is proposed for attributed networks in which the identified protein complexes have structural cohesiveness and attribute homogeneity. Since the performance of spectral clustering heavily depends on the effectiveness of the affinity matrix, our proposed method uses the text-associated DeepWalk (TADW) to calculate the embedding vectors of proteins. The affinity matrix is then computed from the cosine similarity between pairs of low-dimensional vectors, which considerably improves the accuracy of the affinity matrix. Experimental results show that our method performs unexpectedly well in comparison to existing state-of-the-art methods in both real protein network datasets and synthetic networks.
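The affinity-matrix step described above can be sketched in a few lines. The embeddings below are hypothetical stand-ins for TADW output, and the function names are assumptions; the subsequent eigen-decomposition and clustering are not shown, since they would normally rely on a linear algebra library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def affinity_matrix(embeddings):
    """Pairwise cosine similarities between all embedding vectors."""
    n = len(embeddings)
    return [[cosine_similarity(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical 2-D protein embeddings (real ones would come from TADW).
protein_embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
A = affinity_matrix(protein_embeddings)
```

Cosine similarity ignores vector length, so the first two toy proteins are treated as maximally similar even though their embeddings differ in magnitude.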
|
22
|
A terahertz time-domain super-resolution imaging method using a local-pixel graph neural network for biological products. Anal Chim Acta 2021; 1181:338898. [PMID: 34556238 DOI: 10.1016/j.aca.2021.338898] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/30/2021] [Revised: 07/27/2021] [Accepted: 07/29/2021] [Indexed: 11/29/2022]
Abstract
The low image acquisition speed of terahertz (THz) time-domain imaging systems limits their application in biological products analysis. In the current study, a local pixel graph neural network was built for THz time-domain imaging super-resolution. The method could be applied to the analysis of any heterogeneous biological products as it only required a small number of sample images for training and particularly it focused on THz feature frequencies. The graph network applied the Fourier transform to graphs extracted from low-resolution (LR) images bringing an invariance of rotation and flip for local pixels, and the network then learnt the relationship between the state of graphs and the corresponding pixels to be reconstructed. With wood cores and seeds as examples, the images of these samples were captured by a THz time-domain imaging system for training and analysed by the method, achieving the root mean square error (RMSE) of pixels of 0.0957 and 0.1061 for the wood core and seed images, respectively. In addition, the reconstructed high-resolution (HR) images, LR images and true HR images at several feature frequencies were also compared in the current study. Results indicated that the method could not only reconstruct the spatial details and the useful signals from high noise signals at high feature frequencies but could also operate super-resolution in both spatial and spectral aspects.
|
23
|
A novel link prediction algorithm for protein-protein interaction networks by attributed graph embedding. Comput Biol Med 2021; 137:104772. [PMID: 34450380 DOI: 10.1016/j.compbiomed.2021.104772] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/29/2021] [Revised: 07/29/2021] [Accepted: 08/13/2021] [Indexed: 10/20/2022]
Abstract
The prediction of interactions in protein networks is critical in various biological processes. In recent years, scientists have focused on computational approaches to predict the interactions of proteins. In protein-protein interaction (PPI) networks, each protein is accompanied by various features, including amino acid sequence, subcellular location, and protein domains. Embedding-based methods have been widely applied to many network analysis tasks, such as link prediction. The Deepwalk algorithm is one of the most popular graph embedding methods and captures the network structure using pure random walks. In this paper, we treat the protein-protein interaction prediction problem as link prediction in attributed networks, and we use an attributed embedding approach to predict the interactions between proteins in the PPI network. In particular, we present a modified version of Deepwalk based on feature selection for solving link prediction in protein-protein interaction networks, which benefits from both network structure and protein features. More specifically, the feature selection step consists of two distinct parts. First, a set of relevant features is selected from the original feature set, such that the dimensionality of the features is reduced. Second, within the selected set, each feature is assigned a weight based on its significance, so that the contribution of each feature is distinguished from that of the others. In this method, a new random walk model for link prediction is introduced by integrating network structure and protein features, based on the assumption that two nodes will be linked if they are nearby in the network. To justify the proposal, we carry out many experiments on protein-protein interaction networks for comparison with state-of-the-art network embedding methods. The experimental results indicate that our proposed approach is more capable than other link prediction approaches and increases the accuracy of prediction.
|
24
|
Network-based strategies for protein characterization. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2021; 127:217-248. [PMID: 34340768 DOI: 10.1016/bs.apcsb.2021.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Protein structure characterization is fundamental to understanding protein properties, such as the folding process and protein resistance to thermal stress, and even to unveiling organism pathologies (e.g., prion disease). In this chapter, we provide an overview of how the spectral properties of the networks reconstructed from the Protein Contact Map (PCM) can be used to generate informative observables. As a specific case study, we apply two different network approaches to an example protein dataset, with the aims of discriminating protein folding state and reconstructing protein 3D structure.
|
25
|
Learning Drug-Disease-Target Embedding (DDTE) from knowledge graphs to inform drug repurposing hypotheses. J Biomed Inform 2021; 119:103838. [PMID: 34119691 DOI: 10.1016/j.jbi.2021.103838] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/17/2020] [Revised: 05/10/2021] [Accepted: 06/08/2021] [Indexed: 10/21/2022]
Abstract
We aimed to develop and validate a new graph embedding algorithm for embedding drug-disease-target networks to generate novel drug repurposing hypotheses. Our model denotes drugs, diseases and targets as subjects, predicates and objects, respectively. Each entity is represented by a multidimensional vector, and the predicate is regarded as a translation vector from a subject vector to an object vector. These vectors are optimized so that when a subject-predicate-object triple represents a known drug-disease-target relationship, the sum of the subject and predicate vectors is close to the object vector; otherwise, the sum is distant from the object vector. The DTINet dataset was utilized to test this algorithm and discover unknown links between drugs and diseases. In cross-validation experiments, this new algorithm outperformed the original DTINet model. The MRR (Mean Reciprocal Rank) values of our models were around 0.80, while those of the original model were about 0.70. In addition, we have identified and verified several pairs of new therapeutic relations as well as adverse effect relations that were not recorded in the original DTINet dataset. This approach showed excellent performance, and the predicted drug-disease and drug-side-effect relationships were found to be consistent with literature reports. This novel method can be used to analyze diverse types of emerging biomedical and healthcare-related knowledge graphs (KG).
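The translation idea and the MRR metric mentioned in this abstract can be illustrated with a small sketch. It scores a triple by the Euclidean distance between subject + predicate and object, ranks candidate objects by that distance, and averages reciprocal ranks. The vectors, entity names, and function names are hypothetical, and the choice of Euclidean distance is an assumption; the abstract does not specify the distance or the training procedure.

```python
import math


def translation_distance(subject, predicate, obj):
    """Distance between (subject + predicate) and obj; small means plausible."""
    return math.sqrt(sum((s + p - o) ** 2
                         for s, p, o in zip(subject, predicate, obj)))


def rank_of_true_object(subject, predicate, candidates, true_name):
    """1-based rank of the true object among candidates sorted by distance."""
    ordered = sorted(
        candidates,
        key=lambda name: translation_distance(subject, predicate, candidates[name]),
    )
    return ordered.index(true_name) + 1


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical 2-D embeddings: a drug (subject), a disease (predicate),
# and two candidate targets (objects).
drug = [1.0, 0.0]
disease = [0.0, 1.0]
targets = {"target_A": [1.0, 1.0], "target_B": [3.0, 3.0]}

rank_a = rank_of_true_object(drug, disease, targets, "target_A")
rank_b = rank_of_true_object(drug, disease, targets, "target_B")
mrr = mean_reciprocal_rank([rank_a, rank_b])
```

An MRR near 1 means the true object is usually ranked first, which gives a sense of scale for the reported values of about 0.80 and 0.70.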
|
26
|
LinkPred: a high performance library for link prediction in complex networks. PeerJ Comput Sci 2021; 7:e521. [PMID: 34084927 PMCID: PMC8157017 DOI: 10.7717/peerj-cs.521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2020] [Accepted: 04/13/2021] [Indexed: 06/12/2023]
Abstract
The problem of determining the likelihood of the existence of a link between two nodes in a network is called link prediction. This is made possible thanks to the existence of a topological structure in most real-life networks. In other words, the topologies of networked systems such as the World Wide Web, the Internet, metabolic networks, and human society are far from random, which implies that partial observations of these networks can be used to infer information about undiscovered interactions. Significant research efforts have been invested into the development of link prediction algorithms, and some researchers have made the implementation of their methods available to the research community. These implementations, however, are often written in different languages and use different modalities of interaction with the user, which hinders their effective use. This paper introduces LinkPred, a high-performance parallel and distributed link prediction library that includes the implementation of the major link prediction algorithms available in the literature. The library can handle networks with up to millions of nodes and edges and offers a unified interface that facilitates the use and comparison of link prediction algorithms by researchers as well as practitioners.
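As a concrete picture of the link prediction problem defined in this abstract, the sketch below scores unconnected node pairs with two classic topological indices, common neighbours and the Jaccard coefficient, and ranks them. This is generic illustrative Python and does not use or imitate LinkPred's interface; the toy graph and all function names are assumptions.

```python
def common_neighbors(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def rank_missing_links(adj, score):
    """Sort all currently unconnected pairs by decreasing score."""
    nodes = sorted(adj)
    pairs = [(u, v) for i, u in enumerate(nodes)
             for v in nodes[i + 1:] if v not in adj[u]]
    return sorted(pairs, key=lambda p: score(adj, *p), reverse=True)


# Hypothetical undirected graph stored as symmetric adjacency sets.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
ranked = rank_missing_links(adj, common_neighbors)
```

Passing the scoring function as an argument mirrors, in miniature, the benefit of a unified interface: different predictors can be compared on the same graph without changing the ranking code.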
|
27
|
Persona2vec: a flexible multi-role representations learning framework for graphs. PeerJ Comput Sci 2021; 7:e439. [PMID: 33834106 PMCID: PMC8022511 DOI: 10.7717/peerj-cs.439] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/24/2020] [Accepted: 02/22/2021] [Indexed: 06/12/2023]
Abstract
Graph embedding techniques, which learn low-dimensional representations of a graph, are achieving state-of-the-art performance in many graph mining tasks. Most existing embedding algorithms assign a single vector to each node, implicitly assuming that a single representation is enough to capture all characteristics of the node. However, across many domains, it is common to observe pervasively overlapping community structure, where most nodes belong to multiple communities, playing different roles depending on the contexts. Here, we propose persona2vec, a graph embedding framework that efficiently learns multiple representations of nodes based on their structural contexts. Using link prediction-based evaluation, we show that our framework is significantly faster than the existing state-of-the-art model while achieving better performance.
|
28
|
TigeCMN: On exploration of temporal interaction graph embedding via Coupled Memory Neural Networks. Neural Netw 2021; 140:13-26. [PMID: 33743320 DOI: 10.1016/j.neunet.2021.02.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/10/2020] [Revised: 12/08/2020] [Accepted: 02/12/2021] [Indexed: 10/22/2022]
Abstract
With the increasing demand for mining rich knowledge in graph-structured data, graph embedding has become one of the most popular research topics in both academic and industrial communities due to its powerful capability in learning effective representations. Most existing work learns node embeddings in the context of static, plain or attributed, homogeneous graphs. However, many real-world applications frequently involve bipartite graphs with temporal and attributed interaction edges, named temporal interaction graphs. The temporal interactions usually imply different facets of interest and might even evolve over time, thus posing huge challenges for learning effective node representations. Furthermore, most existing graph embedding models try to embed all the information of each node into a single vector representation, which is insufficient to characterize the node's multifaceted properties. In this paper, we propose a novel framework named TigeCMN to learn node representations from a sequence of temporal interactions. Specifically, we devise two coupled memory networks to store and update node embeddings in external matrices explicitly and dynamically, which forms deep matrix representations and thus could enhance the expressiveness of the node embeddings. Then, we generate each node embedding from two parts: a static embedding that encodes the node's stationary properties and a dynamic embedding induced from the memory matrix that models its temporal interaction patterns. We conduct extensive experiments on various real-world datasets covering the tasks of node classification, recommendation and visualization. The experimental results empirically demonstrate that TigeCMN can achieve significant gains compared with recent state-of-the-art baselines.
|
29
|
Survey on graph embeddings and their applications to machine learning problems on graphs. PeerJ Comput Sci 2021; 7:e357. [PMID: 33817007 PMCID: PMC7959646 DOI: 10.7717/peerj-cs.357] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/21/2020] [Accepted: 12/18/2020] [Indexed: 05/13/2023]
Abstract
Dealing with relational data has always required significant computational resources, domain expertise and task-dependent feature engineering to incorporate structural information into a predictive model. Nowadays, a family of automated graph feature engineering techniques has been proposed in different streams of literature. So-called graph embeddings provide a powerful tool to construct vectorized feature spaces for graphs and their components, such as nodes, edges and subgraphs, while preserving inner graph properties. Using the constructed feature spaces, many machine learning problems on graphs can be solved via standard frameworks suitable for vectorized feature representation. Our survey aims to describe the core concepts of graph embeddings and provide several taxonomies for their description. First, we start with the methodological approach and extract three types of graph embedding models based on matrix factorization, random walks and deep learning approaches. Next, we describe how different types of networks impact the ability of models to incorporate structural and attributed data into a unified embedding. Going further, we perform a thorough evaluation of graph embedding applications to machine learning problems on graphs, among which are node classification, link prediction, clustering, visualization, compression, and a family of whole-graph embedding algorithms suitable for graph classification, similarity and alignment problems. Finally, we overview the existing applications of graph embeddings to computer science domains, formulate open problems and provide experimental results explaining how different network properties affect graph embedding quality in four classic machine learning problems on graphs: node classification, link prediction, clustering and graph visualization. As a result, our survey covers a new, rapidly growing field of network feature engineering, presents an in-depth analysis of models based on network types, and overviews a wide range of applications to machine learning problems on graphs.
|
30
|
Graph embedding ensemble methods based on the heterogeneous network for lncRNA-miRNA interaction prediction. BMC Genomics 2020; 21:867. [PMID: 33334307 PMCID: PMC7745483 DOI: 10.1186/s12864-020-07238-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Researchers have discovered that lncRNA-miRNA regulatory paradigms modulate gene expression patterns and drive major cellular processes. Identification of lncRNA-miRNA interactions (LMIs) is critical to reveal the mechanism of biological processes and complicated diseases. Because conventional wet experiments are time-consuming, labor-intensive and costly, a few computational methods have been proposed to expedite the identification of lncRNA-miRNA interactions. However, little attention has been paid to fully exploiting the structural and topological information of the lncRNA-miRNA interaction network. RESULTS In this paper, we propose novel lncRNA-miRNA prediction methods using graph embedding and ensemble learning. First, we calculate lncRNA-lncRNA sequence similarity and miRNA-miRNA sequence similarity, and then we combine them with the known lncRNA-miRNA interactions to construct a heterogeneous network. Second, we adopt several graph embedding methods to learn embedded representations of lncRNAs and miRNAs from the heterogeneous network, and construct the ensemble models using two ensemble strategies. For the former, we consider individual graph embedding based models as base predictors and integrate their predictions, and develop a method named GEEL-PI. For the latter, we construct a deep attention neural network (DANN) to integrate various graph embeddings, and present an ensemble method named GEEL-FI. The experimental results demonstrate that both GEEL-PI and GEEL-FI outperform other state-of-the-art methods. The effectiveness of the two ensemble strategies is validated by further experiments. Moreover, the case studies show that GEEL-PI and GEEL-FI can find novel lncRNA-miRNA associations. CONCLUSION The study reveals that methods based on graph embedding and ensemble learning are efficient for integrating heterogeneous information derived from the lncRNA-miRNA interaction network and can achieve better performance on the LMI prediction task. In conclusion, GEEL-PI and GEEL-FI are promising for lncRNA-miRNA interaction prediction.
|
31
|
ScGSLC: An unsupervised graph similarity learning framework for single-cell RNA-seq data clustering. Comput Biol Chem 2020; 90:107415. [PMID: 33307360 DOI: 10.1016/j.compbiolchem.2020.107415] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/28/2020] [Revised: 09/30/2020] [Accepted: 10/06/2020] [Indexed: 01/18/2023]
Abstract
Accurate clustering of cells from single-cell RNA sequencing (scRNA-seq) data is an essential step for biological analysis such as putative cell type identification. However, scRNA-seq data are high-dimensional and highly sparse, which makes traditional clustering methods less effective at reflecting the similarity between cells. Since the genetic network fundamentally defines the functions of a cell and deep learning shows strong advantages in network representation learning, we propose a novel scRNA-seq clustering framework, ScGSLC, based on graph similarity learning. ScGSLC integrates scRNA-seq data and a protein-protein interaction network into a graph. A graph convolution network is then employed to embed the graphs, and the cells are clustered by the calculated similarity between graphs. Unsupervised clustering results on nine public data sets demonstrate that ScGSLC performs better than the state-of-the-art methods.
|
32
|
Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval. Neural Netw 2020; 134:143-162. [PMID: 33310483 DOI: 10.1016/j.neunet.2020.11.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/02/2020] [Revised: 11/10/2020] [Accepted: 11/23/2020] [Indexed: 11/23/2022]
Abstract
Information retrieval among different modalities has become a significant issue with many promising applications. However, inconsistent feature representation of various multimedia data causes the "heterogeneity gap" among various modalities, which is a challenge in cross-modal retrieval. For bridging the "heterogeneity gap," the popular methods attempt to project the original data into a common representation space, which requires great fitting ability from the model. To address the above issue, we propose a novel Graph Representation Learning (GRL) method for bridging the heterogeneity gap, which does not project the original features into an aligned representation space but adopts a cross-modal graph to link different modalities. The GRL approach consists of two subnetworks, the Feature Transfer Learning Network (FTLN) and the Graph Representation Learning Network (GRLN). First, the FTLN model finds a latent space for each modality, where the cosine similarity is suitable to describe their similarity. Then, we build a cross-modal graph to reconstruct the original data and their relationships. Finally, we abandon the features in the latent space and turn to embedding the graph vertexes into a common representation space directly. During the process, the proposed Graph Representation Learning method bypasses the most challenging issue by utilizing a cross-modal graph as a bridge to link the "heterogeneity gap" among different modalities. This use of a cross-modal graph as an intermediary agent to bridge the "heterogeneity gap" in cross-modal retrieval is simple but effective. Extensive experimental results on six widely-used datasets indicate that the proposed GRL outperforms other state-of-the-art cross-modal retrieval methods.
|
33
|
gCAnno: a graph-based single cell type annotation method. BMC Genomics 2020; 21:823. [PMID: 33228535 PMCID: PMC7686723 DOI: 10.1186/s12864-020-07223-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2020] [Accepted: 11/10/2020] [Indexed: 08/30/2023] Open
Abstract
Background Current single cell analysis methods annotate cell types at cluster-level rather than ideally at single cell level. Multiple exchangeable clustering methods and many tunable parameters have a substantial impact on the clustering outcome, often leading to incorrect cluster-level annotation or multiple runs of subsequent clustering steps. To address these limitations, methods based on well-annotated reference atlases have been proposed. However, these methods are currently not robust enough to handle datasets with different noise levels or from different platforms. Results Here, we present gCAnno, a graph-based Cell type Annotation method. First, gCAnno constructs a cell type-gene bipartite graph and adopts graph embedding to obtain cell type specific genes. Then, naïve Bayes (gCAnno-Bayes) and SVM (gCAnno-SVM) classifiers are built for annotation. We compared the performance of gCAnno to other state-of-the-art methods on multiple single cell datasets, either with various noise levels or from different platforms. The results showed that gCAnno outperforms other state-of-the-art methods with higher accuracy and robustness. Conclusions gCAnno is a robust and accurate cell type annotation tool for single cell RNA analysis. The source code of gCAnno is publicly available at https://github.com/xjtu-omics/gCAnno. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-020-07223-4.
|
34
|
Predicting MiRNA-disease associations by multiple meta-paths fusion graph embedding model. BMC Bioinformatics 2020; 21:470. [PMID: 33087064 PMCID: PMC7579830 DOI: 10.1186/s12859-020-03765-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/03/2020] [Accepted: 09/17/2020] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Many studies prove that miRNAs have significant roles in diagnosing and treating complex human diseases. However, conventional biological experiments are too costly and time-consuming to identify unconfirmed miRNA-disease associations. Thus, computational models predicting unidentified miRNA-disease pairs in an efficient way are becoming promising research topics. Although existing methods have performed well to reveal unidentified miRNA-disease associations, more work is still needed to improve prediction performance. RESULTS In this work, we present a novel multiple meta-paths fusion graph embedding model to predict unidentified miRNA-disease associations (M2GMDA). Our method takes full advantage of the complex structure and rich semantic information of miRNA-disease interactions in a self-learning way. First, a miRNA-disease heterogeneous network was derived from verified miRNA-disease pairs, miRNA similarity and disease similarity. All meta-path instances connecting miRNAs with diseases were extracted to describe intrinsic information about miRNA-disease interactions. Then, we developed a graph embedding model to predict miRNA-disease associations. The model is composed of linear transformations of miRNAs and diseases, the means encoder of a single meta-path instance, the attention-aware encoder of meta-path type and attention-aware multiple meta-path fusion. We innovatively integrated meta-path instances, meta-path based neighbours, intermediate nodes in meta-paths and more information to strengthen the prediction in our model. In particular, distinct contributions of different meta-path instances and meta-path types were combined with attention mechanisms. The data sets and source code that support the findings of this study are available at https://github.com/dangdangzhang/M2GMDA. CONCLUSIONS M2GMDA achieved AUCs of 0.9323 and 0.9182 in global leave-one-out cross validation and fivefold cross validation with HMDD v2.0. The results showed that our method outperforms other prediction methods. Three kinds of case studies with lung neoplasms, breast neoplasms, prostate neoplasms, pancreatic neoplasms, lymphoma and colorectal neoplasms demonstrated that 47, 50, 49, 48, 50 and 50 out of the top 50 candidate miRNAs predicted by M2GMDA were validated by biological experiments. These results further confirm the prediction performance of our method.
|
35
|
MGAT: Multi-view Graph Attention Networks. Neural Netw 2020; 132:180-189. [PMID: 32911303 DOI: 10.1016/j.neunet.2020.08.021] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/26/2019] [Revised: 08/04/2020] [Accepted: 08/23/2020] [Indexed: 11/18/2022]
Abstract
Multi-view graph embedding aims to learn low-dimensional representations of nodes that capture various relationships in a multi-view network, where each view represents a type of relationship among nodes. Many existing graph embedding approaches concentrate on single-view networks, which can only characterize one simple type of proximity relationship among objects. However, most real-world complex systems possess multiple types of relationships among entities. In this paper, a novel approach of graph embedding for multi-view networks is proposed, named Multi-view Graph Attention Networks (MGAT). We explore an attention-based architecture for learning node representations from each single view, the network parameters of which are constrained by a novel regularization term. In order to collaboratively integrate multiple types of relationships in different views, a view-focused attention method is explored to aggregate the view-wise node representations. We evaluate the proposed algorithm on several real-world datasets, and the results demonstrate that the proposed approach outperforms existing state-of-the-art baselines.
|
36
|
Distant supervision for medical concept normalization. J Biomed Inform 2020; 109:103522. [PMID: 32783923 PMCID: PMC7415240 DOI: 10.1016/j.jbi.2020.103522] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2020] [Revised: 06/09/2020] [Accepted: 07/27/2020] [Indexed: 10/28/2022]
Abstract
We consider the task of Medical Concept Normalization (MCN), which aims to map informal medical phrases such as "loosing weight" to formal medical concepts, such as "Weight loss". Deep learning models have shown high performance across various MCN datasets containing a small number of target concepts along with an adequate number of training examples per concept. However, scaling these models to millions of medical concepts entails the creation of much larger datasets, which is cost and effort intensive. Recent works have shown that training MCN models using automatically labeled examples extracted from medical knowledge bases partially alleviates this problem. We extend this idea by computationally creating a distant dataset from patient discussion forums. We extract informal medical phrases and medical concepts from these forums using a synthetically trained classifier and an off-the-shelf medical entity linker, respectively. We use pretrained sentence encoding models to find the k-nearest phrases corresponding to each medical concept. These mappings are used in combination with the examples obtained from medical knowledge bases to train an MCN model. Our approach outperforms the previous state-of-the-art by 15.9% and 17.1% classification accuracy across two datasets while avoiding manual labeling.
|
37
|
DTiGEMS+: drug-target interaction prediction using graph embedding, graph mining, and similarity-based techniques. J Cheminform 2020; 12:44. [PMID: 33431036 PMCID: PMC7325230 DOI: 10.1186/s13321-020-00447-2] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 12/10/2019] [Accepted: 06/16/2020] [Indexed: 12/14/2022] Open
Abstract
In silico prediction of drug–target interactions is a critical phase in the sustainable drug development process, especially when the research focus is to capitalize on the repositioning of existing drugs. Developing such computational methods is not an easy task, but it is much needed, as current methods that predict potential drug–target interactions suffer from high false-positive rates. Here we introduce DTiGEMS+, a computational method that predicts Drug–Target interactions using Graph Embedding, graph Mining, and Similarity-based techniques. DTiGEMS+ combines similarity-based as well as feature-based approaches, and models the identification of novel drug–target interactions as a link prediction problem in a heterogeneous network. DTiGEMS+ constructs the heterogeneous network by augmenting the known drug–target interactions graph with two other complementary graphs, namely drug–drug similarity and target–target similarity. DTiGEMS+ combines different computational techniques to provide the final drug–target prediction; these techniques include graph embeddings, graph mining, and machine learning. DTiGEMS+ integrates multiple drug–drug similarities and target–target similarities into the final heterogeneous graph construction after applying a similarity selection procedure as well as a similarity fusion algorithm. Using four benchmark datasets, we show that DTiGEMS+ substantially improves prediction performance compared to other state-of-the-art in silico methods developed to predict drug–target interactions by achieving the highest average AUPR across all datasets (0.92), which reduces the error rate by 33.3% relative to the second-best performing model in the state-of-the-art methods comparison.
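The network construction described in this abstract can be illustrated with a minimal Python sketch. It assumes a single drug–drug similarity matrix and a single target–target similarity matrix (the similarity selection and fusion steps are not reproduced), and the function name `build_heterogeneous_adjacency` and the toy values are hypothetical, not taken from the paper.

```python
import numpy as np

def build_heterogeneous_adjacency(drug_sim, target_sim, interactions):
    """Stack drug-drug, drug-target and target-target blocks into one adjacency matrix.

    Illustrative assumption: nodes are ordered as all drugs, then all targets.
    """
    drug_sim = np.asarray(drug_sim, dtype=float)
    target_sim = np.asarray(target_sim, dtype=float)
    interactions = np.asarray(interactions, dtype=float)
    n_drugs, n_targets = interactions.shape
    if drug_sim.shape != (n_drugs, n_drugs) or target_sim.shape != (n_targets, n_targets):
        raise ValueError("similarity matrices do not match the interaction matrix")
    # Known interactions fill the off-diagonal blocks; similarities fill the diagonal blocks.
    return np.block([[drug_sim, interactions],
                     [interactions.T, target_sim]])

# Toy example: 2 drugs, 3 targets (made-up values).
drug_sim = np.array([[1.0, 0.3],
                     [0.3, 1.0]])
target_sim = np.array([[1.0, 0.5, 0.1],
                       [0.5, 1.0, 0.2],
                       [0.1, 0.2, 1.0]])
interactions = np.array([[1, 0, 0],
                         [0, 1, 1]])
adjacency = build_heterogeneous_adjacency(drug_sim, target_sim, interactions)
```

Link prediction then amounts to scoring the zero entries of the drug–target block; how those scores are produced is specific to each method and is not shown here.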
|
38
|
DLPNet: A deep manifold network for feature extraction of hyperspectral imagery. Neural Netw 2020; 129:7-18. [PMID: 32485560 DOI: 10.1016/j.neunet.2020.05.022] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/16/2019] [Revised: 05/14/2020] [Accepted: 05/15/2020] [Indexed: 11/18/2022]
Abstract
Deep learning has received increasing attention in recent years and has been successfully applied to feature extraction (FE) of hyperspectral images. However, most deep learning methods fail to explore the manifold structure in hyperspectral images (HSIs). To tackle this issue, a novel graph-based deep learning model, termed deep locality preserving neural network (DLPNet), is proposed in this paper. Traditional deep learning methods use random initialization to initialize network parameters. In contrast, DLPNet initializes each layer of the network by exploring the manifold structure in hyperspectral data. In the network optimization stage, a deep-manifold learning joint loss function is designed to exploit the graph embedding process while measuring the difference between the predicted value and the actual value, so that the proposed model can extract deep features and explore the manifold structure of the data simultaneously. Experimental results on real-world HSI datasets indicate that the proposed DLPNet performs significantly better than some state-of-the-art methods.
|
39
|
Hyperlink regression via Bregman divergence. Neural Netw 2020; 126:362-383. [PMID: 32294616 DOI: 10.1016/j.neunet.2020.03.026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/21/2019] [Revised: 03/17/2020] [Accepted: 03/27/2020] [Indexed: 10/24/2022]
Abstract
A collection of U (∈N) data vectors is called a U-tuple, and the association strength among the vectors of a tuple is termed the hyperlink weight, which is assumed to be symmetric with respect to permutation of the entries in the index. We herein propose Bregman hyperlink regression (BHLR), which learns a user-specified symmetric similarity function such that it predicts the tuple's hyperlink weight from data vectors stored in the U-tuple. BHLR is a simple and general framework for hyper-relational learning that minimizes Bregman divergence (BD) between the hyperlink weights and estimated similarities defined for the corresponding tuples; BHLR encompasses various existing methods, such as logistic regression (U=1), Poisson regression (U=1), link prediction (U=2), and those for representation learning, such as graph embedding (U=2), matrix factorization (U=2), tensor factorization (U≥2), and their variants equipped with arbitrary BD. Nonlinear functions (e.g., neural networks) can be employed for the similarity functions. However, there are theoretical challenges in that different tuples in BHLR may share data vectors, unlike the i.i.d. setting of classical regression. We address these theoretical issues and prove that BHLR equipped with arbitrary BD and U∈N is (P-1) statistically consistent, that is, it asymptotically recovers the underlying true conditional expectation of hyperlink weights given data vectors, and (P-2) computationally tractable, that is, it is efficiently computed by stochastic optimization algorithms using a novel generalized minibatch sampling procedure for hyper-relational data. Consequently, theoretical guarantees for BHLR, including several existing methods that have previously been examined only experimentally, are provided in a unified manner.
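For readers unfamiliar with the Bregman divergence that this abstract builds on, the standard definition for a convex generator φ is D_φ(p, q) = φ(p) − φ(q) − ⟨∇φ(q), p − q⟩. The Python sketch below evaluates it for two common generators; it illustrates the divergence only, not the BHLR estimator, and all function names are hypothetical.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad_phi(q), p - q> for a convex generator phi."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return phi(p) - phi(q) - float(np.dot(grad_phi(q), p - q))

# Generator phi(x) = ||x||^2 yields the squared Euclidean distance.
def sq_norm(x):
    return float(np.dot(x, x))

def sq_norm_grad(x):
    return 2.0 * x

# Generator phi(x) = sum(x log x) yields the generalized KL divergence
# (equal to the ordinary KL divergence when p and q both sum to 1).
def neg_entropy(x):
    return float(np.sum(x * np.log(x)))

def neg_entropy_grad(x):
    return np.log(x) + 1.0

p = np.array([0.2, 0.8])
q = np.array([0.5, 0.5])
d_euclid = bregman_divergence(sq_norm, sq_norm_grad, p, q)
d_kl = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
```

Swapping the generator changes the loss while leaving the rest of the code untouched, which is the sense in which a single framework can cover several regression and embedding objectives.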
|
40
|
ADHD classification by dual subspace learning using resting-state functional connectivity. Artif Intell Med 2020; 103:101786. [PMID: 32143793 DOI: 10.1016/j.artmed.2019.101786] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/10/2019] [Revised: 12/11/2019] [Accepted: 12/30/2019] [Indexed: 11/19/2022]
Abstract
As one of the most common neurobehavioral diseases in school-age children, Attention Deficit Hyperactivity Disorder (ADHD) has been increasingly studied in recent years. However, accurately distinguishing ADHD patients from healthy persons remains a challenging problem. To address this issue, we propose a dual subspace classification algorithm using individual resting-state Functional Connectivity (FC). In detail, two subspaces respectively containing ADHD and healthy control features, called dual subspaces, are learned with several subspace measures, wherein a modified graph embedding measure is employed to enhance the intra-class relationship of these features. Therefore, given a subject (used as test data) with its FCs, the basic classification principle is to compare the projected component energy of its FCs on each subspace and then predict the ADHD or control label according to the subspace with larger energy. However, this principle works poorly in practice, since the dual subspaces obtained from small ADHD databases are unstable. We therefore present an ADHD classification framework based on binary hypothesis testing of test data. Here, the FCs of test data with its ADHD or control label hypothesis are employed in the discriminative FC selection of training data to promote the stability of dual subspaces. For each hypothesis, the dual subspaces are learned from the selected FCs of training data. The total projected energy of these FCs is also calculated on the subspaces. Subsequently, the energy comparison is carried out under the binary hypotheses. The ADHD or control label is finally predicted for test data with the hypothesis of larger total energy. In the experiments on the ADHD-200 dataset, our method achieves significant classification performance compared with several state-of-the-art machine learning and deep learning methods, with an accuracy of about 90% for most ADHD databases in the leave-one-out cross-validation test.
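The basic classification principle in this abstract (compare the projected energy of a test vector on each class subspace) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a plain truncated SVD stands in for the paper's subspace measures and modified graph embedding, the hypothesis-testing stage is omitted, and the data are synthetic toy vectors rather than functional connectivity features.

```python
import numpy as np

def learn_subspace(features, dim):
    """Return an orthonormal basis (as columns) for the top `dim` directions of the rows.

    Assumption: an uncentered SVD replaces the subspace measures used in the paper.
    """
    _, _, vt = np.linalg.svd(np.asarray(features, dtype=float), full_matrices=False)
    return vt[:dim].T

def projected_energy(x, basis):
    """Squared norm of the projection of x onto the span of the basis columns."""
    coeffs = basis.T @ x
    return float(coeffs @ coeffs)

def classify(x, basis_adhd, basis_control):
    """Predict the label whose subspace captures more of the energy of x."""
    if projected_energy(x, basis_adhd) > projected_energy(x, basis_control):
        return "ADHD"
    return "control"

# Synthetic toy data: one class varies along axis 0, the other along axis 1.
rng = np.random.default_rng(0)
adhd_train = np.outer(rng.normal(size=20), [1.0, 0.0, 0.0, 0.0]) + 0.01 * rng.normal(size=(20, 4))
control_train = np.outer(rng.normal(size=20), [0.0, 1.0, 0.0, 0.0]) + 0.01 * rng.normal(size=(20, 4))

basis_adhd = learn_subspace(adhd_train, dim=1)
basis_control = learn_subspace(control_train, dim=1)

label_a = classify(np.array([1.0, 0.05, 0.0, 0.0]), basis_adhd, basis_control)
label_b = classify(np.array([0.05, 1.0, 0.0, 0.0]), basis_adhd, basis_control)
```

With few training subjects the learned bases can change noticeably from sample to sample, which is the instability the abstract cites as motivation for the hypothesis-testing framework.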
|
41
|
Guided graph spectral embedding: Application to the C. elegans connectome. Netw Neurosci 2019; 3:807-826. [PMID: 31410381 PMCID: PMC6663470 DOI: 10.1162/netn_a_00084] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/29/2018] [Accepted: 03/12/2019] [Indexed: 11/17/2022] Open
Abstract
Graph spectral analysis can yield meaningful embeddings of graphs by providing insight into distributed features not directly accessible in nodal domain. Recent efforts in graph signal processing have proposed new decompositions—for example, based on wavelets and Slepians—that can be applied to filter signals defined on the graph. In this work, we take inspiration from these constructions to define a new guided spectral embedding that combines maximizing energy concentration with minimizing modified embedded distance for a given importance weighting of the nodes. We show that these optimization goals are intrinsically opposite, leading to a well-defined and stable spectral decomposition. The importance weighting allows us to put the focus on particular nodes and tune the trade-off between global and local effects. Following the derivation of our new optimization criterion, we exemplify the methodology on the C. elegans structural connectome. The results of our analyses confirm known observations on the nematode’s neural network in terms of functionality and importance of cells. Compared with Laplacian embedding, the guided approach, focused on a certain class of cells (sensory neurons, interneurons, or motoneurons), provides more biological insights, such as the distinction between somatic positions of cells, and their involvement in low- or high-order processing functions.
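The Laplacian embedding used as the comparison baseline in this abstract can be sketched in a few lines of Python. The sketch assumes an undirected, connected graph and the unnormalized Laplacian L = D − A; it does not implement the guided embedding or its importance weighting, and the toy graph is made up for illustration.

```python
import numpy as np

def laplacian_embedding(adjacency, dim):
    """Embed nodes with the eigenvectors of L = D - A for the smallest nonzero eigenvalues.

    Assumption: the graph is undirected and connected, so exactly one trivial
    (constant) eigenvector with eigenvalue 0 is skipped.
    """
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return eigvals[1:dim + 1], eigvecs[:, 1:dim + 1]

# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

eigvals, embedding = laplacian_embedding(adjacency, dim=2)
fiedler = embedding[:, 0]  # first nontrivial eigenvector
```

On this toy graph the sign of the first embedding coordinate separates the two triangles, the kind of global structure that an unguided spectral embedding emphasizes.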
|
42
|
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery. BMC Bioinformatics 2019; 20:306. [PMID: 31238875 PMCID: PMC6593489 DOI: 10.1186/s12859-019-2914-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/17/2019] [Accepted: 05/24/2019] [Indexed: 11/23/2022] Open
Abstract
Background Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real world biomedical problems. Results In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embedding on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by considering edge-types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. Conclusions We propose this method for its added value relative to existing graph analytical methodology, and in the real world context of biomedical knowledge discovery applicability.
|
43
|
How language flows when movements don't: An automated analysis of spontaneous discourse in Parkinson's disease. BRAIN AND LANGUAGE 2016; 162:19-28. [PMID: 27501386 DOI: 10.1016/j.bandl.2016.07.008] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 02/06/2016] [Revised: 04/20/2016] [Accepted: 07/25/2016] [Indexed: 06/06/2023]
Abstract
To assess the impact of Parkinson's disease (PD) on spontaneous discourse, we conducted computerized analyses of brief monologues produced by 51 patients and 50 controls. We explored differences in semantic fields (via latent semantic analysis), grammatical choices (using part-of-speech tagging), and word-level repetitions (with graph embedding tools). Although overall output was quantitatively similar between groups, patients relied less heavily on action-related concepts and used more subordinate structures. Also, a classification tool operating on grammatical patterns identified monologues as pertaining to patients or controls with 75% accuracy. Finally, while the incidence of dysfluent word repetitions was similar between groups, it allowed inferring the patients' level of motor impairment with 77% accuracy. Our results highlight the relevance of studying naturalistic discourse features to tap the integrity of neural (and, particularly, motor) networks, beyond the possibilities of standard token-level instruments.
|
44
|
Identifying group discriminative and age regressive sub-networks from DTI-based connectivity via a unified framework of non-negative matrix factorization and graph embedding. Med Image Anal 2014; 18:1337-48. [PMID: 25037933 PMCID: PMC4205764 DOI: 10.1016/j.media.2014.06.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 01/14/2014] [Revised: 05/29/2014] [Accepted: 06/17/2014] [Indexed: 02/06/2023]
Abstract
Diffusion tensor imaging (DTI) offers rich insights into the physical characteristics of white matter (WM) fiber tracts and their development in the brain, facilitating a network representation of the brain's traffic pathways. Such a network representation of brain connectivity has provided a novel means of investigating brain changes arising from pathology, development or aging. The high dimensionality of these connectivity networks necessitates the development of methods that identify the connectivity building blocks or sub-network components that characterize the underlying variation in the population. In addition, the projection of the subject networks into the basis set provides a low-dimensional representation that teases apart different sources of variation in the sample, facilitating variation-specific statistical analysis. We propose a unified framework of non-negative matrix factorization and graph embedding for learning sub-network patterns of connectivity by their projective non-negative decomposition into a reconstructive basis set, as well as additional basis sets representing variational sources in the population such as age and pathology. The proposed framework is applied to a study of diffusion-based connectivity in subjects with autism that shows localized sparse sub-networks which mostly capture the changes related to pathology and developmental variations.
|
45
|
A trace ratio maximization approach to multiple kernel-based dimensionality reduction. Neural Netw 2013; 49:96-106. [PMID: 24211342 DOI: 10.1016/j.neunet.2013.09.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 08/30/2012] [Revised: 09/03/2013] [Accepted: 09/18/2013] [Indexed: 11/16/2022]
Abstract
Most dimensionality reduction techniques are based on one metric or one kernel, hence it is necessary to select an appropriate kernel for kernel-based dimensionality reduction. Multiple kernel learning for dimensionality reduction (MKL-DR) has been recently proposed to learn a kernel from a set of base kernels which are seen as different descriptions of data. As MKL-DR does not involve regularization, it might be ill-posed under some conditions and consequently its applications are hindered. This paper proposes a multiple kernel learning framework for dimensionality reduction based on regularized trace ratio, termed as MKL-TR. Our method aims at learning a transformation into a space of lower dimension and a corresponding kernel from the given base kernels among which some may not be suitable for the given data. The solutions for the proposed framework can be found based on trace ratio maximization. The experimental results demonstrate its effectiveness in benchmark datasets, which include text, image and sound datasets, for supervised, unsupervised as well as semi-supervised settings.
|