1. Yan D, Fan Z, Li Q, Chen Y. PPIA-coExp: Discovering Context-Specific Biomarkers Based on Protein-Protein Interactions, Co-Expression Networks, and Expression Data. Int J Mol Sci 2024; 25:12608. [PMID: 39684321 DOI: 10.3390/ijms252312608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 11/19/2024] [Accepted: 11/21/2024] [Indexed: 12/18/2024] [Open Access]
Abstract
Identifying a small set of effective biomarkers from multi-omics data is important for discriminating between cell types and helpful for the early detection and diagnosis of complex diseases. However, it is challenging to identify optimal biomarkers from high-throughput molecular data. Here, we present protein-protein interaction affinity and co-expression network (PPIA-coExp), a linear programming model designed to discover context-specific biomarkers based on co-expression networks and protein-protein interaction affinity (PPIA), which is used to estimate the concentrations of protein complexes based on the law of mass action. PPIA-coExp outperformed traditional node-based approaches on both small and large samples. We applied PPIA-coExp to human aging and Alzheimer's disease (AD) and discovered some important biomarkers. In addition, we performed an integrative analysis of transcriptome and epigenomic data, revealing the correlation between changes in gene expression and different histone modification distributions in human aging and AD.
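As a rough illustration of the mass-action idea mentioned in this abstract, the sketch below computes the equilibrium concentration of a binary complex AB from the total concentrations of A and B and a dissociation constant. The function name `complex_concentration`, the two-protein setting, and the closed-form quadratic solution are assumptions made for illustration; the abstract does not state how PPIA-coExp itself estimates complex concentrations.

```python
import math


def complex_concentration(total_a: float, total_b: float, kd: float) -> float:
    """Equilibrium [AB] for A + B <-> AB under the law of mass action.

    Solves Kd = (A_total - c) * (B_total - c) / c for the physically
    meaningful root c, which satisfies 0 <= c <= min(A_total, B_total).
    """
    if total_a < 0 or total_b < 0 or kd <= 0:
        raise ValueError("concentrations must be >= 0 and kd must be > 0")
    s = total_a + total_b + kd
    # Smaller root of c^2 - s*c + A_total*B_total = 0.
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0
```

A lower dissociation constant (higher affinity) yields a higher complex concentration for the same protein abundances, which is how expression levels and affinity can be combined into a single edge-level quantity.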
Affiliation(s)
- Dongsheng Yan
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Zhiyu Fan
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Qianzhong Li
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot 010021, China
- Yingli Chen
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot 010021, China
2. Ai Y, Xie X, Ma X. Graph Contrastive Learning for Tracking Dynamic Communities in Temporal Networks. IEEE Transactions on Emerging Topics in Computational Intelligence 2024; 8:3422-3435. [DOI: 10.1109/tetci.2024.3386844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025]
Affiliation(s)
- Yun Ai
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Xianghua Xie
  - Department of Computer Science, Swansea University, Swansea, U.K.
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, Xi'an, China
3. Gao X, Wang Y, Hou W, Liu Z, Ma X. Multi-View Clustering for Integration of Gene Expression and Methylation Data With Tensor Decomposition and Self-Representation Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2050-2063. [PMID: 37015414 DOI: 10.1109/tcbb.2022.3229678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
The accumulated DNA methylation and gene expression provide a great opportunity to exploit the epigenetic patterns of genes, which is the foundation for revealing the underlying mechanisms of biological systems. Current integrative algorithms are criticized for undesirable performance because they fail to address the heterogeneity of expression and methylation data, and the intrinsic relations among them. To solve this issue, a novel multi-view clustering with self-representation learning and low-rank tensor constraint (MCSL-LTC) is proposed for the integration of gene expression and DNA methylation data, which are treated as complementary views. Specifically, MCSL-LTC first learns the low-dimensional features for each view with the linear projection, and then these features are fused in a unified tensor space with low-rank constraints. In this case, the complementary information of various views is precisely captured, where the heterogeneity of omic data is avoided, thereby enhancing the consistency of different views. Finally, MCSL-LTC obtains a consensus cluster of genes reflecting the structure and features of various views. Experimental results demonstrate that the proposed approach outperforms state-of-the-art baselines in terms of accuracy on both the social and cancer data, which provides an effective and efficient method for the integration of heterogeneous genomic data.
4. Li D, Ma X, Gong M. Joint Learning of Feature Extraction and Clustering for Large-Scale Temporal Networks. IEEE Transactions on Cybernetics 2023; 53:1653-1666. [PMID: 34495863 DOI: 10.1109/tcyb.2021.3107679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
Temporal networks are ubiquitous in nature and society, and tracking the dynamics of networks is fundamental for investigating the mechanisms of systems. Dynamic communities in temporal networks simultaneously reflect the topology of the current snapshot (clustering accuracy) and historical ones (clustering drift). Current algorithms are criticized for their inability to characterize the dynamics of networks at the vertex level, independence of feature extraction and clustering, and high time complexity. In this study, we solve these problems by proposing a novel joint learning model for dynamic community detection in temporal networks (also known as jLMDC) via joining feature extraction and clustering. This model is formulated as a constrained optimization problem. Vertices are classified into dynamic and static groups by exploring the topological structure of temporal networks to fully exploit their dynamics at each time step. Then, jLMDC updates the features of dynamic vertices by preserving features of static ones during optimization. The advantage of jLMDC is that features are extracted under the guidance of clustering, promoting performance, and saving the running time of the algorithm. Finally, we extend jLMDC to detect the overlapping dynamic community in temporal networks. The experimental results on 11 temporal networks demonstrate that jLMDC improves accuracy up to 8.23% and saves 24.89% of running time on average compared to state-of-the-art methods.
5. Wu W, Ma X. Network-Based Structural Learning Nonnegative Matrix Factorization Algorithm for Clustering of scRNA-Seq Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:566-575. [PMID: 35316190 DOI: 10.1109/tcbb.2022.3161131] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 02/02/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) measures expression profiles at the single-cell level, which sheds light on the heterogeneity and functional diversity among cell populations. The vast majority of current algorithms identify cell types by directly clustering transcriptional profiles, which ignores indirect relations among cells and results in undesirable performance on cell type discovery and trajectory inference. Therefore, there is a critical need for inferring cell types and trajectories by exploiting the interactions among cells. In this study, we propose a network-based structural learning nonnegative matrix factorization algorithm (aka SLNMF) for the identification of cell types in scRNA-seq, which is formulated as a constrained optimization problem. SLNMF first constructs the similarity network for cells and then extracts latent features of the cells by exploiting the topological structure of the cell-cell network. To improve the clustering performance, a structural constraint is imposed on the model so that the latent features of cells preserve the structural information of the networks, thereby significantly improving performance. Finally, we track the trajectory of cells by exploring the relationships among cell types. Fourteen scRNA-seq datasets, with the number of single cells varying from 49 to 26,484, are adopted to validate the performance of the algorithm. The experimental results demonstrate that SLNMF significantly outperforms fifteen state-of-the-art methods with a 15.32% improvement in terms of accuracy, and it accurately identifies the trajectories of cells. The proposed model and methods provide an effective strategy to analyze scRNA-seq data. The software is coded in MATLAB and is freely available for academic use at https://github.com/xkmaxidian/SLNMF.
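For readers unfamiliar with the factorization that SLNMF builds on, the following is a minimal sketch of plain nonnegative matrix factorization with the standard Lee-Seung multiplicative updates, written in pure Python. It deliberately omits the network-based structural constraint described in the abstract and is not the authors' MATLAB implementation; the function names, the row-wise orientation (one row per item to be clustered), and the default parameters are assumptions for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))


def nmf(x, rank, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix X (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random initialization.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def cluster_labels(w):
    """Assign each row of W to the index of its largest latent factor."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]
```

In a clustering setting, the rows of W act as latent features and `cluster_labels` gives a simple hard assignment; structure-preserving variants such as the one in this abstract add regularization terms to the objective that this sketch does not include.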
6. Manipur I, Giordano M, Piccirillo M, Parashuraman S, Maddalena L. Community Detection in Protein-Protein Interaction Networks and Applications. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:217-237. [PMID: 34951849 DOI: 10.1109/tcbb.2021.3138142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
The ability to identify and characterize not only the protein-protein interactions but also their internal modular organization through network analysis is fundamental for understanding the mechanisms of biological processes at the molecular level. Indeed, the detection of the network communities can enhance our understanding of the molecular basis of disease pathology, and promote drug discovery and disease treatment in personalized medicine. This work gives an overview of recent computational methods for the detection of protein complexes and functional modules in protein-protein interaction networks, also providing a focus on some of its applications. We propose a systematic reformulation of frequently adopted taxonomies for these methods, also proposing new categories to keep up with the most recent research. We review the literature of the last five years (2017-2021) and provide links to existing data and software resources. Finally, we survey recent works exploiting module identification and analysis, in the context of a variety of disease processes for biomarker identification and therapeutic target detection. Our review provides the interested reader with an up-to-date and self-contained view of the existing research, with links to state-of-the-art literature and resources, as well as hints on open issues and future research directions in complex detection and its applications.
7. Chen J, Han G, Xu A, Akutsu T, Cai H. Identifying miRNA-Gene Common and Specific Regulatory Modules for Cancer Subtyping by a High-Order Graph Matching Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:421-431. [PMID: 35320104 DOI: 10.1109/tcbb.2022.3161635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Identifying regulatory modules between miRNAs and genes is crucial in cancer research. It promotes a comprehensive understanding of the molecular mechanisms of cancer. The genomic data collected from subjects usually relate to different cancer statuses, such as different TNM Classifications of Malignant Tumors (TNM) or histological subtypes. Simple integrated analyses generally identify the core of the tumorigenesis (common modules) but miss the subtype-specific regulatory mechanisms (specific modules). In contrast, separate analyses can only report the differences and ignore important common modules. Therefore, there is an urgent need to develop a novel method to jointly analyze miRNA and gene data of different cancer statuses to identify common and specific modules. To that end, we developed a High-Order Graph Matching model to identify Common and Specific modules (HOGMCS) between miRNA and gene data of different cancer statuses. We first demonstrate the superiority of HOGMCS through a comparison with four state-of-the-art techniques using a set of simulated data. Then, we apply HOGMCS on stomach adenocarcinoma data with four TNM stages and two histological types, and breast invasive carcinoma data with four PAM50 subtypes. The experimental results demonstrate that HOGMCS can accurately extract common and subtype-specific miRNA-gene regulatory modules, where many identified miRNA-gene interactions have been confirmed in several public databases.
8. Huang Z, Wang Y, Ma X. Clustering of Cancer Attributed Networks by Dynamically and Jointly Factorizing Multi-Layer Graphs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2737-2748. [PMID: 34143738 DOI: 10.1109/tcbb.2021.3090586] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 02/02/2023]
Abstract
The accumulated omic data provide an opportunity to exploit the mechanisms of cancers and pose a challenge for their integrative analysis. Although extensive efforts have been devoted to addressing this issue, the current algorithms result in undesirable performance because of the complexity of patterns and heterogeneity of data. In this study, we propose an effective and efficient algorithm (called NMF-DEC) to identify clusters by integrating the interactome and transcriptome data. By treating the expression profiles of genes as attributes of vertices in the gene interaction networks, we transform the integrative analysis of omic data into clustering of attributed networks. To circumvent the heterogeneity, we construct a similarity network for the attributes of genes and cast it into the common module detection problem in multi-layer networks. NMF-DEC explores the relation between attributes and topological structure of networks by jointly factorizing the similarity and interaction networks with the same basis. In this optimization, the interaction network is dynamically updated and the information of attributes is dynamically incorporated, providing a better strategy to characterize the structure of modules in attributed networks. Extensive experiments indicate that, compared with state-of-the-art baselines, NMF-DEC is more accurate on social networks and shows better performance on cancer attributed networks, implying the superiority of the proposed method for the integrative analysis of omic data.
9. Chen J, Huang J, Liao Y, Zhu L, Cai H. Identify Multiple Gene-Drug Common Modules Via Constrained Graph Matching. IEEE J Biomed Health Inform 2022; 26:4794-4805. [PMID: 35788454 DOI: 10.1109/jbhi.2022.3188503] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023]
Abstract
Identifying gene-drug interactions is vital to understanding biological mechanisms and achieving precise drug repurposing. High-throughput technologies produce a large amount of pharmacological and genomic data, providing an opportunity to explore the associations between oncogenic genes and therapeutic drugs. However, most studies only focus on "one-to-one" or "one-to-many" interactions, ignoring the multivariate patterns between genes and drugs. In this article, a high-order graph matching model with hypergraph constraints is proposed to discover the gene-drug common regulatory modules. Moreover, the prior knowledge is formulated into hypergraph constraints to reveal their multiple correspondences, penalizing the tensor matching process. The experimental results on the synthetic data demonstrate the proposed model is robust to noise contamination and outlier corruption, achieving a better performance than four state-of-the-art methods. We then evaluate the statistical power of our proposed method on the pharmacogenomics data. Our identified gene-drug common modules not only show significantly enriched pathways associated with cancer but also manifest the highly close gene-drug interactions.
10. Maity AK, Stone TC, Ward V, Webster AP, Yang Z, Hogan A, McBain H, Duku M, Ho KMA, Wolfson P, Graham DG, Beck S, Teschendorff AE, Lovat LB. Novel epigenetic network biomarkers for early detection of esophageal cancer. Clin Epigenetics 2022; 14:23. [PMID: 35164838 PMCID: PMC8845366 DOI: 10.1186/s13148-022-01243-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/28/2021] [Accepted: 02/04/2022] [Indexed: 02/01/2023] [Open Access]
Abstract
BACKGROUND: Early detection of esophageal cancer is critical to improve survival. Whilst studies have identified biomarkers, their interpretation and validity are often confounded by cell-type heterogeneity. RESULTS: Here we applied systems-epigenomic and cell-type deconvolution algorithms to a discovery set encompassing RNA-Seq and DNA methylation data from esophageal adenocarcinoma (EAC) patients and matched normal-adjacent tissue, in order to identify robust biomarkers, free from the confounding effect posed by cell-type heterogeneity. We identify 12 gene-modules that are epigenetically deregulated in EAC, and are able to validate all 12 modules in 4 independent EAC cohorts. We demonstrate that the epigenetic deregulation is present in the epithelial compartment of EAC-tissue. Using single-cell RNA-Seq data we show that one of these modules, a proto-cadherin module centered around CTNND2, is inactivated in Barrett's Esophagus, a precursor lesion to EAC. By measuring DNA methylation in saliva from EAC cases and controls, we identify a chemokine module centered around CCL20, whose methylation patterns in saliva correlate with EAC status. CONCLUSIONS: Given our observations that a CCL20 chemokine network is overactivated in EAC tissue and saliva from EAC patients, and that in independent studies CCL20 has been found to be overactivated in EAC tissue infected with the bacterium F. nucleatum, a bacterium that normally inhabits the oral cavity, our results highlight the possibility of using DNAm measurements in saliva as a proxy for changes occurring in the esophageal epithelium. Both the CTNND2 and CCL20 modules represent novel promising network biomarkers for EAC that merit further investigation.
Affiliation(s)
- Alok K Maity
  - CAS Key Lab of Computational Biology, Shanghai Institute for Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Timothy C Stone
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Vanessa Ward
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Amy P Webster
  - University of Exeter Medical School, University of Exeter, Exeter, UK
- Zhen Yang
  - Key Laboratory of Medical Epigenetics and Metabolism, Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Aine Hogan
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Hazel McBain
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Margaraet Duku
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Kai Man Alexander Ho
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Paul Wolfson
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- David G Graham
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
  - Division of GI Services, University College London Hospitals NHS Foundation Trust, 235 Euston Road, London, NW1 2BU, UK
- Stephan Beck
  - UCL Cancer Institute, University College London, Gower Street, London, WC1E 6BT, UK
- Andrew E Teschendorff
  - CAS Key Lab of Computational Biology, Shanghai Institute for Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Laurence B Lovat
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
  - Division of GI Services, University College London Hospitals NHS Foundation Trust, 235 Euston Road, London, NW1 2BU, UK
11. Ma X, Sun P, Gong M. An Integrative Framework of Heterogeneous Genomic Data for Cancer Dynamic Modules Based on Matrix Decomposition. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:305-316. [PMID: 32750874 DOI: 10.1109/tcbb.2020.3004808] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 02/02/2023]
Abstract
Cancer progression is dynamic, and tracking dynamic modules is promising for cancer diagnosis and therapy. Accumulated genomic data provide us an opportunity to investigate the underlying mechanisms of cancers. However, as far as we know, no algorithm has been designed for dynamic modules by integrating heterogeneous omics data. To address this issue, we propose an integrative framework for dynamic module detection based on regularized nonnegative matrix factorization method (DrNMF) by integrating the gene expression and protein interaction network. To remove the heterogeneity of genomic data, we divide the samples of expression profiles into groups to construct gene co-expression networks. To characterize the dynamics of modules, the temporal smoothness framework is adopted, in which the gene co-expression network at the previous stage and protein interaction network are incorporated into the objective function of DrNMF via regularization. The experimental results demonstrate that DrNMF is superior to state-of-the-art methods in terms of accuracy. For breast cancer data, the obtained dynamic modules are more enriched by the known pathways, and can be used to predict the stages of cancers and survival time of patients. The proposed model and algorithm provide an effective integrative analysis of heterogeneous genomic data for cancer progression.
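The step of dividing samples into groups and building one gene co-expression network per group can be sketched as follows. This is only an illustration under stated assumptions: the use of Pearson correlation, the absolute-value threshold of 0.8, and the names `pearson` and `coexpression_edges` are not taken from the abstract, which does not say how DrNMF constructs its networks.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def coexpression_edges(expr, samples, threshold=0.8):
    """Build an undirected co-expression edge set for one group of samples.

    expr maps a gene name to its expression values across all samples;
    samples lists the column indices that belong to the group (e.g. one stage).
    """
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson([expr[g1][i] for i in samples], [expr[g2][i] for i in samples])
        if abs(r) >= threshold:
            edges.add((g1, g2))
    return edges
```

Calling `coexpression_edges` once per sample group yields a sequence of networks, one per stage, which is the kind of input a temporal-smoothness model can then compare across consecutive stages.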
12. Zhang B, Gong M, Huang J, Ma X. Clustering Heterogeneous Information Network by Joint Graph Embedding and Nonnegative Matrix Factorization. ACM Transactions on Knowledge Discovery from Data 2021; 15:1-25. [DOI: 10.1145/3441449] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/01/2020] [Accepted: 12/01/2020] [Indexed: 02/02/2023]
Abstract
Many complex systems derived from nature and society consist of multiple types of entities and heterogeneous interactions, which can be effectively modeled as heterogeneous information networks (HINs). Structural analysis of heterogeneous networks is of great significance because it leverages the rich semantic information of objects and links in the heterogeneous networks. Clustering heterogeneous networks aims to group vertices into classes, which sheds light on the structure–function relations of the underlying systems. The current algorithms perform feature extraction and clustering independently and are criticized for not fully characterizing the structure of clusters. In this study, we propose a learning model by joint Graph Embedding and Nonnegative Matrix Factorization (aka GEjNMF), where feature extraction and clustering are simultaneously learned by exploiting the graph embedding and latent structure of networks. We formulate the objective function of GEjNMF and transform the heterogeneous network clustering problem into a constrained optimization problem, which is effectively solved by l0-norm optimization. The advantage of GEjNMF is that features are selected under the guidance of clustering, which improves the performance and saves the running time of algorithms at the same time. The experimental results on three benchmark heterogeneous networks demonstrate that GEjNMF achieves the best performance with the least running time compared with the best state-of-the-art methods. Furthermore, the proposed algorithm is robust across heterogeneous networks from various fields. The proposed model and method provide an effective alternative for heterogeneous network clustering.
13. Wang Y, Xia Z, Deng J, Xie X, Gong M, Ma X. TLGP: a flexible transfer learning algorithm for gene prioritization based on heterogeneous source domain. BMC Bioinformatics 2021; 22:274. [PMID: 34433414 PMCID: PMC8386056 DOI: 10.1186/s12859-021-04190-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/15/2021] [Accepted: 05/12/2021] [Indexed: 02/02/2023] [Open Access]
Abstract
BACKGROUND: Gene prioritization (gene ranking) aims to obtain the centrality of genes, which is critical for cancer diagnosis and therapy since key genes correspond to biomarkers or drug targets. Great efforts have been devoted to the gene ranking problem by exploring the similarity between candidate and known disease-causing genes. However, when the number of disease-causing genes is limited, these methods are largely inapplicable because of their low accuracy. In practice, the number of known disease-causing genes for cancers, particularly for rare cancers, is very limited. Therefore, there is a critical need to design effective and efficient algorithms for gene ranking with limited prior disease-causing genes. RESULTS: In this study, we propose a transfer learning based algorithm for gene prioritization (called TLGP) in a cancer (target domain) without disease-causing genes by transferring knowledge from other cancers (source domain). The underlying assumption is that knowledge shared by similar cancers improves the accuracy of gene prioritization. Specifically, TLGP first quantifies the similarity between the target and source domains by calculating the affinity matrix for genes. Then, TLGP automatically learns a fusion network for the target cancer by fusing the affinity matrix, pathogenic genes, and genomic data of source cancers. Finally, genes in the target cancer are prioritized. The experimental results indicate that the learnt fusion network is more reliable than the gene co-expression network, implying that transferring knowledge from other cancers improves the accuracy of network construction. Moreover, TLGP outperforms state-of-the-art approaches in terms of accuracy, with an improvement of at least 5%. CONCLUSION: The proposed model and method provide an effective and efficient strategy for gene ranking by integrating genomic data from various cancers.
Affiliation(s)
- Yan Wang
  - School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
  - Department of Library, Xidian University, South TaiBai Road, Xi’an, China
- Zuheng Xia
  - School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
- Jingjing Deng
  - Department of Computer Science, Swansea University, Bay, UK
- Xianghua Xie
  - Department of Computer Science, Swansea University, Bay, UK
- Maoguo Gong
  - School of Electronic Engineering, Xidian University, South TaiBai Road, Xi’an, China
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
14. Lin Y, Ma X. Predicting lincRNA-Disease Association in Heterogeneous Networks Using Co-regularized Non-negative Matrix Factorization. Front Genet 2021; 11:622234. [PMID: 33510774 PMCID: PMC7835800 DOI: 10.3389/fgene.2020.622234] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/28/2020] [Accepted: 12/03/2020] [Indexed: 02/02/2023] [Open Access]
Abstract
Long intergenic non-coding ribonucleic acids (lincRNAs) are critical regulators for many complex diseases, and experimental identification of disease-lincRNA associations is both costly and time-consuming. Therefore, it is necessary to design computational approaches to predict the disease-lincRNA associations that shed light on the mechanisms of diseases. In this study, we develop a co-regularized non-negative matrix factorization (aka Cr-NMF) to identify potential disease-lincRNA associations by integrating the gene expression of lincRNAs, the genetic interaction network for mRNA genes, gene-lincRNA associations, and disease-gene associations. The Cr-NMF algorithm factorizes the disease-lincRNA associations, while the other associations/interactions are integrated using regularization. Furthermore, the regularization not only preserves the topological structure of the lincRNA co-expression network, but also maintains the links "lincRNA → gene → disease." Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of accuracy in predicting the disease-lincRNA associations. The model and algorithm provide an effective way to explore disease-lincRNA associations.
Affiliation(s)
- Yong Lin
  - School of Physics and Electronic Information Engineering, Ningxia Normal University, Guyuan, China
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, Xi'an, China
15. Liu H, Guan J, Li H, Bao Z, Wang Q, Luo X, Xue H. Predicting the Disease Genes of Multiple Sclerosis Based on Network Representation Learning. Front Genet 2020; 11:328. [PMID: 32373160 PMCID: PMC7186413 DOI: 10.3389/fgene.2020.00328] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/23/2020] [Accepted: 03/19/2020] [Indexed: 02/02/2023] [Open Access]
Abstract
Multiple sclerosis (MS) is an autoimmune disease for which it is difficult to find exact disease-related genes. Effectively identifying disease-related genes would contribute to improving the treatment and diagnosis of multiple sclerosis. Current methods for identifying disease-related genes mainly focus on the hypothesis of guilt-by-association and pay little attention to the global topological information of the whole protein-protein interaction (PPI) network. Meanwhile, network representation learning (NRL) has attracted a huge amount of attention in the area of network analysis because of its promising performance in node representation and many downstream tasks. In this paper, we introduce NRL into the task of disease-related gene prediction and propose a novel framework for identifying the disease-related genes of multiple sclerosis. The proposed framework contains three main steps: capturing the topological structure of the PPI network using NRL-based methods, encoding learned features into low-dimensional space using a stacked autoencoder, and training a support vector machine (SVM) classifier to predict disease-related genes. Compared with three state-of-the-art algorithms, our proposed framework shows superior performance on the task of predicting disease-related genes of multiple sclerosis.
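To make the guilt-by-association hypothesis mentioned in this abstract concrete, the sketch below scores each candidate gene by the fraction of its direct PPI neighbors that are already known disease genes. This is a simple local baseline of the kind the abstract contrasts with, not the proposed NRL, autoencoder, and SVM framework; the function name, the scoring rule, and the gene identifiers used with it are hypothetical.

```python
def guilt_by_association(ppi_edges, known_disease_genes):
    """Rank candidate genes by the share of their PPI neighbors that are known disease genes.

    ppi_edges is an iterable of undirected (gene, gene) pairs.
    Returns (gene, score) pairs sorted by descending score, then by gene name.
    """
    neighbors = {}
    for a, b in ppi_edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    known = set(known_disease_genes)
    scores = {}
    for gene, nbrs in neighbors.items():
        if gene in known:
            continue  # only unlabeled genes are candidates
        scores[gene] = len(nbrs & known) / len(nbrs)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because the score only looks one hop away, two genes with no labeled neighbors are indistinguishable, which illustrates why methods that use the global topology of the network can add information.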
Collapse
Affiliation(s)
- Haijie Liu
- Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China
- Department of Physical Medicine and Rehabilitation, Tianjin Medical University General Hospital, Tianjin, China
- Stroke Biological Recovery Laboratory, Department of Physical Medicine and Rehabilitation, Spaulding Rehabilitation Hospital, The Teaching Affiliate of Harvard Medical School Charlestown, Boston, MA, United States
| | - Jiaojiao Guan
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - He Li
- Department of Automation, College of Information Science and Engineering, Tianjin Tianshi College, Tianjin, China
| | - Zhijie Bao
- School of Textile Science and Engineering, Tiangong University, Tianjin, China
| | - Qingmei Wang
- Stroke Biological Recovery Laboratory, Department of Physical Medicine and Rehabilitation, Spaulding Rehabilitation Hospital, The Teaching Affiliate of Harvard Medical School Charlestown, Boston, MA, United States
| | - Xun Luo
- Kerry Rehabilitation Medicine Research Institute, Shenzhen, China
- Shenzhen Dapeng New District Nan'ao People's Hospital, Shenzhen, China
| | - Hansheng Xue
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| |
|
16
|
Aouiche C, Chen B, Shang X. Predicting Stage-Specific Recurrent Aberrations From Somatic Copy Number Dataset. Front Genet 2020; 11:160. [PMID: 32174978 PMCID: PMC7054343 DOI: 10.3389/fgene.2020.00160] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/23/2019] [Accepted: 02/11/2020] [Indexed: 02/02/2023] Open
Abstract
Exploring the evolution of cancers and the related complex molecular mechanisms at the genomic level from the perspective of pathological staging is particularly important for providing novel therapeutic strategies relevant to every cancer patient diagnosed at each stage. This is because copy number variation (CNV) has been recognized as a critical genetic variation at the genomic level, with a large influence on the progression of a variety of complex diseases. Great efforts have been devoted to the identification of recurrent aberrations, single genes, and individual static pathways related to cancer progression. However, we still have little knowledge about the most important aberrant genes related to the pathological stages and their interconnected pathways from genomic profiles. In this study, we propose an identification framework that allows cancer stage-specific patterns to be determined dynamically. Firstly, a two-stage GAIA method is employed to identify stage-specific aberrant copy number variant segments. Secondly, stage-specific cancer genes fully located within the aberrant segments are identified according to the reference annotation dataset. Thirdly, a pathway evolution network is constructed based on the functions of the impacted pathways and their overlapping genes. The significant functions and evolution paths uncovered by this network enabled investigation of the real progression of cancers, and thus facilitated the determination of appropriate clinical settings that will help to assess risk in cancer patients. These findings at individual levels can be integrated to identify robust biomarkers of cancer progression.
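The second step of the framework described above (keeping only genes that lie fully inside an aberrant segment) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Interval` type, the `genes_within_segments` function, the inclusive-coordinate convention, and the toy gene names are all assumptions made for illustration, and the GAIA segment-calling step is taken as already done.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval with inclusive coordinates (assumed convention)."""
    chrom: str
    start: int
    end: int


def genes_within_segments(genes: Dict[str, Interval],
                          segments: List[Interval]) -> List[str]:
    """Return the sorted names of genes lying entirely inside at least one segment."""
    hits = []
    for name, gene in genes.items():
        for seg in segments:
            same_chrom = gene.chrom == seg.chrom
            # "Fully located within" means both gene ends fall inside the segment.
            if same_chrom and seg.start <= gene.start and gene.end <= seg.end:
                hits.append(name)
                break  # one containing segment is enough
    return sorted(hits)


# Hypothetical toy data: two aberrant segments and four annotated genes.
segments = [Interval("chr1", 100, 500), Interval("chr2", 1000, 2000)]
genes = {
    "GENE_A": Interval("chr1", 150, 300),    # fully inside the chr1 segment
    "GENE_B": Interval("chr1", 450, 600),    # overlaps the segment only partially
    "GENE_C": Interval("chr2", 1000, 2000),  # boundaries coincide with the segment
    "GENE_D": Interval("chr3", 150, 300),    # on a chromosome with no segment
}
stage_genes = genes_within_segments(genes, segments)
print(stage_genes)
```

The nested loop is quadratic in the worst case; for genome-scale annotation an interval index or a sorted sweep would likely be preferable, but the containment test itself would stay the same.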
Affiliation(s)
- Chaima Aouiche
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Xi'an, China
- Centre for Multidisciplinary Convergence Computing, School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Xi'an, China
| |
|
17
|
Lin Q, Lin Y, Yu Q, Ma X. Clustering of Cancer Attributed Networks via Integration of Graph Embedding and Matrix Factorization. IEEE ACCESS 2020; 8:197463-197472. [DOI: 10.1109/access.2020.3034623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
|
18
|
Sun S, Lee YR, Enfield B. Hemimethylation Patterns in Breast Cancer Cell Lines. Cancer Inform 2019; 18:1176935119872959. [PMID: 31496635 PMCID: PMC6716185 DOI: 10.1177/1176935119872959] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/05/2019] [Accepted: 08/05/2019] [Indexed: 02/01/2023] Open
Abstract
DNA methylation is an epigenetic event that involves adding a methyl group to the cytosine (C) site, especially the one that pairs with a guanine (G) site (ie, CG or CpG site), in a human genome. This event plays an important role in both cancerous and normal cell development. Previous studies often assume symmetric methylation on both DNA strands. However, asymmetric methylation, or hemimethylation (methylation that occurs only on 1 DNA strand), does exist and has been reported in several studies. Due to the limitations of previous DNA methylation sequencing technologies, researchers could only study hemimethylation on specific genes, and the overall genomic hemimethylation landscape remains relatively unexplored. With the development of advanced next-generation sequencing techniques, it is now possible to measure methylation levels on both forward and reverse strands at all CpG sites in an entire genome. Analyzing hemimethylation patterns may reveal regions associated with ongoing tumor growth. In this study, we first identify hemimethylated CpG sites in breast cancer cell lines using Wilcoxon signed rank tests. We then identify hemimethylation patterns by grouping consecutive hemimethylated CpG sites based on their methylation states, methylation "M" or unmethylation "U." These patterns include regular (or consecutive) hemimethylation clusters (eg, "MMM" on one strand and "UUU" on another strand) and polarity (or reverse) clusters (eg, "MU" on one strand and "UM" on another strand). Our results reveal that most hemimethylation clusters are the polarity type, and hemimethylation does occur across the entire genome with notably higher numbers in the breast cancer cell lines. The lengths or sizes of most hemimethylation clusters are very short, often less than 50 base pairs.
After mapping hemimethylation clusters and sites to corresponding genes, we study the functions of these genes and find that several of the highly hemimethylated genes may influence tumor growth or suppression. These genes may also indicate a progressing transition to a new tumor stage.
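The grouping of consecutive hemimethylated sites into regular and polarity clusters can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: it assumes the per-strand states have already been called as "M" or "U" (the Wilcoxon signed rank step is omitted), that "consecutive" means adjacent positions in the input strings, and that a cluster needs at least two sites. The function name `hemimethylation_clusters` is hypothetical.

```python
from typing import List, Tuple


def hemimethylation_clusters(forward: str, reverse: str) -> List[Tuple[int, int, str]]:
    """Return (first_index, last_index, kind) for each run of >= 2 adjacent
    hemimethylated CpG sites, where kind is "regular" or "polarity".

    forward/reverse hold one character per CpG site: "M" or "U".
    A site is hemimethylated when the two strands disagree.
    """
    if len(forward) != len(reverse):
        raise ValueError("both strands must cover the same CpG sites")
    clusters = []
    run_start = None
    n = len(forward)
    # Iterate one step past the end so a run touching the last site is closed.
    for i in range(n + 1):
        hemi = i < n and forward[i] != reverse[i]
        if hemi and run_start is None:
            run_start = i
        elif not hemi and run_start is not None:
            if i - run_start >= 2:
                # Within a run the reverse strand is the mirror of the forward
                # strand, so the forward states alone give the orientation.
                orientations = set(forward[run_start:i])
                kind = "regular" if len(orientations) == 1 else "polarity"
                clusters.append((run_start, i - 1, kind))
            run_start = None
    return clusters


# Toy example: sites 0-2 form "MMM"/"UUU", sites 5-6 form "MU"/"UM".
forward_strand = "MMMUUMUM"
reverse_strand = "UUUUUUMM"
clusters = hemimethylation_clusters(forward_strand, reverse_strand)
print(clusters)
```

Isolated hemimethylated sites are dropped here because a single site cannot show either pattern; whether the original analysis treats singletons differently is not stated in the abstract.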
Affiliation(s)
- Shuying Sun
- Department of Mathematics, Texas State University, San Marcos, TX, USA
| | - Yu Ri Lee
- Department of Mathematics, Texas State University, San Marcos, TX, USA
| | - Brittany Enfield
- Global Engineering Systems, Cypress Semiconductor, Austin, TX, USA
| |
|
19
|
Abstract
Background: DNA methylation is an epigenetic event that may regulate gene expression. Because of this regulatory role, aberrant DNA methylation is often associated with many diseases. Within-sample DNA co-methylation is the similarity of methylation in nearby cytosine sites of a chromosome. It is important to study co-methylation patterns. However, they are not yet well studied, and it is unclear what co-methylation patterns normal DNA samples have. Are the co-methylation patterns of the same tissue across several samples different? Are the co-methylation patterns of various tissues of the same sample different? To answer these questions, we conduct analyses using two sets of data: 3-sample-1-tissue (3S1T) and 1-sample-8-tissue (1S8T). Results: To study the co-methylation patterns of the two datasets, 3S1T and 1S8T, we investigate the following questions: How often does one methylation state change to other methylation states, and how is this change associated with chromosome distance? Based on the 3S1T data, we find no significant co-methylation difference among the same spleen tissues of three different samples. However, the analysis results of the 1S8T data show significant differences among eight tissues of one sample. For both 3S1T and 1S8T data, we find that the no/low methylation state A and the high/full methylation state D tend to remain the same along a chromosome region. We also find that the low/partial methylation state B and the partial/high methylation state C tend to change to higher methylation states along a chromosome. Finally, we find that the lengths of most co-methylation regions are very short, with only a few hundred base pairs. In fact, only a small proportion of methylated regions are longer than 1000 base pairs. Conclusions: In this paper, we have addressed a few questions regarding within-sample co-methylation patterns in normal tissues.
Our statistical analysis results and answers may help researchers to better understand the biological process of DNA methylation. This may pave the way to develop better analysis methods for future methylation research. Electronic supplementary material: The online version of this article (10.1186/s13040-019-0198-8) contains supplementary material, which is available to authorized users.
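The question of how often one methylation state changes to another along a chromosome can be made concrete with a small sketch. It is only an illustration under stated assumptions: sites are taken as already labeled with the states A to D (the abstract gives no thresholds, so none are invented here), transitions are counted between adjacent sites, and the association with chromosome distance is left out. The function name `transition_frequencies` is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

STATES = "ABCD"  # A: no/low, B: low/partial, C: partial/high, D: high/full


def transition_frequencies(states: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Estimate how often each state is followed by each other state.

    Returns {(source, destination): fraction of transitions out of source}.
    Source states that never occur before another site are omitted.
    """
    # Count every adjacent (current, next) pair along the chromosome.
    counts = Counter(zip(states, states[1:]))
    row_totals = Counter()
    for (src, _dst), count in counts.items():
        row_totals[src] += count
    return {
        (src, dst): counts[(src, dst)] / row_totals[src]
        for src in STATES
        for dst in STATES
        if row_totals[src]
    }


# Toy sequence of per-site states in chromosome order.
site_states = "AAABDDDCD"
freqs = transition_frequencies(site_states)
for src in STATES:
    print(src, [round(freqs.get((src, dst), 0.0), 2) for dst in STATES])
```

A large diagonal entry (for example A to A or D to D) corresponds to a state that tends to persist along a region; extending the sketch to bin transitions by the base-pair distance between sites would address the distance question raised in the abstract.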
|