1
Yu Z, Su Y, Lu Y, Yang Y, Wang F, Zhang S, Chang Y, Wong KC, Li X. Topological identification and interpretation for single-cell gene regulation elucidation across multiple platforms using scMGCA. Nat Commun 2023; 14:400. [PMID: 36697410 PMCID: PMC9877026 DOI: 10.1038/s41467-023-36134-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 01/16/2023] [Indexed: 01/26/2023] Open
Abstract
Single-cell RNA sequencing provides high-throughput gene expression information to explore cellular heterogeneity at the individual cell level. A major challenge in characterizing high-throughput gene expression data arises from its high dimensionality and the prevalence of dropout events. To address these concerns, we develop a deep graph learning method, scMGCA, for single-cell data analysis. scMGCA is based on a graph-embedding autoencoder that simultaneously learns cell-cell topology representation and cluster assignments. We show that scMGCA is accurate and effective for cell segregation and batch effect correction, outperforming other state-of-the-art models across multiple platforms. In addition, we perform genomic interpretation on the key compressed transcriptomic space of the graph-embedding autoencoder to demonstrate the underlying gene regulation mechanism. We demonstrate that in a pancreatic ductal adenocarcinoma dataset, scMGCA successfully provides annotations on the specific cell types and reveals differential gene expression levels across multiple tumor-associated and cell signalling pathways.
Affiliation(s)
- Zhuohan Yu
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Yanchi Su
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Yifu Lu
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Yuning Yang
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Fuzhou Wang
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Shixiong Zhang
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Yi Chang
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Jilin, China
2
Wang C, Zhao N, Sun K, Zhang Y. A Cancer Gene Module Mining Method Based on Bio-Network of Multi-Omics Gene Groups. Front Oncol 2020; 10:1159. [PMID: 32637361 PMCID: PMC7317001 DOI: 10.3389/fonc.2020.01159] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/18/2020] [Accepted: 06/08/2020] [Indexed: 11/13/2022] Open
Abstract
The initiation, promotion and progression of cancer are strongly associated with the environment a person lives in as well as with individual genetic factors. In view of the dangers to life and health caused by this complex systemic disease, many leading research institutions around the world have been working to discover the pathogenic mechanisms driving cancer occurrence and development. The emergence of high-throughput sequencing technology has greatly advanced oncology research and led to the discovery of important oncogenes and the interrelationships among them. Here, we study heterogeneous multi-level data in an integrated context and introduce lncRNA omics data to construct multi-omics bio-network models, allowing the screening of key cancer-related gene groups. We propose a compactness clustering algorithm based on corrected cumulative rank scores, which uses the functional similarity between groups of genes as a distance measure to extract, through clustering, the key abnormally regulated gene modules contained in gene groups. We also conducted a survival analysis using our results and found that our model could separate groups of different levels very well. The results also demonstrate that, through the integration of multi-omics biological data, key gene modules and their dysregulated gene groups can be discovered, which is crucial for cancer research.
Affiliation(s)
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ning Zhao
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Kai Sun
  - Thoracic Surgery Department, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Ying Zhang
  - Department of Pharmacy, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
3
Paul A, Sil J. Identification of Differentially Expressed Genes to Establish New Biomarker for Cancer Prediction. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1970-1985. [PMID: 29994718 DOI: 10.1109/tcbb.2018.2837095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The goal of the human genome project is to integrate genetic information into different clinical therapies. To achieve this goal, different computational algorithms have been devised for identifying biomarker genes, the cause of complex diseases. However, most of the methods developed so far using DNA microarray data fall short in interpreting biological findings and are less accurate in disease prediction. In this paper, we propose two parameters, risk_factor and confusion_factor, to identify the biologically significant genes for cancer development. First, we evaluate the risk_factor of each gene; genes with nonzero risk_factor result in misclassification of data and are therefore removed. Next, we calculate the confusion_factor of the remaining genes, which measures the confusion a gene causes in prediction due to the closeness of the samples in the cancer and normal classes. We apply the nondominated sorting genetic algorithm (NSGA-II) to select the maximally uncorrelated differentially expressed genes in the cancer class with minimum confusion_factor. The proposed Gene Selection Explore (GSE) algorithm is compared with well-established feature selection algorithms on 10 microarray datasets with respect to sensitivity, specificity, and accuracy. The identified genes appear in KEGG pathways and are biologically important in several respects.
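As a rough illustration of the nondominated-sorting idea behind NSGA-II mentioned in this abstract, the sketch below extracts the first Pareto front from candidates scored on two objectives that are both minimized. The objective pairing (correlation, confusion), the function names, and the toy values are assumptions made for illustration only; the abstract does not give the formulas for risk_factor or confusion_factor, and the full NSGA-II procedure also involves ranking the remaining fronts, crowding-distance sorting, and genetic operators, all of which are omitted here.

```python
from typing import List, Tuple

# Hypothetical scoring: (correlation, confusion), both to be minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def first_front(points: List[Point]) -> List[int]:
    """Indices of the points that no other point dominates (the first Pareto front)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Toy values, not taken from the paper.
candidates = [(0.1, 0.9), (0.4, 0.4), (0.5, 0.5), (0.9, 0.1)]
print(first_front(candidates))  # [0, 1, 3]
```

The candidate (0.5, 0.5) is dropped because (0.4, 0.4) is better on both objectives; the other three represent different trade-offs and all stay on the front.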
4
Wang B, Pourshafeie A, Zitnik M, Zhu J, Bustamante CD, Batzoglou S, Leskovec J. Network enhancement as a general method to denoise weighted biological networks. Nat Commun 2018; 9:3108. [PMID: 30082777 PMCID: PMC6078978 DOI: 10.1038/s41467-018-05469-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Received: 10/20/2017] [Accepted: 07/03/2018] [Indexed: 12/31/2022] Open
Abstract
Networks are ubiquitous in biology where they encode connectivity patterns at all scales of organization, from the molecular to the biome. However, biological networks are noisy due to the limitations of measurement technology and inherent natural variation, which can hamper discovery of network patterns and dynamics. We propose Network Enhancement (NE), a method for improving the signal-to-noise ratio of undirected, weighted networks. NE uses a doubly stochastic matrix operator that induces sparsity and provides a closed-form solution that increases the spectral eigengap of the input network. As a result, NE removes weak edges, enhances real connections, and leads to better downstream performance. Experiments show that NE improves gene-function prediction by denoising tissue-specific interaction networks, eases interpretation of noisy Hi-C contact maps from the human genome, and boosts fine-grained identification accuracy of species. Our results indicate that NE is widely applicable for denoising biological networks.
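To make the term "doubly stochastic" concrete, the sketch below rescales a small symmetric weighted network so that every row and every column sums to 1, using plain alternating row and column normalization (Sinkhorn-Knopp iteration). This is not the NE algorithm itself, whose operator and closed-form solution are not given in the abstract; the function name sinkhorn_normalize, the iteration count, and the toy weights are assumptions chosen for illustration, and the input is assumed to have strictly positive entries.

```python
from typing import List

Matrix = List[List[float]]


def sinkhorn_normalize(weights: Matrix, iterations: int = 200) -> Matrix:
    """Alternately normalize rows and columns of a positive square matrix.

    For a matrix with strictly positive entries this converges to a doubly
    stochastic matrix (all row sums and all column sums equal to 1).
    """
    m = [row[:] for row in weights]  # work on a copy
    n = len(m)
    for _ in range(iterations):
        for i in range(n):  # make each row sum to 1
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        for j in range(n):  # make each column sum to 1
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m


# Toy symmetric network weights, not taken from the paper.
network = [
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 1.0],
]
balanced = sinkhorn_normalize(network)
row_sums = [sum(row) for row in balanced]
col_sums = [sum(balanced[i][j] for i in range(3)) for j in range(3)]
```

After the call, row_sums and col_sums should all be very close to 1, which is the defining property of a doubly stochastic matrix.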
Affiliation(s)
- Bo Wang
  - Department of Computer Science, Stanford University, 353 Serra Mall, Stanford, 94305, CA, USA
- Armin Pourshafeie
  - Department of Physics, Stanford University, 382 Via Pueblo Mall, Stanford, 94305, CA, USA
- Marinka Zitnik
  - Department of Computer Science, Stanford University, 353 Serra Mall, Stanford, 94305, CA, USA
- Junjie Zhu
  - Department of Electrical Engineering, Stanford University, 350 Serra Mall, Stanford, 94305, CA, USA
- Carlos D Bustamante
  - Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford, 94305, CA, USA
  - Chan Zuckerberg Biohub, 499 Illinois St, San Francisco, 94158, CA, USA
- Serafim Batzoglou
  - Department of Computer Science, Stanford University, 353 Serra Mall, Stanford, 94305, CA, USA
  - Illumina Inc, 499 Illinois Street, San Francisco, 94158, CA, USA
- Jure Leskovec
  - Department of Computer Science, Stanford University, 353 Serra Mall, Stanford, 94305, CA, USA
  - Chan Zuckerberg Biohub, 499 Illinois St, San Francisco, 94158, CA, USA
5
Validation of a 10-gene molecular signature for predicting biochemical recurrence and clinical metastasis in localized prostate cancer. J Cancer Res Clin Oncol 2018; 144:883-891. [DOI: 10.1007/s00432-018-2615-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 12/30/2017] [Accepted: 02/20/2018] [Indexed: 01/04/2023]
6
Jaeger S, Min J, Nigsch F, Camargo M, Hutz J, Cornett A, Cleaver S, Buckler A, Jenkins JL. Causal Network Models for Predicting Compound Targets and Driving Pathways in Cancer. J Biomol Screen 2014; 19:791-802. [PMID: 24518063 DOI: 10.1177/1087057114522690] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/15/2013] [Accepted: 01/14/2014] [Indexed: 02/06/2023]
Abstract
Gene-expression data are often used to infer pathways regulating transcriptional responses. For example, differentially expressed genes (DEGs) induced by compound treatment can help characterize hits from phenotypic screens, either by correlation with known drug signatures or by pathway enrichment. Pathway enrichment is, however, typically computed with DEGs rather than "upstream" nodes that are potentially causal of "downstream" changes. Here, we present graph-based models to predict causal targets from compound-microarray data. We test several approaches to traversing network topology, and show that a consensus minimum-rank score (SigNet) beat individual methods and could highly rank compound targets among all network nodes. In addition, larger, less canonical networks outperformed linear canonical interactions. Importantly, pathway enrichment using causal nodes rather than DEGs recovers relevant pathways more often. To further validate our approach, we used integrated data sets from the Cancer Genome Atlas to identify driving pathways in triple-negative breast cancer. Critical pathways were uncovered, including the epidermal growth factor receptor 2-phosphatidylinositide 3-kinase-AKT-MAPK growth pathway and ATR-p53-BRCA DNA damage pathway, in addition to unexpected pathways, such as TGF-WNT cytoskeleton remodeling, IL12-induced interferon gamma production, and TNFR-IAP (inhibitor of apoptosis) apoptosis; the latter was validated by pooled small hairpin RNA profiling in cancer cells. Overall, our approach can bridge transcriptional profiles to compound targets and driving pathways in cancer.
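One plausible reading of a "consensus minimum-rank score" is sketched below: each method ranks the network nodes, and a node's consensus score is the best (lowest) rank it receives from any method. The abstract does not spell out how SigNet is actually computed, so the function names, the tie-breaking rule, the handling of unscored nodes, and the toy scores are all assumptions for illustration.

```python
from typing import Dict, List


def rank_nodes(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank nodes by descending score; rank 1 is best, ties broken by node name."""
    ordered = sorted(scores, key=lambda node: (-scores[node], node))
    return {node: position + 1 for position, node in enumerate(ordered)}


def consensus_min_rank(method_scores: List[Dict[str, float]]) -> List[str]:
    """Order nodes by the minimum rank that any single method assigns them."""
    per_method = [rank_nodes(scores) for scores in method_scores]
    nodes = set().union(*(ranks.keys() for ranks in per_method))
    missing = len(nodes) + 1  # rank assumed when a method did not score a node
    best = {n: min(ranks.get(n, missing) for ranks in per_method) for n in nodes}
    return sorted(nodes, key=lambda n: (best[n], n))


# Toy scores from two hypothetical traversal methods, not taken from the paper.
method_a = {"EGFR": 0.9, "TP53": 0.5, "AKT1": 0.1}
method_b = {"EGFR": 0.2, "TP53": 0.3, "AKT1": 0.8}
print(consensus_min_rank([method_a, method_b]))  # ['AKT1', 'EGFR', 'TP53']
```

In this toy case EGFR and AKT1 are each ranked first by one method, so both get a consensus rank of 1 and precede TP53, which neither method ranks first.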
Affiliation(s)
- Savina Jaeger (co-first author)
  - Oncology Translational Medicine, Novartis, Cambridge, MA, USA
- Junxia Min (co-first author)
  - ONC Target Discovery, Novartis Institutes for BioMedical Research, Inc., Cambridge, MA, USA
- Florian Nigsch
  - Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Inc., Basel, Switzerland
- Miguel Camargo
  - Center for Proteomic Chemistry, Novartis Institutes for BioMedical Research, Inc., Cambridge, MA, USA
- Janna Hutz
  - Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Inc., Cambridge, MA, USA
  - Pfizer, Cambridge, MA, USA
- Allen Cornett
  - Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Inc., Cambridge, MA, USA
- Stephen Cleaver
  - NIBR IT, Novartis Institutes for BioMedical Research, Inc., Cambridge, MA, USA
- Alan Buckler
  - Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Inc., Cambridge, MA, USA
- Jeremy L Jenkins
  - Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Inc., Cambridge, MA, USA
7
Sarajlić A, Filipović A, Janjić V, Coombes RC, Pržulj N. The role of genes co-amplified with nicastrin in breast invasive carcinoma. Breast Cancer Res Treat 2013; 143:393-401. [DOI: 10.1007/s10549-013-2805-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/04/2013] [Accepted: 12/03/2013] [Indexed: 12/21/2022]
8
Sarajlić A, Janjić V, Stojković N, Radak D, Pržulj N. Network topology reveals key cardiovascular disease genes. PLoS One 2013; 8:e71537. [PMID: 23977067 PMCID: PMC3744556 DOI: 10.1371/journal.pone.0071537] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 03/27/2013] [Accepted: 06/29/2013] [Indexed: 11/19/2022] Open
Abstract
The structure of protein-protein interaction (PPI) networks has already been successfully used as a source of new biological information. Even though cardiovascular diseases (CVDs) are a major global cause of death, many CVD genes still await discovery. We explore ways to utilize the structure of the human PPI network to find important genes for CVDs that should be targeted by drugs. The hope is to use the properties of such important genes to predict new ones, which would in turn improve a choice of therapy. We propose a methodology that examines the PPI network wiring around genes involved in CVDs. We use the methodology to identify a subset of CVD-related genes that are statistically significantly enriched in drug targets and "driver genes." We seek such genes, since driver genes have been proposed to drive onset and progression of a disease. Our identified subset of CVD genes has a large overlap with the Core Diseasome, which has been postulated to be the key to disease formation and hence should be the primary object of therapeutic intervention. This indicates that our methodology identifies "key" genes responsible for CVDs. Thus, we use it to predict new CVD genes and we validate over 70% of our predictions in the literature. Finally, we show that our predicted genes are functionally similar to currently known CVD drug targets, which confirms a potential utility of our methodology towards improving therapy for CVDs.
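The abstract reports that the identified gene subset is "statistically significantly enriched" in drug targets but does not say which test was used. A common choice for this kind of question is a one-sided hypergeometric test, sketched below under that assumption; the function name, parameter names, and toy counts are hypothetical and not taken from the paper.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    population: total number of genes considered
    annotated:  genes in the population carrying the annotation (e.g. drug targets)
    selected:   size of the gene subset being tested
    overlap:    annotated genes observed inside the subset
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy counts: 10 genes, 5 annotated, a subset of 5 that contains all 5 annotated genes.
p = enrichment_p_value(population=10, annotated=5, selected=5, overlap=5)
print(p)  # 1/252, roughly 0.004
```

A small p-value means that drawing a random subset of the same size would rarely contain at least as many annotated genes as the observed subset does.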
Affiliation(s)
- Anida Sarajlić
  - Department of Computing, Imperial College London, London, United Kingdom
- Vuk Janjić
  - Department of Computing, Imperial College London, London, United Kingdom
- Neda Stojković
  - Institute for Cardiovascular Disease “Dedinje,” University of Belgrade, Belgrade, Serbia
- Djordje Radak
  - Institute for Cardiovascular Disease “Dedinje,” University of Belgrade, Belgrade, Serbia
- Nataša Pržulj
  - Department of Computing, Imperial College London, London, United Kingdom
9
Qian L, Zheng H, Zhou H, Qin R, Li J. Classification of time series gene expression in clinical studies via integration of biological network. PLoS One 2013; 8:e58383. [PMID: 23516469 PMCID: PMC3596388 DOI: 10.1371/journal.pone.0058383] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/18/2012] [Accepted: 02/04/2013] [Indexed: 12/24/2022] Open
Abstract
The increasing availability of time series expression datasets, although promising, raises a number of new computational challenges. Accordingly, the development of suitable classification methods to make reliable and sound predictions is becoming a pressing issue. We propose, here, a new method to classify time series gene expression via integration of biological networks. We evaluated our approach on 2 different datasets and showed that the use of a hidden Markov model/Gaussian mixture models hybrid explores the time-dependence of the expression data, thereby leading to better prediction results. We demonstrated that the biclustering procedure identifies function-related genes as a whole, giving rise to high accordance in prognosis prediction across independent time series datasets. In addition, we showed that integration of biological networks into our method significantly improves prediction performance. Moreover, we compared our approach with several state-of-the-art algorithms and found that our method outperformed previous approaches with regard to various criteria. Finally, our approach achieved better prediction results on early-stage data, implying the potential of our method for practical prediction.
Affiliation(s)
- Liwei Qian
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Haoran Zheng
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
  - Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Hefei, People's Republic of China
  - Department of Systems Biology, University of Science and Technology of China, Hefei, People's Republic of China
- Hong Zhou
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Ruibin Qin
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
- Jinlong Li
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China
10
Alshalalfa M, Bader GD, Goldenberg A, Morris Q, Alhajj R. Detecting microRNAs of high influence on protein functional interaction networks: a prostate cancer case study. BMC Syst Biol 2012; 6:112. [PMID: 22929553 PMCID: PMC3490713 DOI: 10.1186/1752-0509-6-112] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 05/23/2012] [Accepted: 08/14/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The use of biological molecular network information for diagnostic and prognostic purposes and elucidation of molecular disease mechanism is a key objective in systems biomedicine. The network of regulatory miRNA-target and functional protein interactions is a rich source of information to elucidate the function and the prognostic value of miRNAs in cancer. The objective of this study is to identify miRNAs that have high influence on target protein complexes in prostate cancer as a case study. This could provide biomarkers or therapeutic targets relevant for prostate cancer treatment. RESULTS: Our findings demonstrate that a miRNA's functional role can be explained by its target protein connectivity within a physical and functional interaction network. To detect miRNAs with high influence on target protein modules, we integrated miRNA and mRNA expression profiles with a sequence based miRNA-target network and human functional and physical protein interactions (FPI). miRNAs with high influence on target protein complexes play a role in prostate cancer progression and are promising diagnostic or prognostic biomarkers. We uncovered several miRNA-regulated protein modules which were enriched in focal adhesion and prostate cancer genes. Several miRNAs such as miR-96, miR-182, and miR-143 demonstrated high influence on their target protein complexes and could explain most of the gene expression changes in our analyzed prostate cancer data set. CONCLUSIONS: We describe a novel method to identify active miRNA-target modules relevant to prostate cancer progression and outcome. miRNAs with high influence on protein networks are valuable biomarkers that can be used in clinical investigations for prostate cancer treatment.
11
Fang X, Netzer M, Baumgartner C, Bai C, Wang X. Genetic network and gene set enrichment analysis to identify biomarkers related to cigarette smoking and lung cancer. Cancer Treat Rev 2012; 39:77-88. [PMID: 22789435 DOI: 10.1016/j.ctrv.2012.06.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 04/24/2012] [Revised: 06/03/2012] [Accepted: 06/06/2012] [Indexed: 10/28/2022]
Abstract
OBJECTIVES: Cigarette smoking is the best-documented risk factor for the development of lung cancer, while the related genetic mechanisms are still unclear. METHODS: The preprocessed microarray expression dataset was downloaded from the Gene Expression Omnibus database. Samples were classified according to disease state, stage and smoking state. A new computational strategy was applied for the identification and biological interpretation of new candidate genes in lung cancer and smoking by coupling a network-based approach with gene set enrichment analysis. MEASUREMENTS: Network analysis was performed by pair-wise comparison according to the disease states (tumor or normal), smoking states (current smokers, nonsmokers or former smokers), or the disease stage (stages I-IV). The most activated metabolic pathways were identified by gene set enrichment analysis. RESULTS: Panels of top-ranked gene candidates in smoking or cancer development were identified, including genes involved in cell proliferation and drug metabolism such as cytochrome P450 and WW domain containing transcription regulator 1. Semaphorin 5A and protein phosphatase 1F are the common genes represented as major hubs in both the smoking-related and cancer-related networks. Six pathways, namely cell cycle, DNA replication, RNA transport, protein processing in endoplasmic reticulum, vascular smooth muscle contraction and endocytosis, were commonly involved in smoking and lung cancer when comparing the top ten selected pathways. CONCLUSION: A new bioinformatics approach to biomarker identification and validation can probe deep genetic relationships between cigarette smoking and lung cancer. Our studies indicate that disease-specific network biomarkers, interaction between genes/proteins, or cross-talking of pathways provide more specific values for the development of precision therapies for lung cancer.
Affiliation(s)
- Xiaocong Fang
  - Department of Pulmonary Medicine, Zhongshan Hospital, Fudan University, Shanghai, China