1. Ahn B, Chou C, Chou C, Chen J, Zug A, Baykara Y, Claus J, Hacking SM, Uzun A, Gamsiz Uzun E. The Atlas of Protein-Protein Interactions in Cancer (APPIC)-a webtool to visualize and analyze cancer subtypes. NAR Cancer 2025; 7:zcae047. [PMID: 39822275 PMCID: PMC11734624 DOI: 10.1093/narcan/zcae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 11/27/2024] [Accepted: 01/03/2025] [Indexed: 01/19/2025]
Abstract
Cancer is a complex disease with heterogeneous mutational and gene expression patterns. Subgroups of patients who share a phenotype might share a specific genetic architecture, including protein-protein interactions (PPIs). We developed the Atlas of Protein-Protein Interactions in Cancer (APPIC), an interactive webtool that provides PPI subnetworks, shared by cohorts of patients, for 10 cancer types and their subtypes. To achieve this, we analyzed publicly available RNA sequencing data from patients and identified PPIs specific to 26 distinct cancer subtypes. APPIC compiles biological and clinical information from various databases, including the Human Protein Atlas, the HUGO Gene Nomenclature Committee, g:Profiler, cBioPortal and Clue.io. The user-friendly interface allows both 2D and 3D PPI network visualizations, enhancing the usability and interpretability of complex data. For advanced users seeking greater customization, APPIC provides all output files for further analysis and visualization on other platforms or tools. By offering comprehensive insights into PPIs and their role in cancer, APPIC aims to support the discovery of novel tumor subtype-specific targeted therapeutics and drug repurposing. APPIC is freely available at https://appic.brown.edu.
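The abstract does not spell out how a PPI subnetwork "shared by a cohort" is derived. As a hedged illustration of the general idea only, the sketch below keeps a reference PPI edge when both partner genes are expressed above a threshold in at least a given fraction of one subtype's patients. The thresholds, the function names `expressed_in_cohort` and `shared_subnetwork`, and the toy data are hypothetical and are not APPIC's parameters or pipeline.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def expressed_in_cohort(
    expr: Dict[str, List[float]], min_expr: float, min_fraction: float
) -> Set[str]:
    """Genes whose expression exceeds min_expr in at least min_fraction of patients.

    expr maps gene -> per-patient expression values (e.g. TPM) for one subtype.
    """
    kept = set()
    for gene, values in expr.items():
        n_high = sum(1 for v in values if v > min_expr)
        # Empty value lists are skipped rather than dividing by zero.
        if values and n_high / len(values) >= min_fraction:
            kept.add(gene)
    return kept


def shared_subnetwork(
    ppi: List[Edge],
    expr: Dict[str, List[float]],
    min_expr: float = 1.0,
    min_fraction: float = 0.8,
) -> List[Edge]:
    """Keep reference PPI edges whose two partners are both 'on' across the cohort."""
    on = expressed_in_cohort(expr, min_expr, min_fraction)
    return [(a, b) for a, b in ppi if a in on and b in on]


if __name__ == "__main__":
    # Toy reference interactome and expression for one hypothetical subtype (4 patients).
    ppi = [("TP53", "MDM2"), ("MDM2", "CDKN2A"), ("EGFR", "GRB2")]
    expr = {
        "TP53": [5.0, 6.1, 4.2, 3.3],
        "MDM2": [2.5, 3.0, 1.8, 2.2],
        "CDKN2A": [0.0, 0.2, 3.1, 0.1],  # above threshold in only 1 of 4 patients
        "EGFR": [8.0, 7.5, 9.1, 6.6],
        "GRB2": [4.0, 4.4, 3.9, 5.2],
    }
    print(shared_subnetwork(ppi, expr))
```

Lowering `min_fraction` trades cohort specificity for coverage; a real pipeline would also contrast subtypes against each other to call an edge subtype-specific.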
Affiliation(s)
- Benjamin Ahn
- Department of Pathology and Laboratory Medicine, The Warren Alpert Medical School of Brown University, 593 Eddy Street, Providence, RI 02903, USA
- Charissa Chou
- Department of Pathology and Laboratory Medicine, Rhode Island Hospital, 593 Eddy Street, Providence, RI 02903, USA
- Caden Chou
- Department of Pathology and Laboratory Medicine, The Warren Alpert Medical School of Brown University, 593 Eddy Street, Providence, RI 02903, USA
- Jennifer Chen
- Department of Pathology and Laboratory Medicine, The Warren Alpert Medical School of Brown University, 593 Eddy Street, Providence, RI 02903, USA
- Amelia Zug
- Department of Pathology and Laboratory Medicine, The Warren Alpert Medical School of Brown University, 593 Eddy Street, Providence, RI 02903, USA
- Yigit Baykara
- Department of Pathology and Laboratory Medicine, The Warren Alpert Medical School of Brown University, 593 Eddy Street, Providence, RI 02903, USA
- Department of Pathology and Laboratory Medicine, Rhode Island Hospital, 593 Eddy Street, Providence, RI 02903, USA
- Jessica Claus
- Department of Pathology and Laboratory Medicine, The Warren Alpert Medical School of Brown University, 593 Eddy Street, Providence, RI 02903, USA
- Department of Pathology and Laboratory Medicine, Rhode Island Hospital, 593 Eddy Street, Providence, RI 02903, USA
- Sean M Hacking
- Department of Pathology, NYU Grossman School of Medicine, 550 1st Ave., New York, NY 10016, USA
- Alper Uzun
- Department of Pathology and Laboratory Medicine, The Warren Alpert Medical School of Brown University, 593 Eddy Street, Providence, RI 02903, USA
- Legorreta Cancer Center, Brown University, 70 Ship Street, Providence, RI 02903, USA
- Brown Center for Clinical Cancer Informatics and Data Science (CCIDS), Brown University, 593 Eddy Street, Providence, RI 02903, USA
- Center for Computational Molecular Biology, Brown University, 164 Angell Street, Providence, RI 02906, USA
- Ece D Gamsiz Uzun
- Department of Pathology and Laboratory Medicine, The Warren Alpert Medical School of Brown University, 593 Eddy Street, Providence, RI 02903, USA
- Department of Pathology and Laboratory Medicine, Rhode Island Hospital, 593 Eddy Street, Providence, RI 02903, USA
- Legorreta Cancer Center, Brown University, 70 Ship Street, Providence, RI 02903, USA
- Brown Center for Clinical Cancer Informatics and Data Science (CCIDS), Brown University, 593 Eddy Street, Providence, RI 02903, USA
- Center for Computational Molecular Biology, Brown University, 164 Angell Street, Providence, RI 02906, USA
2. Ferrari I, De Grossi F, Lai G, Oliveto S, Deroma G, Biffo S, Manfrini N. CancerHubs: a systematic data mining and elaboration approach for identifying novel cancer-related protein interaction hubs. Brief Bioinform 2024; 26:bbae635. [PMID: 39657701 PMCID: PMC11631132 DOI: 10.1093/bib/bbae635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2024] [Revised: 10/31/2024] [Accepted: 11/22/2024] [Indexed: 12/12/2024]
Abstract
Conventional approaches to predicting protein involvement in cancer often rely either on defining aberrant mutations at the single-gene level or on correlating/anti-correlating transcript levels with patient survival. These approaches are typically conducted independently and focus on one protein at a time, overlooking nucleotide substitutions outside coding regions and mutational co-occurrences in genes within the same interaction network. Here, we present CancerHubs, a method that integrates unbiased mutational data, clinical outcome predictions and interactomics to define novel cancer-related protein hubs. Through this approach, we identified TGOLN2 as a putative novel pan-cancer tumour suppressor and EFTUD2 as a putative novel multiple myeloma oncogene.
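The abstract names three evidence streams (mutations, clinical outcome, interactomics) but not the scoring rule that combines them. One minimal, hypothetical way to combine them is to rank each protein by how many of its interaction partners are both recurrently mutated and associated with worse survival. The cut-offs, the inputs `mutation_freq` and `survival_hr`, the `hub_scores` function and the placeholder gene names are illustrative assumptions, not the CancerHubs algorithm.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def hub_scores(
    interactions: Iterable[Tuple[str, str]],
    mutation_freq: Dict[str, float],  # fraction of patients carrying a mutation in the gene
    survival_hr: Dict[str, float],  # hazard ratio, mutated vs. wild-type patients
    min_freq: float = 0.05,
    min_hr: float = 1.5,
) -> List[Tuple[str, int]]:
    """Count, for every protein, the partners that pass both the mutation and outcome filters."""
    # Partners that are both recurrently mutated and linked to worse survival.
    flagged = {
        g
        for g, f in mutation_freq.items()
        if f >= min_freq and survival_hr.get(g, 1.0) >= min_hr
    }
    # Undirected interactome as an adjacency map.
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)
    scores = {p: len(nbrs & flagged) for p, nbrs in neighbours.items()}
    # Highest-scoring hubs first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    edges = [("HUB1", "G1"), ("HUB1", "G2"), ("HUB1", "G3"), ("HUB2", "G1")]
    freq = {"G1": 0.10, "G2": 0.08, "G3": 0.01}
    hr = {"G1": 2.0, "G2": 1.7, "G3": 3.0}
    print(hub_scores(edges, freq, hr))
```

Scoring the partners rather than the protein itself reflects the abstract's point that single-gene analyses miss co-occurring alterations within one interaction network.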
Affiliation(s)
- Ivan Ferrari
- INGM, Istituto Nazionale Genetica Molecolare Romeo ed Enrica Invernizzi, Milan, Italy
- Department of Biosciences, University of Milan, Milan, Italy
- Federica De Grossi
- INGM, Istituto Nazionale Genetica Molecolare Romeo ed Enrica Invernizzi, Milan, Italy
- Department of Biosciences, University of Milan, Milan, Italy
- Giancarlo Lai
- INGM, Istituto Nazionale Genetica Molecolare Romeo ed Enrica Invernizzi, Milan, Italy
- Department of Biosciences, University of Milan, Milan, Italy
- Stefania Oliveto
- INGM, Istituto Nazionale Genetica Molecolare Romeo ed Enrica Invernizzi, Milan, Italy
- Department of Biosciences, University of Milan, Milan, Italy
- Giorgia Deroma
- INGM, Istituto Nazionale Genetica Molecolare Romeo ed Enrica Invernizzi, Milan, Italy
- Department of Biosciences, University of Milan, Milan, Italy
- Stefano Biffo
- INGM, Istituto Nazionale Genetica Molecolare Romeo ed Enrica Invernizzi, Milan, Italy
- Department of Biosciences, University of Milan, Milan, Italy
- Nicola Manfrini
- INGM, Istituto Nazionale Genetica Molecolare Romeo ed Enrica Invernizzi, Milan, Italy
- Department of Biosciences, University of Milan, Milan, Italy
3. Geraghty S, Boyer JA, Fazel-Zarandi M, Arzouni N, Ryseck RP, McBride MJ, Parsons LR, Rabinowitz JD, Singh M. Integrative Computational Framework, Dyscovr, Links Mutated Driver Genes to Expression Dysregulation Across 19 Cancer Types. bioRxiv: The Preprint Server for Biology 2024:2024.11.20.624509. [PMID: 39605479 PMCID: PMC11601522 DOI: 10.1101/2024.11.20.624509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Though somatic mutations play a critical role in driving cancer initiation and progression, the systems-level functional impacts of these mutations (particularly, how they alter expression across the genome and give rise to cancer hallmarks) are not yet well understood, even for well-studied cancer driver genes. To address this, we designed an integrative machine learning model, Dyscovr, that leverages mutation, gene expression, copy number alteration (CNA), methylation, and clinical data to uncover putative relationships between nonsynonymous mutations in key cancer driver genes and transcriptional changes across the genome. We applied Dyscovr pan-cancer and within 19 individual cancer types, finding both broadly relevant and cancer type-specific links between driver genes and putative targets, including a subset we further identify as exhibiting negative genetic relationships. Our work newly implicates, and validates in cell lines, KBTBD2 and mutant PIK3CA as a putative synthetic lethal pair in breast cancer, suggesting a novel combinatorial treatment approach.
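Dyscovr is described here only as an "integrative machine learning model". To make the underlying question concrete, the hedged sketch below fits an ordinary least-squares regression of one target gene's expression on a driver's mutation status while adjusting for the target's copy number and methylation, then reads off the mutation coefficient. The choice of plain OLS, the three covariates and the simulated data are assumptions for illustration; the published model may differ substantially (for example in regularization, covariate set and clinical terms).

```python
import numpy as np


def mutation_effect(expr, mutated, cna, methylation):
    """Estimate the mutation coefficient in expr ~ 1 + mutated + cna + methylation (OLS)."""
    X = np.column_stack([
        np.ones_like(expr),  # intercept
        mutated.astype(float),  # 1 if the driver carries a nonsynonymous mutation
        cna,  # copy number of the target gene
        methylation,  # methylation level of the target gene
    ])
    coef, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return coef[1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    mutated = rng.random(n) < 0.3
    cna = rng.normal(0.0, 1.0, n)
    methylation = rng.normal(0.0, 1.0, n)
    # Simulated ground truth: the driver mutation raises target expression by 2 units.
    expr = 1.0 + 2.0 * mutated + 0.5 * cna - 0.8 * methylation + rng.normal(0.0, 0.5, n)
    print(round(mutation_effect(expr, mutated, cna, methylation), 2))
```

Adjusting for CNA and methylation matters because a target's expression can shift for reasons unrelated to the driver mutation; without those covariates such shifts would be attributed to the mutation.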
Affiliation(s)
- Sara Geraghty
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Jacob A. Boyer
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Ludwig Cancer Institute, Princeton Branch, Princeton University, Princeton, NJ 08554
- Mahya Fazel-Zarandi
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Nibal Arzouni
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Rolf-Peter Ryseck
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Matthew J. McBride
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ 08854
- Lance R. Parsons
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Joshua D. Rabinowitz
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Ludwig Cancer Institute, Princeton Branch, Princeton University, Princeton, NJ 08554
- Department of Chemistry, Princeton University, Princeton, NJ 08544
- Mona Singh
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Department of Computer Science, Princeton University, Princeton, NJ 08544
- Lead Contact
4. Huang X, Zhang H. Detecting responsible nodes in differential Bayesian networks. Stat Med 2024; 43:3294-3312. [PMID: 38831542 DOI: 10.1002/sim.10125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Revised: 03/25/2024] [Accepted: 05/18/2024] [Indexed: 06/05/2024]
Abstract
To study the roles that different nodes play in differentiating Bayesian networks under two states, such as control versus disease, we formulate two node-specific scores to facilitate such assessment. The first score is motivated by the prediction invariance property of a causal model. The second score results from modifying an existing score constructed for differential analysis of undirected networks. We develop strategies based on these scores to identify nodes responsible for topological differences between two Bayesian networks. Synthetic data and real-life data from designed experiments are used to demonstrate the efficacy of the proposed methods in detecting responsible nodes.
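The two node-specific scores in the abstract are not specified in enough detail to reproduce. For orientation only, a much simpler structural baseline counts, for each node, how many parent edges differ between the control and disease DAGs (the symmetric difference of its parent sets) and ranks nodes by that count. This naive score, the function names and the toy graphs are assumptions; they are not the authors' prediction-invariance-based or modified undirected-network scores.

```python
from typing import Dict, List, Set, Tuple

Edges = List[Tuple[str, str]]  # (parent, child) pairs of a DAG


def parent_sets(edges: Edges) -> Dict[str, Set[str]]:
    """Map every node to its set of parents (nodes without parents get an empty set)."""
    parents: Dict[str, Set[str]] = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())
    return parents


def parent_set_change(control: Edges, disease: Edges) -> List[Tuple[str, int]]:
    """Rank nodes by the size of the symmetric difference of their parent sets."""
    pa0, pa1 = parent_sets(control), parent_sets(disease)
    nodes = set(pa0) | set(pa1)
    scores = {v: len(pa0.get(v, set()) ^ pa1.get(v, set())) for v in nodes}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    control = [("A", "B"), ("B", "C"), ("A", "D")]
    disease = [("A", "B"), ("D", "C"), ("A", "D"), ("E", "C")]
    print(parent_set_change(control, disease))
```

A purely structural count like this cannot distinguish a node that drives a difference from one that merely sits downstream of it, which is the gap the paper's causal, prediction-invariance-based score is motivated to close.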
Affiliation(s)
- Xianzheng Huang
- Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Hongmei Zhang
- Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, Tennessee
5. Cheng X, Amanullah M, Liu W, Liu Y, Pan X, Zhang H, Xu H, Liu P, Lu Y. WMDS.net: a network control framework for identifying key players in transcriptome programs. Bioinformatics 2023; 39:7023921. [PMID: 36727489 PMCID: PMC9925106 DOI: 10.1093/bioinformatics/btad071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/29/2022] [Revised: 01/16/2023] [Accepted: 02/01/2023] [Indexed: 02/03/2023]
Abstract
MOTIVATION Mammalian cells can be transcriptionally reprogrammed to other cellular phenotypes. Controllability of such complex transitions in the transcriptional networks underlying cellular phenotypes is an inherent biological characteristic. This network controllability can be interpreted as operating a few key regulators to guide the transcriptional program from one state to another. Finding the key regulators in the transcriptional program can provide key insights into the network state transitions underlying cellular phenotypes. RESULTS To address this challenge, we proposed identifying the key regulators in the transcriptional co-expression network as a minimum dominating set (MDS) of driver nodes that can fully control the network state transition. Based on the theory of structural controllability, we developed a weighted MDS network model (WMDS.net) to find the driver nodes of differential gene co-expression networks. The node weight in WMDS.net integrates the degree of each node in the network and the significance of gene co-expression differences between two physiological states into the measure of node controllability of the transcriptional network. To confirm its validity, we applied WMDS.net to the discovery of cancer driver genes in RNA-seq datasets from The Cancer Genome Atlas. WMDS.net performed well across various cancer datasets and outperformed other top-tier tools, with a better balance between precision and recall. AVAILABILITY AND IMPLEMENTATION https://github.com/chaofen123/WMDS.net. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
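A weighted minimum dominating set is the cheapest set of nodes such that every node is either in the set or adjacent to a member. Finding it exactly is NP-hard, and the abstract gives neither WMDS.net's solver nor its exact weighting formula. The sketch below therefore uses the classic greedy approximation (repeatedly pick the node with the lowest weight per newly dominated node) together with a hypothetical weight, `1 / ((1 + degree) * (1 + diff_score))`, that favours high-degree, strongly differential genes. Both the weight and the greedy solver are stand-in assumptions, not the published method.

```python
from typing import Dict, Set


def greedy_weighted_mds(adj: Dict[str, Set[str]], weight: Dict[str, float]) -> Set[str]:
    """Greedy approximation of a minimum-weight dominating set.

    adj must be symmetric (undirected graph) and list every node as a key.
    """
    undominated = set(adj)
    chosen: Set[str] = set()

    def cost(v: str) -> float:
        # Weight paid per node newly covered by v's closed neighbourhood {v} | adj[v].
        gain = len(({v} | adj[v]) & undominated)
        return weight[v] / gain if gain else float("inf")

    while undominated:
        # Sorting first makes tie-breaking deterministic (min keeps the first minimum).
        best = min(sorted(set(adj) - chosen), key=cost)
        chosen.add(best)
        undominated -= {best} | adj[best]
    return chosen


def hypothetical_weights(adj: Dict[str, Set[str]], diff_score: Dict[str, float]) -> Dict[str, float]:
    """Illustrative weight: lower for high-degree genes with a larger co-expression change."""
    return {v: 1.0 / ((1 + len(adj[v])) * (1.0 + diff_score.get(v, 0.0))) for v in adj}


if __name__ == "__main__":
    adj = {
        "H": {"A", "B", "C"},
        "A": {"H"},
        "B": {"H"},
        "C": {"H", "D"},
        "D": {"C"},
    }
    print(sorted(greedy_weighted_mds(adj, hypothetical_weights(adj, {}))))
```

Raising a gene's differential score lowers its weight and pulls it into the driver set, which is how the co-expression-change term can steer the selection toward state-specific regulators.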
Affiliation(s)
- Xiang Cheng
- Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Md Amanullah
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Weigang Liu
- Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Yi Liu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Xiaoqing Pan
- Department of Mathematics, Shanghai Normal University, Xuhui 200234, China
- Honghe Zhang
- Department of Pathology, Research Unit of Intelligence Classification of Tumor Pathology and Precision Therapy, Chinese Academy of Medical Sciences, Zhejiang University School of Medicine, Hangzhou 310058, China
- Haiming Xu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Pengyuan Liu
- Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
- Department of Physiology, Center of Systems Molecular Medicine, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Cancer Center, Zhejiang University, Hangzhou 310029, China
- Yan Lu
- Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Cancer Center, Zhejiang University, Hangzhou 310029, China
6. Ershov P, Yablokov E, Mezentsev Y, Ivanov A. Interactomics of CXXC proteins involved in epigenetic regulation of gene expression. Biomeditsinskaya Khimiya 2022; 68:339-351. [DOI: 10.18097/pbmc20226805339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Regulation of gene expression is an extremely complex, multicomponent biological phenomenon. Proteins containing the CXXC-domain “zinc fingers” (CXXC proteins) are master regulators of the expression of many genes and have conserved functions in the methylation of DNA bases and histone proteins. CXXC proteins function as parts of multiprotein complexes, which underscores the fundamental importance of studying post-translational regulation through modulation of the spectrum of protein-protein interactions (PPIs) in both normal and pathological conditions. In this paper we discuss general aspects of the involvement of CXXC proteins and their protein partners in neoplastic processes, drawing on both the literature and our own studies. Special attention is paid to recent data on the interactomics of the CFP1 protein, encoded by the CXXC1 gene located on human chromosome 18. CFP1 is devoid of enzymatic activity and exerts epigenetic regulation of expression through binding to chromatin and a certain spectrum of PPIs.
Affiliation(s)
- P.V. Ershov
- Institute of Biomedical Chemistry, Moscow, Russia
- A.S. Ivanov
- Institute of Biomedical Chemistry, Moscow, Russia
7. Hu CW, Xie J, Jiang J. The Emerging Roles of Protein Interactions with O-GlcNAc Cycling Enzymes in Cancer. Cancers (Basel) 2022; 14:5135. [PMID: 36291918 PMCID: PMC9600386 DOI: 10.3390/cancers14205135] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/03/2022] [Revised: 10/18/2022] [Accepted: 10/18/2022] [Indexed: 09/11/2023]
Abstract
The dynamic O-GlcNAc modification of intracellular proteins is an important nutrient sensor for integrating metabolic signals into vast networks of highly coordinated cellular activities. Dysregulation of the sole enzymes responsible for O-GlcNAc cycling, O-GlcNAc transferase (OGT) and O-GlcNAcase (OGA), and of the associated cellular O-GlcNAc profile is a common feature across nearly every cancer type. Many studies have investigated the effects of aberrant OGT/OGA expression on global O-GlcNAcylation activity in cancer cells. However, recent studies have begun to elucidate the roles of protein-protein interactions (PPIs), potentially through regions outside the immediate catalytic site of OGT/OGA, that regulate larger protein networks to facilitate substrate-specific modification, protein translocalization, and the assembly of larger biomolecular complexes. Perturbation of OGT/OGA PPI networks causes profound changes in the cell and may directly contribute to cancer malignancies. Herein, we highlight recent studies on the structural features of OGT and OGA, as well as the emerging roles and molecular mechanisms of their aberrant PPIs in rewiring cancer networks. By integrating complementary approaches, research in this area will aid in the identification of key protein contacts and functional modules derived from OGT/OGA that drive oncogenesis and will illuminate new directions for anti-cancer drug development.
Affiliation(s)
- Jiaoyang Jiang
- Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin-Madison, Madison, WI 53705, USA
8. Yue R, Dutta A. Computational systems biology in disease modeling and control, review and perspectives. NPJ Syst Biol Appl 2022; 8:37. [PMID: 36192551 PMCID: PMC9528884 DOI: 10.1038/s41540-022-00247-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/30/2022] [Accepted: 09/05/2022] [Indexed: 02/02/2023]
Abstract
Omics-based approaches have become increasingly influential in identifying disease mechanisms and drug responses. Because diseases and drug responses are co-expressed and co-regulated through interactions across omics layers, the traditional approach of analyzing omics data from single, isolated layers cannot always yield valuable inferences. Moreover, drugs have adverse effects that may harm patients, and launching new medicines is costly. To resolve these difficulties, systems biology is applied to predict potential molecular interactions by integrating omics data from genomic, proteomic, transcriptional, and metabolic layers. Combined with known drug reactions, the resulting models improve medicines' therapeutic performance by repurposing existing drugs and combining drug molecules without off-target effects. Based on the identified computational models, drug administration control laws are designed to balance toxicity and efficacy. This review introduces biomedical applications and analyses of interactions among gene, protein and drug molecules for modeling disease mechanisms and drug responses. Therapeutic performance can be improved by combining the predictive and computational models with drug administration designed by control laws. Challenges for clinical use are also discussed.
Affiliation(s)
- Rongting Yue
- Department of Electrical and Computer Engineering, University of Connecticut, 371 Fairfield Way, Storrs, CT, 06269, USA
- Abhishek Dutta
- Department of Electrical and Computer Engineering, University of Connecticut, 371 Fairfield Way, Storrs, CT, 06269, USA
9. Hou S, Zhang P, Yang K, Wang L, Ma C, Li Y, Li S. Decoding multilevel relationships with the human tissue-cell-molecule network. Brief Bioinform 2022; 23:6585388. [PMID: 35551347 DOI: 10.1093/bib/bbac170] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/10/2022] [Revised: 04/09/2022] [Accepted: 04/16/2022] [Indexed: 02/01/2023]
Abstract
Understanding the biological functions of molecules in specific human tissues or cell types is crucial for gaining insights into human physiology and disease. To this end, it is essential to systematically uncover associations among multilevel elements consisting of disease phenotypes, tissues, cell types and molecules, which is challenging because of their heterogeneity and incompleteness. To address this challenge, we describe a new methodological framework, called Graph Local InfoMax (GLIM), based on a human multilevel network (HMLN) that we established by introducing multiple tissues and cell types on top of molecular networks. GLIM can systematically mine the potential relationships between multilevel elements by embedding the features of the HMLN through contrastive learning. Our simulation results demonstrated that GLIM consistently outperforms other state-of-the-art algorithms in disease gene prediction. Moreover, GLIM was also successfully used to infer cell markers and rewire intercellular and molecular interactions in the context of specific tissues or diseases. As a typical case, GLIM uncovered, for the first time, the tissue-cell-molecule network underlying gastritis and gastric cancer, providing systematic insights into the mechanism underlying the occurrence and development of gastric cancer. Overall, our methodological framework has the potential to systematically uncover complex disease mechanisms and mine high-quality relationships among phenotypical, tissue, cellular and molecular elements.
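GLIM is described as embedding the multilevel network through contrastive learning; its name echoes local mutual-information maximization in the style of Deep Graph Infomax. The hedged sketch below only evaluates such an objective for fixed embeddings: node embeddings from the real graph (positives) should agree with a summary vector, while embeddings from a corrupted graph (negatives, e.g. feature-shuffled) should not, scored with a dot-product discriminator and binary cross-entropy. The single summary vector, the sigmoid-of-mean readout, the identity discriminator and the toy vectors are assumptions; GLIM's encoder, its local summaries and its training loop are not given in the abstract.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def summary(embeddings: List[Vector]) -> List[float]:
    """Readout: element-wise sigmoid of the mean embedding."""
    dim = len(embeddings[0])
    return [sigmoid(sum(e[k] for e in embeddings) / len(embeddings)) for k in range(dim)]


def infomax_loss(positive: List[Vector], negative: List[Vector]) -> float:
    """Mean binary cross-entropy: positives should match the summary, negatives should not."""
    s = summary(positive)

    def score(e: Vector) -> float:
        # Dot-product discriminator, i.e. a bilinear form with an identity weight matrix.
        return sigmoid(sum(a * b for a, b in zip(e, s)))

    eps = 1e-12  # guards against log(0)
    pos = sum(-math.log(score(e) + eps) for e in positive)
    neg = sum(-math.log(1.0 - score(e) + eps) for e in negative)
    return (pos + neg) / (len(positive) + len(negative))


if __name__ == "__main__":
    real = [[2.0, 2.0], [1.5, 2.5], [2.5, 1.5]]
    corrupted = [[-2.0, -2.0], [-1.5, -2.5]]
    # Low loss: the discriminator separates real from corrupted embeddings.
    print(infomax_loss(real, corrupted))
    # High loss: negatives indistinguishable from positives.
    print(infomax_loss(real, real))
```

Training an encoder to minimize this loss pushes it to encode structure that distinguishes the real network from its corruption, which is what makes the learned embeddings useful for downstream tasks such as disease gene prediction.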
Affiliation(s)
- Siyu Hou
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- Peng Zhang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- Kuo Yang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
- Lan Wang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- Changzheng Ma
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- Yanda Li
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
- Shao Li
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, 100084 Beijing, China
10. Tan YT, Ou-Yang L, Jiang X, Yan H, Zhang XF. Identifying Gene Network Rewiring Based on Partial Correlation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:513-521. [PMID: 32750866 DOI: 10.1109/tcbb.2020.3002906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Learning how gene regulatory networks change under different conditions is an important task. Several Gaussian graphical model-based methods have been proposed for this task, inferring differential networks from gene expression data. However, most existing methods define the differential network as the difference of precision matrices, which may include false differential edges caused by changes in conditional variances. In addition, prior information about the condition-specific networks and the differential networks can be obtained from other domains, and it is useful to incorporate this prior information into differential network analysis. In this study, we propose a new differential network analysis method to address these challenges. Instead of using precision matrices, we define the differential network as the difference of partial correlations, which excludes spurious differential edges caused by changes in conditional variances. Furthermore, prior information from multiple hypothesis testing is incorporated using a weighted fused penalty. Simulation studies show that our method outperforms competing methods. We also apply our method to identify the differential network between the luminal A and basal-like subtypes of breast cancer and the differential network between acute myeloid leukemia tumors and normal samples. The hub genes in the differential networks identified by our method carry out important biological functions.
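The abstract's central argument, that a difference of precision matrices can flag spurious edges when only a conditional variance changes whereas a difference of partial correlations does not, can be checked numerically. Partial correlations are obtained from the precision matrix Θ as ρ_ij = -Θ_ij / sqrt(Θ_ii Θ_jj). The sketch rescales one gene in the second condition (a pure variance change with no rewiring) and compares the two definitions. It omits the paper's weighted fused penalty and estimation from data; the covariance matrix is made up for illustration.

```python
import numpy as np


def partial_correlations(precision: np.ndarray) -> np.ndarray:
    """rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj), with ones on the diagonal."""
    d = np.sqrt(np.diag(precision))
    rho = -precision / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


if __name__ == "__main__":
    sigma1 = np.array([[1.0, 0.5, 0.2],
                       [0.5, 1.0, 0.3],
                       [0.2, 0.3, 1.0]])
    # Condition 2: gene 0 is simply noisier (std x3); the dependence structure is unchanged.
    D = np.diag([3.0, 1.0, 1.0])
    sigma2 = D @ sigma1 @ D
    theta1, theta2 = np.linalg.inv(sigma1), np.linalg.inv(sigma2)
    print("precision difference:\n", np.round(theta2 - theta1, 3))
    print("partial-correlation difference:\n",
          np.round(partial_correlations(theta2) - partial_correlations(theta1), 3))
```

Because rescaling gene 0 multiplies row and column 0 of Θ by 1/3, the precision difference is non-zero on every edge touching gene 0, while the partial correlations, being scale-free, are identical in both conditions.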
11. Liu C, Cai D, Zeng W, Huang Y. Inferring Differential Networks by Integrating Gene Expression Data With Additional Knowledge. Front Genet 2021; 12:760155. [PMID: 34858477 PMCID: PMC8632038 DOI: 10.3389/fgene.2021.760155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 10/13/2021] [Indexed: 11/23/2022]
Abstract
Evidence increasingly indicates the involvement of gene network rewiring in disease development and cell differentiation. With the accumulation of high-throughput gene expression data, it is now possible to infer changes in gene networks between two different states or cell types via computational approaches. However, the distributional diversity of multi-platform gene expression data, together with the sparseness and high noise rate of single-cell RNA sequencing (scRNA-seq) data, raises new challenges for existing differential network estimation methods. Furthermore, most existing methods rely purely on gene expression data and ignore the additional information provided by existing biological knowledge. In this study, to address these challenges, we propose a general framework, named the weighted joint sparse penalized D-trace model (WJSDM), to infer differential gene networks by integrating multi-platform gene expression data and multiple sources of prior biological knowledge. First, a nonparanormal graphical model is employed to handle gene expression data with missing values. Then we propose a weighted group bridge penalty to integrate multi-platform gene expression data and various existing biological knowledge. Experimental results on synthetic data demonstrate the effectiveness of our method in inferring differential networks. We apply our method to gene expression data from ovarian cancer and scRNA-seq data from circulating tumor cells of prostate cancer, and infer the differential networks associated with platinum resistance in ovarian cancer and anti-androgen resistance in prostate cancer. By analyzing the estimated differential networks, we gain important biological insights into the mechanisms underlying platinum resistance in ovarian cancer and anti-androgen resistance in prostate cancer.
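The first step described above, the nonparanormal (Gaussian copula) model, replaces each gene's values with normal scores of their ranks so that non-Gaussian, platform-specific distributions become comparable. A minimal sketch follows, assuming missing entries are `None` and are simply left missing: ranks are computed over observed values only, and ties are ranked by order of appearance, which is a simplification. The rank/(n+1) scaling is one common choice; WJSDM may use a different truncation and handles missing values within its own estimator.

```python
from statistics import NormalDist
from typing import List, Optional


def nonparanormal(values: List[Optional[float]]) -> List[Optional[float]]:
    """Map observed values to Phi^{-1}(rank / (n + 1)); missing entries stay None."""
    observed = [(v, i) for i, v in enumerate(values) if v is not None]
    n = len(observed)
    out: List[Optional[float]] = [None] * len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Sorting (value, index) pairs breaks ties by original position.
    for rank, (_, idx) in enumerate(sorted(observed), start=1):
        out[idx] = std_normal.inv_cdf(rank / (n + 1))
    return out


if __name__ == "__main__":
    # One gene measured in four samples; the second sample is missing.
    print(nonparanormal([10.0, None, 1.0, 1000.0]))
```

Because only ranks matter, any monotone platform-specific transformation of a gene (log, scaling, saturation) yields the same normal scores, which is the property that lets data from different platforms be modelled jointly.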
Affiliation(s)
- Chen Liu
- Department of Chemotherapy, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Dehan Cai
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
- WuCha Zeng
- Department of Chemotherapy, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Yun Huang
- Department of Geriatric Medicine, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
12. Network-based protein-protein interaction prediction method maps perturbations of cancer interactome. PLoS Genet 2021; 17:e1009869. [PMID: 34727106 PMCID: PMC8610286 DOI: 10.1371/journal.pgen.1009869] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/22/2021] [Revised: 11/23/2021] [Accepted: 10/09/2021] [Indexed: 01/09/2023]
Abstract
Perturbations of protein-protein interactions (PPIs) were found to be the main cause of cancer. Previous PPI prediction methods, which were trained on general, non-disease PPI data, were not suited to mapping the PPI network in cancer. Therefore, we established a novel cancer-specific PPI prediction method dubbed NECARE, which is based on a relational graph convolutional network (R-GCN) with knowledge-based features. Compared with other methods, it achieved the best performance, with a Matthews correlation coefficient (MCC) = 0.84±0.03 and an F1 = 91±2%. With NECARE, we mapped the cancer interactome atlas and revealed that the perturbations of PPIs were enriched on 1362 genes, which were named cancer hub genes. These genes were found to be enriched for mutations occurring at protein-macromolecule binding interfaces. Furthermore, over 56% of cancer treatment-related genes belonged to the hub genes, and they were significantly related to the prognosis of 32 types of cancers. Finally, by coimmunoprecipitation, we confirmed that NECARE predictions were highly reliable, with 90% accuracy. Overall, we provided a novel network-based cancer protein-protein interaction prediction method and mapped the perturbation of the cancer interactome. NECARE is available at: https://github.com/JiajunQiu/NECARE.
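NECARE is built on a relational graph convolutional network (R-GCN), in which each edge type (relation) has its own weight matrix. The sketch below implements one generic R-GCN propagation step in NumPy, h_i' = ReLU(W_0 h_i + Σ_r Σ_{j ∈ N_i^r} W_r h_j / |N_i^r|), following the standard R-GCN formulation of Schlichtkrull et al. NECARE's knowledge-based features, relation types, basis decomposition (if any), depth and training objective are not reproduced; the toy graph and identity weights are assumptions chosen so the result can be checked by hand.

```python
import numpy as np


def rgcn_layer(h, adjacency_by_relation, w_self, w_rel):
    """One R-GCN propagation step.

    h: (n_nodes, d_in) node features
    adjacency_by_relation: list of (n_nodes, n_nodes) 0/1 arrays, A_r[i, j] = 1 if j -> i under relation r
    w_self: (d_in, d_out) self-connection weight W_0
    w_rel: list of (d_in, d_out) per-relation weights W_r
    """
    out = h @ w_self
    for a_r, w_r in zip(adjacency_by_relation, w_rel):
        deg = a_r.sum(axis=1, keepdims=True)
        # Row-normalize by |N_i^r|; nodes with no neighbours under r contribute zero.
        norm = np.divide(a_r, deg, out=np.zeros_like(a_r, dtype=float), where=deg > 0)
        out = out + norm @ h @ w_r
    return np.maximum(out, 0.0)  # ReLU


if __name__ == "__main__":
    h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    # Relation 0: node 0 receives messages from nodes 1 and 2.
    a0 = np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]], dtype=float)
    # Relation 1: node 1 receives a message from node 0.
    a1 = np.array([[0, 0, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
    eye = np.eye(2)
    print(rgcn_layer(h, [a0, a1], eye, [eye, eye]))
```

Separate weights per relation let the model treat, for example, physical binding and co-expression evidence differently, which is the main reason to prefer an R-GCN over a plain GCN on a heterogeneous knowledge graph.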

13
Tu JJ, Ou-Yang L, Zhu Y, Yan H, Qin H, Zhang XF. Differential network analysis by simultaneously considering changes in gene interactions and gene expression. Bioinformatics 2021; 37:4414-4423. [PMID: 34245246 DOI: 10.1093/bioinformatics/btab502] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/14/2021] [Revised: 06/13/2021] [Accepted: 07/05/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Differential network analysis is an important tool to investigate the rewiring of gene interactions under different conditions. Several computational methods have been developed to estimate differential networks from gene expression data, but most of them do not consider that gene network rewiring may be driven by the differential expression of individual genes. New differential network analysis methods that simultaneously take account of the changes in gene interactions and changes in expression levels are needed. RESULTS In this paper, we propose a differential network analysis method that considers the differential expression of individual genes when identifying differential edges. First, two hypothesis test statistics are used to quantify changes in partial correlations between gene pairs and changes in expression levels for individual genes. Then, an optimization framework is proposed to combine the two test statistics so that the resulting differential network has a hierarchical property, where a differential edge can be considered only if at least one of the two involved genes is differentially expressed. Simulation results indicate that our method outperforms current state-of-the-art methods. We apply our method to identify the differential networks between the luminal A and basal-like subtypes of breast cancer and those between acute myeloid leukemia and normal samples. Hub nodes in the differential networks estimated by our method, including both differentially and non-differentially expressed genes, have important biological functions. AVAILABILITY The source code is available at https://github.com/Zhangxf-ccnu/chNet. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jia-Juan Tu
- School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
- Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Yuan Zhu
- School of Automation, China University of Geosciences, Wuhan, 430074, China; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, China University of Geosciences, Wuhan, 430074, China
- Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
- Hong Qin
- Department of Statistics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Xiao-Fei Zhang
- School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China

14
Xu T, Ou-Yang L, Yan H, Zhang XF. Time-Varying Differential Network Analysis for Revealing Network Rewiring over Cancer Progression. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1632-1642. [PMID: 31647444 DOI: 10.1109/tcbb.2019.2949039] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
To reveal how gene regulatory networks change over cancer development, multiple time-varying differential networks between adjacent cancer stages should be estimated simultaneously. Because network rewiring may be driven by the perturbation of certain individual genes, these differential networks may share some hub nodes. Although several methods have been developed to estimate differential networks from gene expression data, most are designed to estimate a single differential network and thus neglect the similarities between different differential networks. In this article, we propose a new Gaussian graphical model-based method that jointly estimates multiple time-varying differential networks to identify network rewiring over cancer development. A D-trace loss is used to estimate the differential networks. A tree-structured group lasso penalty is designed to identify both the common hub nodes shared by different differential networks and the specific hub nodes unique to individual differential networks. Simulation results demonstrate that our method outperforms other state-of-the-art techniques in most cases. We also apply our method to The Cancer Genome Atlas data to explore gene network rewiring over different breast cancer stages. Hub nodes in the estimated differential networks recover well-known genes associated with the development and progression of breast cancer.

15
Xie J, Yang F, Wang J, Karikomi M, Yin Y, Sun J, Wen T, Nie Q. DNF: A differential network flow method to identify rewiring drivers for gene regulatory networks. Neurocomputing 2020; 410:202-210. [PMID: 34025035 PMCID: PMC8139126 DOI: 10.1016/j.neucom.2020.05.028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/08/2023]
Abstract
Differential network analysis has become an important approach for identifying driver genes in development and disease. However, most studies capture only local features of the underlying gene-regulatory network topology. These approaches are vulnerable to noise and to other changes that mask driver-gene activity. Methods that can separate the impact of true regulatory elements from stochastic changes and downstream effects are therefore urgently needed. We propose the differential network flow (DNF) method to identify key regulators of progression in development or disease. Given the network representation of consecutive biological states, DNF quantifies the essentiality of each node by differences in the distribution of network flow, which can capture comprehensive topological differences from local to global feature domains. DNF identifies driver genes more accurately than other state-of-the-art methods when applied to four human datasets from The Cancer Genome Atlas and three single-cell RNA-seq datasets of murine neural and hematopoietic differentiation. Furthermore, we predict key regulators of crosstalk between separate networks underlying both neuronal differentiation and the progression of neurodegenerative disease; among them, APP is predicted to be a driver gene of neural stem cell differentiation. Our method is a new approach for quantifying the essentiality of genes across networks of different biological states.
Affiliation(s)
- Jiang Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Fuzhang Yang
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Jiao Wang
- Laboratory of Molecular Neural Biology, School of Life Sciences, Shanghai University, Shanghai 200444, China
- Mathew Karikomi
- Department of Mathematics, Department of Developmental and Cell Biology, University of California, Irvine, CA 92697-3875, USA
- Yiting Yin
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Jiamin Sun
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Tieqiao Wen
- Laboratory of Molecular Neural Biology, School of Life Sciences, Shanghai University, Shanghai 200444, China
- Qing Nie
- Department of Mathematics, Department of Developmental and Cell Biology, University of California, Irvine, CA 92697-3875, USA

16
Rangaswamy U, Dharshini SAP, Yesudhas D, Gromiha MM. VEPAD - Predicting the effect of variants associated with Alzheimer's disease using machine learning. Comput Biol Med 2020; 124:103933. [PMID: 32828070 DOI: 10.1016/j.compbiomed.2020.103933] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/03/2020] [Revised: 07/25/2020] [Accepted: 07/25/2020] [Indexed: 12/26/2022]
Abstract
INTRODUCTION Alzheimer's disease (AD) is a complex and heterogeneous disease that progressively affects neuronal cells and is the most prevalent of all neurodegenerative diseases. Next-generation sequencing (NGS) techniques are widely used to develop high-throughput screening methods that identify biomarkers and variants, aiding early diagnosis and treatment. OBJECTIVE The primary purpose of this study is to develop a machine learning classification model that predicts the deleterious effect of variants with respect to AD. METHODS We constructed a set of 20,401 deleterious and 37,452 control variants from the Genome-Wide Association Study (GWAS) and Genotype-Tissue Expression (GTEx) portals, respectively. Recursive feature elimination with cross-validation (RFECV), followed by forward feature selection, was used to select the important features, and a random forest classifier was used to distinguish deleterious from neutral variants. RESULTS Our method achieved an accuracy of 81.21% in 10-fold cross-validation and 70.63% on a test set of 5785 variants. On the same test set, CADD and FATHMM achieved accuracies in the range of 54%-62%. CONCLUSION Our model is freely available as the Variant Effect Predictor for Alzheimer's Disease (VEPAD) at http://web.iitm.ac.in/bioinfo2/vepad/. VEPAD can be used to predict the effect of new variants associated with AD.
Affiliation(s)
- Uday Rangaswamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India
- S Akila Parvathy Dharshini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India
- Dhanusha Yesudhas
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India; School of Computing, Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, Midori-ku, Yokohama, Kanagawa 226-8503, Japan

17
Amin V, Ağaç D, Barnes SD, Çobanoğlu MC. Accurate differential analysis of transcription factor activity from gene expression. Bioinformatics 2020; 35:5018-5029. [PMID: 31099391 DOI: 10.1093/bioinformatics/btz398] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/29/2018] [Revised: 04/02/2019] [Accepted: 05/08/2019] [Indexed: 12/17/2022]
Abstract
MOTIVATION The activity of transcriptional regulators is crucial for elucidating the mechanisms underlying phenotypes. However, hypotheses about regulatory activity are difficult to test experimentally, so accurate and reliable computational methods for inferring regulator activity are needed. There is extensive work in this area; however, current methods struggle with one or more of the following: resolving the activity of TFs with overlapping regulons, reflecting known regulatory relationships, or flexibly modeling TF activity across the regulon. RESULTS We present the Effector and Perturbation Estimation Engine (EPEE), a method for differential analysis of transcription factor (TF) activity from gene expression data. EPEE addresses each of these principal challenges. First, EPEE models the activity of all TFs jointly in a single multivariate model, thereby accounting for the intrinsic coupling among TFs that share targets, which is very common. Second, EPEE incorporates context-specific TF-gene regulatory networks and therefore adapts the analysis to each biological context. Finally, EPEE can flexibly reflect different regulatory activity of a single TF across its potential targets, which allows it to implicitly recover other regulatory influences such as co-activators or repressors. We comparatively validated EPEE on 15 datasets from three well-studied contexts: immunology, cancer and hematopoiesis. We show that addressing these challenges enables EPEE to outperform alternative methods and reliably produce accurate results. AVAILABILITY AND IMPLEMENTATION https://github.com/Cobanoglu-Lab/EPEE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Viren Amin
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Didem Ağaç
- Department of Immunology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Spencer D Barnes
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Murat Can Çobanoğlu
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA

18
Liu E, Zhang ZZ, Cheng X, Liu X, Cheng L. SCNrank: spectral clustering for network-based ranking to reveal potential drug targets and its application in pancreatic ductal adenocarcinoma. BMC Med Genomics 2020; 13:50. [PMID: 32241274 PMCID: PMC7119297 DOI: 10.1186/s12920-020-0681-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Abstract
Background Pancreatic ductal adenocarcinoma (PDAC) is the most common pancreatic malignancy. Owing to its wide heterogeneity, PDAC behaves aggressively and responds poorly to most chemotherapies, creating an urgent need for new therapeutic strategies. Cell lines have been used as the foundation for drug development and disease modeling. CRISPR-Cas9 plays a key role in every step of drug discovery, from target identification and validation to preclinical cancer cell testing. Using cell-line models and CRISPR-Cas9 technology together makes drug target prediction feasible. However, there is still a large gap between predicted results and actionable targets in real tumors. Biological network models provide a powerful means of mimicking genetic interactions in real biological systems, which can benefit gene perturbation studies and potential target identification for treating PDAC. Nevertheless, building a network model that takes cell-line data and CRISPR-Cas9 data as input and accurately predicts potential targets that will respond well in real tissue remains an unsolved problem. Methods We developed a novel algorithm, ‘Spectral Clustering for Network-based target Ranking’ (SCNrank), that systematically integrates three types of data to prioritize potential drug targets for PDAC: expression profiles from PDAC tumor tissue, normal tissue and cell lines; a protein-protein interaction (PPI) network; and CRISPR-Cas9 data. The algorithm consists of three steps: 1. using the STRING PPI network as a skeleton, SCNrank constructs tissue-specific networks from the expression profiles of PDAC tumor and normal pancreas tissues; 2. with the same network skeleton, SCNrank constructs cell-line-specific networks using the PDAC cell-line expression profiles and CRISPR-Cas9 data from pancreatic cancer cell lines; 3. SCNrank applies a novel spectral clustering approach to reduce data dimension and generate gene clusters that carry common features from both networks.
Finally, SCNrank applies a scoring scheme called ‘Target Influence score’ (TI), which estimates a given target’s influence on the cluster it belongs to, to score and rank each drug target. Results We applied SCNrank to analyze 263 expression profiles, CRISPR-Cas9 data from 22 different pancreatic cancer cell lines and the STRING protein-protein interaction (PPI) network. With SCNrank, we constructed an integrated tissue PDAC network and an integrated cell-line PDAC network, both of which contain 4414 selected genes that are overexpressed in tumor tissue samples. After clustering, the 4414 genes were distributed into 198 clusters, which include 367 targets of FDA-approved drugs. These drug targets were all scored and ranked by their TI scores, which we defined to measure their influence on the network. We validated the top-ranked targets in three ways: first, by mapping them onto existing clinical drug targets for PDAC to measure concordance; second, by performing enrichment analysis on these drug targets and the clusters they belong to, to reveal functional associations between clusters and PDAC; and third, by performing survival analysis on the top-ranked targets to connect them with clinical outcomes. Survival analysis revealed that overexpression of three top-ranked genes, PGK1, HMMR and POLE2, significantly increases the risk of death in PDAC patients. Conclusion SCNrank is an unbiased algorithm that systematically integrates multiple types of omics data to select and rank potential drug targets. SCNrank shows great capability in predicting drug targets for PDAC. Pancreatic cancer-associated gene candidates predicted by SCNrank have the potential to guide genetics-based drug discovery against pancreatic cancer.
Affiliation(s)
- Enze Liu
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University-Purdue University, Indianapolis, IN, 46202, USA
- Zhuang Zhuang Zhang
- Department of Toxicology and Cancer Biology, College of Medicine, University of Kentucky, Lexington, KY, 40536, USA
- Xiaolin Cheng
- College of Pharmacy, Division of Medicinal Chemistry and Pharmacognosy, The Ohio State University, Columbus, OH, 43210, USA
- Xiaoqi Liu
- Department of Toxicology and Cancer Biology, College of Medicine, University of Kentucky, Lexington, KY, 40536, USA
- Lijun Cheng
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA

19
Yuan R, Ou-Yang L, Hu X, Zhang XF. Identifying Gene Network Rewiring Using Robust Differential Graphical Model with Multivariate t-Distribution. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:712-718. [PMID: 30802872 DOI: 10.1109/tcbb.2019.2901473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Identifying gene network rewiring under different biological conditions is important for understanding the mechanisms underlying complex diseases. Gaussian graphical models, which assume that the data follow a multivariate normal distribution, are widely used to identify gene network rewiring. However, the normality assumption often fails in practice because real data are generally contaminated by extreme outliers. In this study, we propose a new robust differential graphical model, based on the multivariate t-distribution, to identify gene network rewiring between two conditions. The multivariate t-distribution is more robust to outliers than the normal distribution because its heavy tails allow values far from the mean. A fused lasso penalty is used to borrow information across conditions to improve the results. We develop an expectation-maximization algorithm to solve the optimization model. Experimental results on simulated data show that our method outperforms state-of-the-art methods. We also apply our method to identify gene network rewiring between the luminal A and basal-like subtypes of breast cancer, and between the proneural and mesenchymal subtypes of glioblastoma. Several key genes that drive gene network rewiring are discovered.
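Why heavy tails help can be seen in the univariate case. The sketch below assumes a Student-t model with a fixed number of degrees of freedom `nu` and runs the standard EM updates for location and scale: the E-step gives each sample the weight (nu + 1) / (nu + d²), where d² is its squared standardized distance from the current location, so extreme outliers are down-weighted automatically. It illustrates the general principle only, not the authors' multivariate fused-lasso model; the function name and data are hypothetical.

```python
def t_em_location_scale(xs, nu=3.0, iters=200):
    """EM estimates of location and scale for a univariate Student-t with fixed nu.

    Returns (mu, var, weights); the weights come from the final E-step.
    """
    n = len(xs)
    mu = sum(xs) / n  # start from the ordinary (non-robust) mean
    var = sum((x - mu) ** 2 for x in xs) / n
    weights = [1.0] * n
    for _ in range(iters):
        # E-step: expected latent precision of each sample, (nu + 1) / (nu + d^2)
        weights = [(nu + 1) / (nu + (x - mu) ** 2 / var) for x in xs]
        # M-step: weighted mean, then weighted scale around the updated mean
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        var = sum(w * (x - mu) ** 2 for w, x in zip(weights, xs)) / n
    return mu, var, weights


# Toy expression values (hypothetical) with one extreme outlier.
values = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, -0.25, 0.15, 10.0]
mu, var, weights = t_em_location_scale(values)
print(f"plain mean = {sum(values) / len(values):.3f}, t-based location = {mu:.3f}")
print(f"weight of the outlier = {weights[-1]:.4f}")
```

Under a Gaussian model every sample keeps weight 1, so the single outlier drags the mean far from the bulk of the data; the t-based weights shrink it toward zero instead.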

20
Salviato E, Djordjilović V, Chiogna M, Romualdi C. SourceSet: A graphical model approach to identify primary genes in perturbed biological pathways. PLoS Comput Biol 2019; 15:e1007357. [PMID: 31652275 PMCID: PMC6834292 DOI: 10.1371/journal.pcbi.1007357] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/17/2018] [Revised: 11/06/2019] [Accepted: 08/23/2019] [Indexed: 11/24/2022]
Abstract
Topological gene-set analysis has emerged as a powerful means of interpreting omic data. Although numerous methods for identifying dysregulated genes have been proposed, few aim to distinguish genes that are the real source of perturbation from those that merely respond to the signal dysregulation. Here, we propose a new method, called SourceSet, that distinguishes primary from secondary dysregulation within a Gaussian graphical model context. The method compares gene expression profiles in the control and perturbed conditions and detects differences in both the mean and the covariance parameters with a series of likelihood ratio tests. The resulting evidence is used to infer the primary and the secondary set, i.e. the genes responsible for the primary dysregulation and the genes affected by the perturbation through network propagation. The method demonstrates high specificity and sensitivity in different simulated scenarios and on several real biological case studies. To fit into the more traditional pathway analysis framework, the SourceSet R package also extends the analysis from a single pathway to multiple pathways and provides several graphical outputs, including Cytoscape visualization for browsing the results.
Author summary The rapid increase in omic studies has created a need to understand the biological implications of their results. Gene-set analysis has emerged as a powerful means of gaining such understanding, evolving over the last decade from classical enrichment analysis to more powerful topological approaches. Although numerous methods for identifying dysregulated genes have been proposed, few aim to distinguish genes that are the real source of perturbation from those that merely respond to the signal dysregulation. This distinction is crucial for network medicine, where prioritizing the effects of biological perturbations may aid the molecular understanding of drug treatments and diseases. Here we propose a new method, called SourceSet, that distinguishes between primary and secondary dysregulation within a graphical model context and demonstrates high specificity and sensitivity in different simulated scenarios and on real biological case studies.
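The likelihood ratio tests mentioned above compare a constrained model with an unconstrained one. As a simplified univariate analogue (an assumption for illustration, not SourceSet's multivariate procedure on means and covariances over a pathway graph), the sketch below tests whether one gene's mean expression differs between control and perturbed samples under a Gaussian model with a shared variance. The statistic n·log(σ̂₀²/σ̂₁²) is compared with a χ² distribution with one degree of freedom. The function name and data are hypothetical.

```python
import math


def mean_shift_lrt(control, perturbed):
    """Gaussian likelihood-ratio test for a mean difference with a shared variance.

    Returns (statistic, p_value); the statistic is asymptotically chi-square(1).
    Assumes the within-group variance is not zero.
    """
    pooled = control + perturbed
    n = len(pooled)
    grand = sum(pooled) / n
    # MLE variance under H0: one common mean for both conditions
    var0 = sum((x - grand) ** 2 for x in pooled) / n
    # MLE variance under H1: separate group means, shared variance
    m_c = sum(control) / len(control)
    m_p = sum(perturbed) / len(perturbed)
    var1 = (sum((x - m_c) ** 2 for x in control) + sum((x - m_p) ** 2 for x in perturbed)) / n
    # 2 * (logL1 - logL0) = n * log(var0 / var1); clamp tiny negative rounding errors
    stat = max(n * math.log(var0 / var1), 0.0)
    # Survival function of chi-square(1): P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


control = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]
perturbed = [6.0, 7.0, 8.0, 7.0, 6.0, 8.0]
stat, p = mean_shift_lrt(control, perturbed)
print(f"LRT statistic = {stat:.2f}, p = {p:.1e}")
```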
Affiliation(s)
- Elisa Salviato
- IFOM - The FIRC Institute of Molecular Oncology, Milan, Italy
- Monica Chiogna
- Department of Statistical Sciences, University of Bologna, Bologna, Italy
- Chiara Romualdi
- Department of Biology, University of Padova, Padova, Italy

21
Mukherjee S, Perumal TM, Daily K, Sieberts SK, Omberg L, Preuss C, Carter GW, Mangravite LM, Logsdon BA. Identifying and ranking potential driver genes of Alzheimer's disease using multiview evidence aggregation. Bioinformatics 2019; 35:i568-i576. [PMID: 31510680 PMCID: PMC6612835 DOI: 10.1093/bioinformatics/btz365] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]
Abstract
MOTIVATION Late-onset Alzheimer's disease currently has no known effective treatment options. To better understand the disease, new multi-omic datasets have recently been generated with the goal of identifying its molecular causes. However, most analytic studies using these datasets focus on uni-modal analysis of the data. Here, we propose a data-driven approach that integrates multiple data types and analytic outcomes to aggregate evidence supporting the hypothesis that a gene is a genetic driver of the disease. The main algorithmic contributions of our article are: (i) a general machine learning framework that learns the key characteristics of a few known driver genes from multiple feature sets and identifies other potential driver genes with similar feature representations, and (ii) a flexible ranking scheme able to integrate external validation in the form of genome-wide association study summary statistics. While we currently focus on demonstrating the effectiveness of the approach using different analytic outcomes from RNA-Seq studies, the method is easily generalizable to other data modalities and analysis types. RESULTS We demonstrate the utility of our machine learning algorithm on two benchmark multiview datasets, where it significantly outperforms the baseline approaches in predicting missing labels. We then use the algorithm to predict and rank potential drivers of Alzheimer's disease. We show that our ranked genes are significantly enriched for single nucleotide polymorphisms associated with Alzheimer's disease and for pathways previously associated with the disease. AVAILABILITY AND IMPLEMENTATION Source code and links to all feature sets are available at https://github.com/Sage-Bionetworks/EvidenceAggregatedDriverRanking.
Affiliation(s)
- Christoph Preuss
- The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA
- Gregory W Carter
- The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA
- Benjamin A Logsdon
- Sage Bionetworks, Seattle, WA, USA (corresponding author)

22
Xu T, Ou-Yang L, Hu X, Zhang XF. Identifying Gene Network Rewiring by Integrating Gene Expression and Gene Network Data. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:2079-2085. [PMID: 29994068 DOI: 10.1109/tcbb.2018.2809603] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Exploring the rewiring pattern of gene regulatory networks between different pathological states is an important task in bioinformatics. Although a number of computational approaches have been developed to infer differential networks from high-throughput data, most focus only on gene expression data and neglect the valuable static gene regulatory network data accumulated in recent biomedical research. In this study, we propose a new Gaussian graphical model-based method to infer differential networks by integrating gene expression and static gene regulatory network data. We first evaluate the empirical performance of our method by comparing it with state-of-the-art methods on simulated data. We also apply our method to The Cancer Genome Atlas data to identify gene network rewiring between ovarian cancers with different platinum responses, and between breast cancers of the luminal A and basal-like subtypes. Hub genes in the estimated differential networks recover known genes associated with platinum resistance in ovarian cancer and signatures of the breast cancer intrinsic subtypes.

23
McGillivray P, Clarke D, Meyerson W, Zhang J, Lee D, Gu M, Kumar S, Zhou H, Gerstein M. Network Analysis as a Grand Unifier in Biomedical Data Science. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013444] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 11/09/2022]
Abstract
Biomedical data scientists study many types of networks, ranging from those formed by neurons to those created by molecular interactions. People often criticize these networks as uninterpretable diagrams termed hairballs; however, here we show that molecular biological networks can be interpreted in several straightforward ways. First, we can break down a network into smaller components, focusing on individual pathways and modules. Second, we can compute global statistics describing the network as a whole. Third, we can compare networks. These comparisons can be within the same context (e.g., between two gene regulatory networks) or cross-disciplinary (e.g., between regulatory networks and governmental hierarchies). The latter comparisons can transfer a formalism, such as that for Markov chains, from one context to another or relate our intuitions in a familiar setting (e.g., social networks) to the relatively unfamiliar molecular context. Finally, key aspects of molecular networks are dynamics and evolution, i.e., how they evolve over time and how genetic variants affect them. By studying the relationships between variants in networks, we can begin to interpret many common diseases, such as cancer and heart disease.
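As a small illustration of the second strategy (computing global statistics), the sketch below derives density, mean degree and the highest-degree nodes (hubs) of an undirected graph from an edge list. The helper `global_stats` is hypothetical, and the edge list uses placeholder node names rather than real molecular interactions.

```python
from collections import Counter


def global_stats(edges):
    """Summary statistics for a simple undirected graph given as (u, v) pairs.

    Assumes no self-loops or duplicate edges; isolated nodes are not counted
    because only nodes that appear in `edges` are seen.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n, m = len(degree), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0  # fraction of possible edges present
    mean_degree = 2 * m / n if n else 0.0
    top = max(degree.values(), default=0)
    hubs = sorted(node for node, d in degree.items() if d == top)
    return {"nodes": n, "edges": m, "density": density, "mean_degree": mean_degree, "hubs": hubs}


# Placeholder edge list, not real interactions.
toy = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("B", "C")]
print(global_stats(toy))
# → {'nodes': 5, 'edges': 5, 'density': 0.5, 'mean_degree': 2.0, 'hubs': ['A']}
```

Comparing such summaries between two networks (the third strategy) is then a matter of calling the same function on each edge list.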
Affiliation(s)
- Patrick McGillivray
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Declan Clarke
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- William Meyerson
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Jing Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Donghoon Lee
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Mengting Gu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA
- Sushant Kumar
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Holly Zhou
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA

24
Mukherjee S, Carignano A, Seelig G, Lee SI. Identifying progressive gene network perturbation from single-cell RNA-seq data. Annu Int Conf IEEE Eng Med Biol Soc 2018; 2018:5034-5040. [PMID: 30441472 DOI: 10.1109/embc.2018.8513444] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/09/2023]
Abstract
Identifying the gene regulatory networks that control development or disease is one of the most important problems in biology. Here, we introduce a computational approach, called PIPER (ProgressIve network PERturbation), to identify the perturbed genes that drive differences in the gene regulatory network across different points in a biological progression. PIPER employs algorithms tailor-made for single cell RNA sequencing (scRNA-seq) data to jointly identify gene networks for multiple progressive conditions. It then performs differential network analysis along the identified gene networks to identify master regulators. We demonstrate that PIPER outperforms state-of-the-art alternative methods on simulated data and is able to predict known key regulators of differentiation on real scRNA-Seq datasets.
25
Abstract
Wnt signaling is important for breast development and remodeling during pregnancy and lactation. Epigenetic modifications change expression levels of components of the Wnt pathway, underlying oncogenic transformation. However, no clear Wnt component increasing expression universally across breast cancer (BC) or its most Wnt-dependent triple-negative BC (TNBC) subgroup has been identified, delaying development of targeted therapies. Here we perform network correlation analysis of expression of >100 Wnt pathway components in hundreds of healthy and cancerous breast tissues. Varying in expression levels among people, Wnt components remarkably coordinate their production; this coordination is dramatically decreased in BC. Clusters with coordinated gene expression exist within the healthy cohort, highlighting Wnt signaling subtypes. Different BC subgroups are identified, characterized by different remaining Wnt signaling signatures, providing the rationale for patient stratification to personalize therapeutic applications. Key pairwise interactions within the Wnt pathway (some inherited and some established de novo) emerge as targets for future drug discovery against BC.
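The abstract does not specify how coordination is scored. As a hypothetical illustration only, one simple cohort-level summary is the mean absolute Pearson correlation over all pairs of pathway genes, computed separately for healthy and tumor samples; a drop in this value would correspond to the loss of coordination described above. The function names and input layout below are assumptions, not the authors' pipeline.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant expression profile: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def coordination_index(expr):
    """Mean absolute pairwise correlation across genes (hypothetical score).

    expr maps a gene name to its expression values across the samples of
    one cohort; every list must have the same length.
    """
    pairs = list(combinations(sorted(expr), 2))
    return sum(abs(pearson(expr[g1], expr[g2])) for g1, g2 in pairs) / len(pairs)
```

Computing the score once per cohort and comparing the two values gives a single number for the healthy-versus-tumor contrast; pairwise detail is lost, which is why edge-level methods are usually run alongside it.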
26
Tu JJ, Ou-Yang L, Hu X, Zhang XF. Identifying gene network rewiring by combining gene expression and gene mutation data. IEEE/ACM Trans Comput Biol Bioinform 2018; 16:1042-1048. [PMID: 29993891 DOI: 10.1109/tcbb.2018.2834529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
Understanding how gene dependency networks rewire between different disease states is an important task in genomic research. Although many computational methods have been proposed to undertake this task via differential network analysis, most of them are designed for a predefined data type. With the development of the high throughput technologies, gene activity measurements can be collected from different aspects (e.g., mRNA expression and DNA mutation). Different data types might share some common characteristics and include certain unique properties. New methods are needed to explore the similarity and difference between differential networks estimated from different data types. In this study, we develop a new differential network inference model which identifies gene network rewiring by combining gene expression and gene mutation data. Similarity and difference between different data types are learned via a group bridge penalty function. Simulation studies have demonstrated that our method consistently outperforms the competing methods. We also apply our method to identify gene network rewiring associated with ovarian cancer platinum resistance. There are certain differential edges common to both data types and some differential edges unique to individual data types. Hub genes in the differential networks inferred by our method play important roles in ovarian cancer drug resistance.
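The abstract names a group bridge penalty without giving its form. A common generic form is λ Σ_g (Σ_{j∈g} |β_j|)^γ with 0 < γ < 1 (group-specific weights omitted here). The sketch below assumes, for illustration, that each group collects one edge's differential coefficients across the data types (for example expression and mutation); it evaluates the penalty only and is not the authors' full estimator.

```python
def group_bridge_penalty(deltas, lam, gamma):
    """Generic group bridge penalty: lam * sum_g (sum_k |delta_g[k]|) ** gamma.

    deltas: list of groups; each group holds one edge's differential
    coefficients across data types (an assumed grouping, for illustration).
    The inner L1 norm lets single entries be zero (type-specific edges);
    the concave outer power (0 < gamma < 1) favours zeroing whole groups.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in (0, 1)")
    return lam * sum(sum(abs(d) for d in group) ** gamma for group in deltas)
```

With gamma = 0.5, `[[1.0, 1.0], [0.0, 0.0]]` costs about 1.414 while `[[1.0, 0.0], [1.0, 0.0]]` costs 2.0: concentrating the differential signal in fewer edges is cheaper, which is what drives group-level sparsity.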
27
Singh AJ, Ramsey SA, Filtz TM, Kioussi C. Differential gene regulatory networks in development and disease. Cell Mol Life Sci 2018; 75:1013-1025. [PMID: 29018868 PMCID: PMC11105524 DOI: 10.1007/s00018-017-2679-6] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 07/20/2017] [Revised: 09/19/2017] [Accepted: 10/04/2017] [Indexed: 02/02/2023]
Abstract
Gene regulatory networks, in which differential expression of regulator genes induces differential expression of their target genes, underlie diverse biological processes such as embryonic development, organ formation and disease pathogenesis. An archetypical systems biology approach to mapping these networks involves the combined application of (1) high-throughput sequencing-based transcriptome profiling (RNA-seq) of biopsies under diverse network perturbations and (2) network inference based on gene-gene expression correlation analysis. The comparative analysis of such correlation networks across cell types or states, differential correlation network analysis, can identify specific molecular signatures and functional modules that underlie the state transition or have context-specific function. Here, we review the basic concepts of network biology and correlation network inference, and the prevailing methods for differential analysis of correlation networks. We discuss applications of gene expression network analysis in the context of embryonic development, cancer, and congenital diseases.
Affiliation(s)
- Arun J Singh
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, 97331, USA
- Stephen A Ramsey
- Department of Biomedical Sciences, College of Veterinary Medicine, Oregon State University, Corvallis, OR, 97331, USA
- School of Electrical Engineering and Computer Science, College of Engineering, Oregon State University, Corvallis, OR, 97331, USA
- Theresa M Filtz
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, 97331, USA
- Chrissa Kioussi
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, 97331, USA
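The review surveys methods for differential correlation analysis. One of the simplest, sketched here as an assumption rather than as the review's recommendation, compares each gene pair's Pearson correlation between two conditions with Fisher's r-to-z test, z = (atanh r_A - atanh r_B) / sqrt(1/(n_A - 3) + 1/(n_B - 3)). The function names, the clipping constant and the 1.96 cut-off (two-sided 5%, no multiple-testing correction) are illustrative choices.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_diff(r1, n1, r2, n2, eps=1e-6):
    """z statistic for H0: rho1 == rho2 from two independent samples.

    Correlations are clipped away from +/-1 so atanh stays finite;
    each sample needs more than 3 observations.
    """
    r1 = max(-1 + eps, min(1 - eps, r1))
    r2 = max(-1 + eps, min(1 - eps, r2))
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se


def differential_edges(expr_a, expr_b, z_cut=1.96):
    """Gene pairs whose co-expression differs between conditions A and B.

    expr_a / expr_b map gene -> expression values across that condition's
    samples (hypothetical layout). Returns (gene1, gene2, r_A, r_B) tuples.
    """
    genes = sorted(set(expr_a) & set(expr_b))
    flagged = []
    for g1, g2 in combinations(genes, 2):
        r_a = pearson(expr_a[g1], expr_a[g2])
        r_b = pearson(expr_b[g1], expr_b[g2])
        z = fisher_z_diff(r_a, len(expr_a[g1]), r_b, len(expr_b[g1]))
        if abs(z) > z_cut:
            flagged.append((g1, g2, round(r_a, 3), round(r_b, 3)))
    return flagged
```

Because the number of gene pairs grows quadratically with the number of genes, real analyses would normally convert the z statistics to p-values and apply a false discovery rate correction before calling edges differential.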
28
Wang J, Li Z, Lei M, Fu Y, Zhao J, Ao M, Xu L. Integrated DNA methylome and transcriptome analysis reveals the ethylene-induced flowering pathway genes in pineapple. Sci Rep 2017; 7:17167. [PMID: 29215068 PMCID: PMC5719354 DOI: 10.1038/s41598-017-17460-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 08/08/2017] [Accepted: 11/27/2017] [Indexed: 01/09/2023]
Abstract
Ethylene has long been used to promote flowering in pineapple production. Ethylene-induced flowering is dose dependent, with a critical threshold level of ethylene response factors needed to trigger flowering. The mechanism of ethylene-induced flowering is still unclear. Here, we integrated isoform sequencing (iso-seq), Illumina short-read sequencing and whole-genome bisulfite sequencing (WGBS) to explore early changes in the transcriptome and DNA methylome of pineapple following high-concentration ethylene (HE) and low-concentration ethylene (LE) treatment. Iso-seq produced 122,338 transcripts, including 26,893 alternative splicing isoforms, 8,090 novel transcripts and 12,536 candidate long non-coding RNAs. The WGBS results suggested a decrease in CG methylation and increase in CHH methylation following HE treatment. The LE and HE treatments induced drastic changes in transcriptome and DNA methylome, with LE inducing the initial response to flower induction and HE inducing the subsequent response. The dose-dependent induction of FLOWERING LOCUS T-like genes (FTLs) may have contributed to dose-dependent flowering induction in pineapple by ethylene. Alterations in DNA methylation, lncRNAs and multiple genes may be involved in the regulation of FTLs. Our data provided a landscape of the transcriptome and DNA methylome and revealed a candidate network that regulates flowering time in pineapple, which may promote further studies.
Affiliation(s)
- Jiabin Wang
- Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, Danzhou, 571737, Hainan, China
- Ministry of Agriculture Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Danzhou, 571737, Hainan, China
- Hainan Province Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation, Danzhou, 571737, Hainan, China
- Zhiying Li
- Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, Danzhou, 571737, Hainan, China
- Ministry of Agriculture Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Danzhou, 571737, Hainan, China
- Hainan Province Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation, Danzhou, 571737, Hainan, China
- Ming Lei
- Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, Danzhou, 571737, Hainan, China
- Ministry of Agriculture Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Danzhou, 571737, Hainan, China
- Hainan Province Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation, Danzhou, 571737, Hainan, China
- Yunliu Fu
- Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, Danzhou, 571737, Hainan, China
- Ministry of Agriculture Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Danzhou, 571737, Hainan, China
- Hainan Province Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation, Danzhou, 571737, Hainan, China
- Jiaju Zhao
- Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, Danzhou, 571737, Hainan, China
- Ministry of Agriculture Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Danzhou, 571737, Hainan, China
- Hainan Province Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation, Danzhou, 571737, Hainan, China
- Mengfei Ao
- Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, Danzhou, 571737, Hainan, China
- Ministry of Agriculture Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Danzhou, 571737, Hainan, China
- Hainan Province Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation, Danzhou, 571737, Hainan, China
- Li Xu
- Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, Danzhou, 571737, Hainan, China
- Ministry of Agriculture Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Danzhou, 571737, Hainan, China
- Hainan Province Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation, Danzhou, 571737, Hainan, China
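The abstract reports methylation changes by sequence context. In plant WGBS analyses, cytosines are conventionally grouped as CG, CHG or CHH, where H stands for A, C or T. The hypothetical helper below classifies the cytosines on one strand of a sequence purely to illustrate that naming; it is not part of the study's pipeline, which would also cover the reverse strand and read-level methylation calls.

```python
def cytosine_contexts(seq):
    """Classify each C on the given strand as CG, CHG or CHH (H = A, C or T).

    Returns (position, context) pairs. Cytosines too close to the 3' end to
    classify, or whose context contains an ambiguity code such as N, are
    skipped. Only the supplied strand is scanned.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base != "C" or i + 1 >= len(seq):
            continue
        nxt = seq[i + 1]
        if nxt == "G":
            out.append((i, "CG"))
        elif nxt in "ACT":
            if i + 2 >= len(seq):
                continue
            if seq[i + 2] == "G":
                out.append((i, "CHG"))
            elif seq[i + 2] in "ACT":
                out.append((i, "CHH"))
    return out
```

For example, `cytosine_contexts("CGACAGCTT")` yields one cytosine in each context, at positions 0 (CG), 3 (CHG) and 6 (CHH).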
29
Zhang XF, Ou-Yang L, Yan H. Node-based differential network analysis in genomics. Comput Biol Chem 2017; 69:194-201. [DOI: 10.1016/j.compbiolchem.2017.03.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/24/2017] [Accepted: 03/27/2017] [Indexed: 12/26/2022]
30
Zhang XF, Ou-Yang L, Yan H. Incorporating prior information into differential network analysis using non-paranormal graphical models. Bioinformatics 2017; 33:2436-2445. [DOI: 10.1093/bioinformatics/btx208] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 11/04/2016] [Accepted: 04/05/2017] [Indexed: 02/02/2023]
Affiliation(s)
- Xiao-Fei Zhang
- Department of Statistics, School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, China
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
- Le Ou-Yang
- Department of Electronic Engineering, College of Information Engineering, Shenzhen University, Shenzhen, China
- Hong Yan
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China