1
Li Q, Nichols C, Welner RS, Chen JY, Ku WS, Yue Z. Toden-E: Topology-Based and Density-Based Ensembled Clustering for the Development of Super-PAG in Functional Genomics using PAG Network and LLM. bioRxiv 2024:2024.10.20.619308. [PMID: 39484450 PMCID: PMC11526983 DOI: 10.1101/2024.10.20.619308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
The integrative analysis of gene sets, networks, and pathways is pivotal for deciphering omics data in translational biomedical research. To significantly increase gene coverage and enhance the utility of pathways, annotated gene lists, and gene signatures from diverse sources, we introduced pathways, annotated gene lists, and gene signatures (PAGs) enriched with metadata to represent biological functions. Furthermore, we established PAG-PAG networks by leveraging gene member similarity and gene regulations. However, in practice, high similarity in functional descriptions or gene membership often leads to redundant PAGs, hindering the interpretation of a fuzzy enriched PAG list. In this study, we developed Toden-E (topology-based and density-based ensemble) clustering, a pioneering approach that integrates topology-based and density-based clustering methods to detect PAG communities by leveraging the PAG network and large language models (LLMs). In computational genomics annotation, genes can be grouped or clustered through gene relationships and gene functions via guilt by association. Similarly, PAGs can be grouped into higher-level clusters, forming concise functional representations called Super-PAGs. Toden-E captures PAG-PAG similarity and encapsulates functional information through an LLM to characterize network-based functional Super-PAGs. In synthetic data, we introduced a metric called the Disparity Index (DI), which measures the connectivity of gene neighbors to gauge clusterability. We compared multiple clustering algorithms to identify the best method for generating performance-driven clusters. In non-simulated data (Gene Ontology), by leveraging transfer learning and an LLM, we formed a language-based similarity embedding. Toden-E uses this embedding together with the topology-based embedding to generate putative Super-PAGs with superior performance in semantic and gene member inclusiveness.
2
Al Abir F, Chen JY. Mondrian Abstraction and Language Model Embeddings for Differential Pathway Analysis. bioRxiv 2024:2024.04.11.589093. [PMID: 38659966 PMCID: PMC11042185 DOI: 10.1101/2024.04.11.589093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
In this study, we introduce the Mondrian Map, an innovative visualization tool inspired by Piet Mondrian's abstract art, to address the complexities inherent in visualizing biological networks. By converting intricate biological data into a structured and intuitive format, the Mondrian Map enables clear and meaningful representations of biological pathways, facilitating a deeper understanding of molecular dynamics. Each pathway is represented by a square whose size corresponds to fold change, with color indicating the direction of regulation (up or down) and statistical significance. The spatial arrangement of pathways is derived from language model embeddings, preserving neighborhood relationships and enabling the identification of clusters of related pathways. Additionally, colored lines highlight potential crosstalk between pathways, with distinctions between short- and long-range functional interactions. In a case study of glioblastoma multiforme (GBM), the Mondrian Map effectively revealed distinct pathway patterns across patient profiles at different stages of disease progression. These insights demonstrate the tool's potential to enhance downstream bioinformatics analysis by providing a more comprehensive and visually accessible overview of pathway interactions, offering new avenues for therapeutic exploration and personalized medicine.
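The visual encoding described in this abstract (square size from fold change, color from direction and significance) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Tile` type, the `encode_pathway` function, the color names, the `alpha` threshold, the `scale` factor, and the choice to make tile area proportional to |log2 fold change| are all assumptions. The embedding-based placement of tiles is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Tile:
    """One pathway drawn as a square (hypothetical structure)."""
    name: str
    side: float
    color: str


def encode_pathway(name, fold_change, p_value, alpha=0.05, scale=10.0):
    """Map a pathway's statistics to a square tile.

    Assumed encoding: area grows with |log2 fold change|; color marks
    the direction of regulation, and non-significant pathways are gray.
    """
    log_fc = math.log2(fold_change)
    side = scale * math.sqrt(abs(log_fc))  # area = scale**2 * |log2 FC|
    if p_value >= alpha:
        color = "gray"   # not statistically significant
    elif log_fc > 0:
        color = "red"    # up-regulated
    else:
        color = "blue"   # down-regulated
    return Tile(name, side, color)


if __name__ == "__main__":
    for args in [("Pathway A", 4.0, 0.01), ("Pathway B", 0.5, 0.01),
                 ("Pathway C", 2.0, 0.20)]:
        print(encode_pathway(*args))
```

Using the logarithm of the fold change keeps a two-fold increase and a two-fold decrease the same size, so only color distinguishes direction.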
Affiliation(s)
- Fuad Al Abir
  - Department of Computer Science, University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Jake Y Chen
  - Systems Pharmacology AI Research Center (SPARC), School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35233, USA
3
Huang F, Welner RS, Chen JY, Yue Z. PAGER-scFGA: unveiling cell functions and molecular mechanisms in cell trajectories through single-cell functional genomics analysis. Frontiers in Bioinformatics 2024; 4:1336135. [PMID: 38690527 PMCID: PMC11058213 DOI: 10.3389/fbinf.2024.1336135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 04/01/2024] [Indexed: 05/02/2024]
Abstract
Background: Understanding how cells and tissues respond to stress factors and perturbations during disease processes is crucial for developing effective prevention, diagnosis, and treatment strategies. Single-cell RNA sequencing (scRNA-seq) enables high-resolution identification of cells and exploration of cell heterogeneity, shedding light on cell differentiation/maturation and functional differences. Recent advancements in multimodal sequencing technologies have focused on improving access to cell-specific subgroups for functional genomics analysis. To facilitate the functional annotation of cell groups and characterization of molecular mechanisms underlying cell trajectories, we introduce the Pathways, Annotated Gene Lists, and Gene Signatures Electronic Repository for Single-Cell Functional Genomics Analysis (PAGER-scFGA). Results: We have developed PAGER-scFGA, which integrates cell functional annotations and gene-set enrichment analysis into popular single-cell analysis pipelines such as Scanpy. Using differentially expressed genes (DEGs) from pairwise cell clusters, PAGER-scFGA infers cell functions through the enrichment of potential cell-marker genesets. Moreover, PAGER-scFGA provides pathways, annotated gene lists, and gene signatures (PAGs) enriched in specific cell subsets with tissue compositions and continuous transitions along cell trajectories. Additionally, PAGER-scFGA enables the construction of a gene subcellular map based on DEGs and allows examination of the gene functional compartments (GFCs) underlying cell maturation/differentiation. In a real-world case study of mouse natural killer (mNK) cells, PAGER-scFGA revealed two major stages of natural killer (NK) cells and three trajectories from the precursor stage to NK T-like mature stage within blood, spleen, and bone marrow tissues. As the trajectories progress to later stages, the DEGs exhibit greater divergence and variability. 
However, the DEGs in different trajectories still interact within a network during NK cell maturation. Notably, PAGER-scFGA unveiled cell cytotoxicity, exocytosis, and the response to interleukin (IL) signaling pathways and associated network models during the progression from precursor NK cells to mature NK cells. Conclusion: PAGER-scFGA enables in-depth exploration of functional insights and presents a comprehensive knowledge map of gene networks and GFCs, which can be utilized for future studies and hypothesis generation. It is expected to become an indispensable tool for inferring cell functions and detecting molecular mechanisms within cell trajectories in single-cell studies. The web app (accessible at https://au-singlecell.streamlit.app/) is publicly available.
Affiliation(s)
- Fengyuan Huang
  - Department of Biomedical Informatics and Data Science, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Robert S. Welner
  - Hematology & Oncology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Jake Y. Chen
  - Department of Biomedical Informatics and Data Science, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Zongliang Yue
  - Health Outcome Research and Policy Department, Harrison College of Pharmacy, Auburn University, Auburn, AL, United States
4
Saghapour E, Yue Z, Sharma R, Kumar S, Sembay Z, Willey CD, Chen JY. Explorative Discovery of Gene Signatures and Clinotypes in Glioblastoma Cancer Through GeneTerrain Knowledge Map Representation. bioRxiv 2024:2024.04.01.587278. [PMID: 38617348 PMCID: PMC11014492 DOI: 10.1101/2024.04.01.587278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
This study introduces the GeneTerrain Knowledge Map Representation (GTKM), a novel method for visualizing gene expression data in cancer research. GTKM leverages protein-protein interactions to graphically display differentially expressed genes (DEGs) on a 2-dimensional contour plot, offering a more nuanced understanding of gene interactions and expression patterns compared to traditional heatmap methods. The research demonstrates GTKM's utility through four case studies on glioblastoma (GBM) datasets, focusing on survival analysis, subtype identification, IDH1 mutation analysis, and drug sensitivities of different tumor cell lines. Additionally, a prototype website has been developed to showcase these findings, indicating the method's adaptability for various cancer types. The study reveals that GTKM effectively identifies gene patterns associated with different clinical outcomes in GBM, and its profiles enable the identification of sub-gene signature patterns crucial for predicting survival. The methodology promises significant advancements in precision medicine, providing a powerful tool for understanding complex gene interactions and identifying potential therapeutic targets in cancer treatment.
Affiliation(s)
- Ehsan Saghapour
  - Department of Biomedical Informatics and Data Science, University of Alabama at Birmingham, Birmingham, AL, US
- Zongliang Yue
  - Health Outcome Research and Policy Department, Harrison College of Pharmacy, Auburn University, AL, US
- Rahul Sharma
  - Department of Biomedical Informatics and Data Science, University of Alabama at Birmingham, Birmingham, AL, US
- Sidharth Kumar
  - Department of Radiation Oncology, The University of Alabama at Birmingham, Birmingham, AL, US
- Zhandos Sembay
  - Department of Biomedical Informatics and Data Science, University of Alabama at Birmingham, Birmingham, AL, US
- Christopher D Willey
  - Department of Radiation Oncology, The University of Alabama at Birmingham, Birmingham, AL, US
- Jake Y Chen
  - Department of Biomedical Informatics and Data Science, University of Alabama at Birmingham, Birmingham, AL, US
  - Systems Pharmacology AI Research Center, University of Alabama at Birmingham, AL, US
5
Nguyen T, Wei Y, Nakada Y, Chen JY, Zhou Y, Walcott G, Zhang J. Analysis of cardiac single-cell RNA-sequencing data can be improved by the use of artificial-intelligence-based tools. Sci Rep 2023; 13:6821. [PMID: 37100826 PMCID: PMC10133286 DOI: 10.1038/s41598-023-32293-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/04/2022] [Accepted: 03/25/2023] [Indexed: 04/28/2023]
Abstract
Single-cell RNA sequencing (scRNAseq) enables researchers to identify and characterize populations and subpopulations of different cell types in hearts recovering from myocardial infarction (MI) by characterizing the transcriptomes in thousands of individual cells. However, the effectiveness of the currently available tools for processing and interpreting these immense datasets is limited. We incorporated three Artificial Intelligence (AI) techniques into a toolkit for evaluating scRNAseq data: AI Autoencoding separates data from different cell types and subpopulations of cell types (cluster analysis); AI Sparse Modeling identifies genes and signaling mechanisms that are differentially activated between subpopulations (pathway/gene set enrichment analysis), and AI Semisupervised Learning tracks the transformation of cells from one subpopulation into another (trajectory analysis). Autoencoding was often used in data denoising; yet, in our pipeline, Autoencoding was exclusively used for cell embedding and clustering. The performance of our AI scRNAseq toolkit and other highly cited non-AI tools was evaluated with three scRNAseq datasets obtained from the Gene Expression Omnibus database. Autoencoder was the only tool to identify differences between the cardiomyocyte subpopulations found in mice that underwent MI or sham-MI surgery on postnatal day (P) 1. Statistically significant differences between cardiomyocytes from P1-MI mice and mice that underwent MI on P8 were identified for six cell-cycle phases and five signaling pathways when the data were analyzed via Sparse Modeling, compared to just one cell-cycle phase and one pathway when the data were analyzed with non-AI techniques. Only Semisupervised Learning detected trajectories between the predominant cardiomyocyte clusters in hearts collected on P28 from pigs that underwent apical resection (AR) on P1, and on P30 from pigs that underwent AR on P1 and MI on P28. 
In another dataset, the pig scRNAseq data were collected after the injection of CCND2-overexpressing human induced pluripotent stem cell-derived cardiomyocytes (CCND2hiPSC) into injured P28 pig hearts; only the AI-based technique could demonstrate that the host cardiomyocytes increase proliferation through the HIPPO/YAP and MAPK signaling pathways. For the cluster, pathway/gene set enrichment, and trajectory analysis of scRNAseq datasets generated from studies of myocardial regeneration in mice and pigs, our AI-based toolkit identified results that non-AI techniques did not discover. These different results were validated and were important in explaining myocardial regeneration.
Affiliation(s)
- Thanh Nguyen
  - Department of Biomedical Engineering, University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Yuhua Wei
  - Department of Biomedical Engineering, University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Yuji Nakada
  - Department of Biomedical Engineering, University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Jake Y Chen
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Yang Zhou
  - Department of Biomedical Engineering, University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Gregory Walcott
  - Department of Medicine, Cardiovascular Diseases, University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Jianyi Zhang
  - Department of Biomedical Engineering, University of Alabama at Birmingham, Birmingham, AL, 35233, USA
  - Department of Medicine, Cardiovascular Diseases, University of Alabama at Birmingham, Birmingham, AL, 35233, USA
  - Department of Biomedical Engineering, School of Medicine and School of Engineering, University of Alabama at Birmingham, 1670 University Blvd, Volker Hall G094J, Birmingham, AL, 35233, USA
6
Slominski AT, Slominski RM, Raman C, Chen JY, Athar M, Elmets C. Neuroendocrine signaling in the skin with a special focus on the epidermal neuropeptides. Am J Physiol Cell Physiol 2022; 323:C1757-C1776. [PMID: 36317800 PMCID: PMC9744652 DOI: 10.1152/ajpcell.00147.2022] [Citation(s) in RCA: 101] [Impact Index Per Article: 33.7] [Received: 04/06/2022] [Revised: 10/21/2022] [Accepted: 10/21/2022] [Indexed: 11/07/2022]
Abstract
The skin, which is composed of the epidermis, dermis, and subcutaneous tissue, is the largest organ in the human body and plays a crucial role in the regulation of the body's homeostasis. These functions are regulated by local neuroendocrine and immune systems with a plethora of signaling molecules produced by resident and immune cells. In addition, neurotransmitters, endocrine factors, neuropeptides, and cytokines released from nerve endings play a central role in the skin's responses to stress. These molecules act on the corresponding receptors in an intra-, juxta-, para-, or autocrine fashion. The epidermis, as the outermost component of the skin, forms a barrier directly protecting against environmental stressors. This protection is assured by an intrinsic keratinocyte differentiation program, pigmentary system, and local nervous, immune, endocrine, and microbiome elements. These constituents communicate cross-functionally among themselves and with corresponding systems in the dermis and hypodermis to secure the basic epidermal functions to maintain local (skin) and global (systemic) homeostasis. The neurohormonal mediators and cytokines used in these communications regulate physiological skin functions separately or in concert. Disturbances in the functions of these systems lead to cutaneous pathology that includes inflammatory disorders (e.g., psoriasis and allergic or atopic dermatitis), keratinocytic hyperproliferative disorders (e.g., seborrheic and solar keratoses), dysfunction of adnexal structures (e.g., hair follicles and eccrine and sebaceous glands), hypersensitivity reactions, pigmentary disorders (vitiligo, melasma, and hypo- or hyperpigmentary responses), premature aging, and malignancies (melanoma and nonmelanoma skin cancers). These cellular, molecular, and neural components preserve skin integrity and protect against skin pathologies and can act as "messengers of the skin" to the central organs, all to preserve organismal survival.
Affiliation(s)
- Andrzej T Slominski
  - Department of Dermatology, University of Alabama at Birmingham, Birmingham, Alabama
  - Comprehensive Cancer Center, Cancer Chemoprevention Program, University of Alabama at Birmingham, Birmingham, Alabama
  - VA Medical Center, Birmingham, Alabama
- Radomir M Slominski
  - Graduate Biomedical Sciences Program, University of Alabama at Birmingham, Birmingham, Alabama
- Chander Raman
  - Department of Dermatology, University of Alabama at Birmingham, Birmingham, Alabama
- Jake Y Chen
  - Informatics Institute, University of Alabama at Birmingham, Birmingham, Alabama
- Mohammad Athar
  - Department of Dermatology, University of Alabama at Birmingham, Birmingham, Alabama
  - VA Medical Center, Birmingham, Alabama
- Craig Elmets
  - Department of Dermatology, University of Alabama at Birmingham, Birmingham, Alabama
  - Comprehensive Cancer Center, Cancer Chemoprevention Program, University of Alabama at Birmingham, Birmingham, Alabama
  - VA Medical Center, Birmingham, Alabama
7
Nguyen T, Yue Z, Slominski R, Welner R, Zhang J, Chen JY. WINNER: A network biology tool for biomolecular characterization and prioritization. Front Big Data 2022; 5:1016606. [PMID: 36407327 PMCID: PMC9672476 DOI: 10.3389/fdata.2022.1016606] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/11/2022] [Accepted: 10/14/2022] [Indexed: 12/09/2024]
Abstract
BACKGROUND AND CONTRIBUTION: In network biology, molecular functions can be characterized by network-based inference, or "guilt-by-association." PageRank-like tools have been applied in the study of biomolecular interaction networks to further obtain the relative significance of all molecules in the network. However, there is a great deal of inherent noise in widely accessible data sets for gene-to-gene associations or protein-protein interactions. How to develop robust tests to expand, filter, and rank molecular entities in disease-specific networks remains an ad hoc data analysis process. RESULTS: We describe a new biomolecular characterization and prioritization tool called Weighted In-Network Node Expansion and Ranking (WINNER). It takes the input of any molecular interaction network data and generates an optionally expanded network with all the nodes ranked according to their relevance to one another in the network. To help users assess the robustness of results, WINNER provides two different types of statistics. The first type is a node-expansion p-value, which helps evaluate the statistical significance of adding "non-seed" molecules to the original biomolecular interaction network consisting of "seed" molecules and molecular interactions. The second type is a node-ranking p-value, which helps evaluate the relative statistical significance of the contribution of each node to the overall network architecture. We validated the robustness of WINNER in ranking top molecules by spiking in noise in several network permutation experiments. We have found that node degree-preservation randomization of the gene network produced normally distributed ranking scores, which outperform those made with other gene network randomization techniques. Furthermore, we validated that a more significant proportion of the WINNER-ranked genes was associated with disease biology than of those ranked by existing methods such as PageRank. 
We demonstrated the performance of WINNER with a few case studies, including Alzheimer's disease, breast cancer, myocardial infarction, and triple-negative breast cancer (TNBC). In all these case studies, the expanded and top-ranked genes identified by WINNER reveal disease biology more significantly than those identified by other gene prioritizing software tools, including Ingenuity Pathway Analysis (IPA) and DiAMOND. CONCLUSION: WINNER ranking strongly correlates with other ranking methods when the network covers sufficient node and edge information, indicating a high network quality. WINNER users can use this new tool to robustly evaluate a list of candidate genes, proteins, or metabolites produced from high-throughput biology experiments, as long as there is available gene/protein/metabolic network information.
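The idea of a node-ranking p-value obtained from degree-preserving randomization can be illustrated with a generic sketch. This is not WINNER's algorithm: it substitutes plain PageRank for the WINNER score and an empirical permutation p-value for its statistics, and the function names (`pagerank`, `degree_preserving_shuffle`, `ranking_p_values`), the damping factor, and the number of swaps are all assumptions made for illustration.

```python
import random


def pagerank(edges, nodes, damping=0.85, iters=100):
    """Plain power-iteration PageRank on an undirected graph."""
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by isolated nodes is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not nbrs[v])
        rank = {
            v: (1 - damping) / n
            + damping * (sum(rank[u] / len(nbrs[u]) for u in nbrs[v])
                         + dangling / n)
            for v in nodes
        }
    return rank


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire with double-edge swaps: (a,b),(c,d) -> (a,d),(c,b)."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def ranking_p_values(edges, nodes, n_perm=200, seed=0):
    """Empirical p-value: how often a degree-matched random network
    gives a node a score at least as high as the observed one."""
    rng = random.Random(seed)
    observed = pagerank(edges, nodes)
    exceed = {v: 0 for v in nodes}
    for _ in range(n_perm):
        null = pagerank(
            degree_preserving_shuffle(edges, 10 * len(edges), rng), nodes)
        for v in nodes:
            if null[v] >= observed[v]:
                exceed[v] += 1
    return {v: (exceed[v] + 1) / (n_perm + 1) for v in nodes}


if __name__ == "__main__":
    toy_edges = [("a", "b"), ("b", "c"), ("c", "d"),
                 ("d", "e"), ("e", "a"), ("a", "c")]
    print(ranking_p_values(toy_edges, ["a", "b", "c", "d", "e"], n_perm=50))
```

Because every swap keeps each node's degree fixed, a small p-value suggests that a node's score reflects where it sits in the network rather than only how many interactions it has.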
Affiliation(s)
- Thanh Nguyen
  - Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
  - Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States
- Zongliang Yue
  - Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Radomir Slominski
  - Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Robert Welner
  - Comprehensive Arthritis, Musculoskeletal, Bone and Autoimmunity Center (CAMBAC), School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Jianyi Zhang
  - Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States
- Jake Y. Chen
  - Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
8
Stackhouse CT, Anderson JC, Yue Z, Nguyen T, Eustace NJ, Langford CP, Wang J, Rowland JR, Xing C, Mikhail FM, Cui X, Alrefai H, Bash RE, Lee KJ, Yang ES, Hjelmeland AB, Miller CR, Chen JY, Gillespie GY, Willey CD. An in vivo model of glioblastoma radiation resistance identifies long non-coding RNAs and targetable kinases. JCI Insight 2022; 7:148717. [PMID: 35852875 PMCID: PMC9462495 DOI: 10.1172/jci.insight.148717] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/18/2021] [Accepted: 07/07/2022] [Indexed: 12/03/2022]
Abstract
Key molecular regulators of acquired radiation resistance in recurrent glioblastoma (GBM) are largely unknown, with a dearth of accurate preclinical models. To address this, we generated 8 GBM patient-derived xenograft (PDX) models of acquired radiation therapy–selected (RTS) resistance compared with same-patient, treatment-naive (radiation-sensitive, unselected; RTU) PDXs. These likely unique models mimic the longitudinal evolution of patient recurrent tumors following serial radiation therapy. Indeed, while whole-exome sequencing showed retention of major genomic alterations in the RTS lines, we did detect a chromosome 12q14 amplification that was associated with clinical GBM recurrence in 2 RTS models. A potentially novel bioinformatics pipeline was applied to analyze phenotypic, transcriptomic, and kinomic alterations, which identified long noncoding RNAs (lncRNAs) and targetable, PDX-specific kinases. We observed differential transcriptional enrichment of DNA damage repair pathways in our RTS models, which correlated with several lncRNAs. Global kinomic profiling separated RTU and RTS models, but pairwise analyses indicated that there are multiple molecular routes to acquired radiation resistance. RTS model–specific kinases were identified and targeted with clinically relevant small molecule inhibitors. This cohort of in vivo RTS patient-derived models will enable future preclinical therapeutic testing to help overcome the treatment resistance seen in patients with GBM.
Affiliation(s)
- Zongliang Yue
  - Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Thanh Nguyen
  - Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Jelai Wang
  - Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA
- James R. Rowland
  - Department of Physics, The Ohio State University, Columbus, Ohio, USA
- Fady M. Mikhail
  - Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Xiangqin Cui
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
- Ryan E. Bash
  - Division of Neuropathology, Department of Pathology, and
- Anita B. Hjelmeland
  - Department of Cell, Developmental, and Integrative Biology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA
- C. Ryan Miller
  - Division of Neuropathology, Department of Pathology, and
- Jake Y. Chen
  - Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA
9
Weng Z, Yue Z, Zhu Y, Chen JY. DEMA: a distance-bounded energy-field minimization algorithm to model and layout biomolecular networks with quantitative features. Bioinformatics 2022; 38:i359-i368. [PMID: 35758816 PMCID: PMC9235497 DOI: 10.1093/bioinformatics/btac261] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
Abstract
SUMMARY: In biology, graph layout algorithms can reveal comprehensive biological contexts by visually positioning graph nodes in their relevant neighborhoods. A layout software algorithm/engine commonly takes a set of nodes and edges and produces layout coordinates of nodes according to edge constraints. However, current layout engines normally do not consider node, edge or node-set properties during layout and only curate these properties after the layout is created. Here, we propose a new layout algorithm, distance-bounded energy-field minimization algorithm (DEMA), to natively consider various biological factors, i.e., the strength of gene-to-gene association, the gene's relative contribution weight and the functional groups of genes, to enhance the interpretation of complex network graphs. In DEMA, we introduce a parameterized energy model where nodes are repelled by the network topology and attracted by a few biological factors, i.e., interaction coefficient, effect coefficient and fold change of gene expression. We generalize these factors as gene weights, protein-protein interaction weights, gene-to-gene correlations and the gene set annotations, the four parameterized functional properties used in DEMA. Moreover, DEMA considers further attraction/repulsion/grouping coefficients to enable different preferences in generating network views. Applying DEMA, we performed two case studies using genetic data in autism spectrum disorder and Alzheimer's disease, respectively, for gene candidate discovery. Furthermore, for convenience, we implement our algorithm as a plugin to Cytoscape, an open-source software platform for visualizing networks. Our software and demo can be freely accessed at http://discovery.informatics.uab.edu/dema. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhenyu Weng
  - Communication and Information Security Lab, Institute of Big Data Technologies, Shenzhen Graduate School, Peking University, Shenzhen 518055, China
- Zongliang Yue
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Yuesheng Zhu
  - Communication and Information Security Lab, Institute of Big Data Technologies, Shenzhen Graduate School, Peking University, Shenzhen 518055, China
- Jake Yue Chen
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
10
Yue Z, Slominski R, Bharti S, Chen JY. PAGER Web APP: An Interactive, Online Gene Set and Network Interpretation Tool for Functional Genomics. Front Genet 2022; 13:820361. [PMID: 35495152 PMCID: PMC9039620 DOI: 10.3389/fgene.2022.820361] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/22/2021] [Accepted: 03/17/2022] [Indexed: 12/30/2022]
Abstract
Functional genomics studies have helped researchers annotate differentially expressed gene lists, extract gene expression signatures, and identify biological pathways from omics profiling experiments conducted on biological samples. The current geneset, network, and pathway analysis (GNPA) web servers, e.g., DAVID, EnrichR, WebGestaltR, or PAGER, do not allow automated integrative functional genomic downstream analysis. In this study, we developed a new web-based interactive application, "PAGER Web APP", which supports online R scripting of integrative GNPA. In a case study of melanoma drug resistance, we showed that the new PAGER Web APP enabled us to discover highly relevant pathways and network modules, leading to novel biological insights. We also compared PAGER Web APP's pathway analysis results retrieved among PAGER, EnrichR, and WebGestaltR to show its advantages in integrative GNPA. The interactive online web APP is publicly accessible from the link, https://aimed-lab.shinyapps.io/PAGERwebapp/.
Affiliation(s)
- Zongliang Yue
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Radomir Slominski
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Graduate Biomedical Sciences Program, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Samuel Bharti
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Jake Y. Chen
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
|
11
|
Zindl CL, Witte SJ, Laufer VA, Gao M, Yue Z, Janowski KM, Cai B, Frey BF, Silberger DJ, Harbour SN, Singer JR, Turner H, Lund FE, Vallance BA, Rosenberg AF, Schoeb TR, Chen JY, Hatton RD, Weaver CT. A nonredundant role for T cell-derived interleukin 22 in antibacterial defense of colonic crypts. Immunity 2022; 55:494-511.e11. [PMID: 35263568 PMCID: PMC9126440 DOI: 10.1016/j.immuni.2022.02.003] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/03/2020] [Revised: 11/11/2021] [Accepted: 02/04/2022] [Indexed: 02/05/2023]
Abstract
Interleukin (IL)-22 is central to immune defense at barrier sites. We examined the contributions of innate lymphoid cell (ILC) and T cell-derived IL-22 during Citrobacter rodentium (C.r) infection using mice that both report Il22 expression and allow lineage-specific deletion. ILC-derived IL-22 activated STAT3 in C.r-colonized surface intestinal epithelial cells (IECs) but only temporally restrained bacterial growth. T cell-derived IL-22 induced a more robust and extensive activation of STAT3 in IECs, including IECs lining colonic crypts, and T cell-specific deficiency of IL-22 led to pathogen invasion of the crypts and increased mortality. This reflected a requirement for T cell-derived IL-22 for the expression of a host-protective transcriptomic program that included AMPs, neutrophil-recruiting chemokines, and mucin-related molecules, and it restricted IFNγ-induced proinflammatory genes. Our findings demonstrate spatiotemporal differences in the production and action of IL-22 by ILCs and T cells during infection and reveal an indispensable role for IL-22-producing T cells in the protection of the intestinal crypts.
Affiliation(s)
- Carlene L Zindl
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
| | - Steven J Witte
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA; Department of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Vincent A Laufer
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA; Department of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Min Gao
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA; Informatics Institute, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Zongliang Yue
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA; Informatics Institute, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Karen M Janowski
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Baiyi Cai
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Blake F Frey
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Daniel J Silberger
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Stacey N Harbour
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Jeffrey R Singer
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Henrietta Turner
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Frances E Lund
- Department of Microbiology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Bruce A Vallance
- Department of Pediatrics, University of British Columbia, Vancouver, BC V6H 3V4, Canada
| | - Alexander F Rosenberg
- Informatics Institute, University of Alabama at Birmingham, Birmingham, AL 35294, USA; Department of Microbiology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Trenton R Schoeb
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Jake Y Chen
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA; Informatics Institute, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Robin D Hatton
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Casey T Weaver
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
|
12
|
Kosvyra A, Ntzioni E, Chouvarda I. Network analysis with biological data of cancer patients: A scoping review. J Biomed Inform 2021; 120:103873. [PMID: 34298154 DOI: 10.1016/j.jbi.2021.103873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/08/2020] [Revised: 06/30/2021] [Accepted: 07/18/2021] [Indexed: 12/25/2022]
Abstract
BACKGROUND & OBJECTIVE Network Analysis (NA) is a mathematical method that allows exploring relations between units and representing them as a graph. Although NA was initially associated with the social sciences, over the past two decades it has been introduced into bioinformatics. The recent growth in the use of networks in biological data analysis reveals the need to investigate this area further. In this work, we attempt to identify the use of NA with biological data, and specifically: (a) what types of data are used and whether they are integrated or not, (b) what the purpose of the analysis is, predictive or descriptive, and (c) the outcome of such analyses, specifically in cancer. METHODS & MATERIALS The literature review was conducted on two databases, PubMed and IEEE, and was restricted to journal articles of the last decade (January 2010 - December 2019). At the first level, all articles were screened by title and abstract; at the second level, screening was conducted by reading the full text, following the predefined inclusion and exclusion criteria, leading to 131 articles of interest. A table was created with the information of interest and was used to classify the articles. The articles were initially classified into analysis studies and studies that propose a new algorithm or methodology. Each of these categories was further screened by the following clustering criteria: (a) data used, (b) study purpose, (c) study outcome. Specifically for the studies proposing a new algorithm, the novelty presented in each one was identified. RESULTS & CONCLUSIONS In the past five years, researchers have focused on creating new algorithms and methodologies to enhance this field. The classification of the articles revealed that only 25% of the analyses integrate multi-omics data, although 50% of the newly developed algorithms follow this integrative direction. Moreover, only 20% of the analyses and 10% of the newly developed methodologies have a predictive purpose. Regarding the results of the works reviewed, 75% of the studies focus on identifying gene signatures, whether prognostic or not. In conclusion, this review revealed the need for predictive and multi-omics integrative algorithms and methodologies that can be used to enhance cancer diagnosis, prognosis and treatment.
Affiliation(s)
- A Kosvyra
- Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece.
| | - E Ntzioni
- Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
| | - I Chouvarda
- Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
13
|
Abstract
Overcoming the challenges of understanding and treating cancer requires reliable patient-derived models of cancer (PDMCs). For decades, cancer research and therapeutic development relied primarily on cancer cell lines because of their prevalence, reproducibility, and simplicity to maintain. However, findings from research conducted in cell lines are rarely recapitulated in vivo and seldom directly translatable to patients. The tumor microenvironment (TME), tumor-stromal interactions, and associations with host immune cells produce profound changes in tumor phenotype and complexity not captured in traditional monolayer cell culture. In this chapter, we present various cancer explant models and discuss their applicability based on specific research aims. We discuss the appropriateness of these models for basic science questions, drug screening/development, and for personalized, precision medicine. We also consider logistical factors such as resource cost, technical difficulty, and accessibility. We finish this chapter with a practical guide intended to help the reader select the cancer explant model system(s) that best address their research aims.
|
14
|
Nguyen T, Zhang T, Fox G, Zeng S, Cao N, Pan C, Chen JY. Linking clinotypes to phenotypes and genotypes from laboratory test results in comprehensive physical exams. BMC Med Inform Decis Mak 2021; 21:51. [PMID: 33627109 PMCID: PMC7903607 DOI: 10.1186/s12911-021-01387-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Accepted: 01/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In this work, we aimed to demonstrate how to utilize lab test results and other clinical information to support precision medicine research and clinical decisions on complex diseases, with the support of electronic medical record facilities. We defined "clinotypes" as clinical information that could be observed and measured objectively using biomedical instruments. From well-known 'omic' problem definitions, we defined problems using clinotype information, including stratifying patients (identifying sub-cohorts of interest for future studies), mining significant associations between clinotypes and specific phenotypes (diseases), and discovering potential linkages between clinotype and genomic information. We solved these problems by integrating public omic databases and applying advanced machine learning and visual analytic techniques to two-year health exam records from a large population of healthy southern Chinese individuals (size n = 91,354). When developing the solution, we carefully addressed missing information, class imbalance, and non-uniform data annotation. RESULTS We organized the techniques and solutions that address the problems and issues above into the CPA framework (Clinotype Prediction and Association-finding). At the data preprocessing step, we handled the missing value issue with a prediction accuracy of 0.760. We curated 12,635 clinotype-gene associations. We found 147 associations between 147 chronic disease phenotypes and clinotypes, which improved the disease predictive performance to an average AUC of 0.967. We mined 182 significant clinotype-clinotype associations among 69 clinotypes. CONCLUSIONS Our results showed strong potential connectivity between the omics information and the clinical lab test information. The results further emphasized the need to utilize and integrate clinical information, especially lab test results, in future PheWAS and omic studies. Furthermore, they showed that clinotype information could initiate an alternative research direction and serve as an independent field of data to support the well-known 'phenome' and 'genome' research.
Affiliation(s)
- Thanh Nguyen
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
| | - Tongbin Zhang
- School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
- Department of Computer Technology and Information Management, The First Affiliated Hospital of Wenzhou Medical University, Zhejiang, China
| | - Geoffrey Fox
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
| | - Sisi Zeng
- School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
| | - Ni Cao
- School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
| | - Chuandi Pan
- School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
- Department of Computer Technology and Information Management, The First Affiliated Hospital of Wenzhou Medical University, Zhejiang, China
| | - Jake Y Chen
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA.
|
15
|
Yue Z, Zhang E, Xu C, Khurana S, Batra N, Dang SDH, Cimino JJ, Chen JY. PAGER-CoV: a comprehensive collection of pathways, annotated gene-lists and gene signatures for coronavirus disease studies. Nucleic Acids Res 2021; 49:D589-D599. [PMID: 33245774 PMCID: PMC7778959 DOI: 10.1093/nar/gkaa1094] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/17/2020] [Revised: 10/23/2020] [Accepted: 10/27/2020] [Indexed: 12/13/2022] Open
Abstract
PAGER-CoV (http://discovery.informatics.uab.edu/PAGER-CoV/) is a new web-based database that can help biomedical researchers interpret coronavirus-related functional genomic study results in the context of curated knowledge of host viral infection, inflammatory response, organ damage, and tissue repair. The new database consists of 11 835 PAGs (Pathways, Annotated gene-lists, or Gene signatures) from 33 public data sources. Through the web user interface, users can search by a query gene or a query term and retrieve significantly matched PAGs with all the curated information. Users can navigate from a PAG of interest to other related PAGs through either shared PAG-to-PAG co-membership relationships or PAG-to-PAG regulatory relationships, totaling 19 996 993. Users can also retrieve enriched PAGs from an input list of COVID-19 functional study result genes, customize the search data sources, and export all results for subsequent offline data analysis. In a case study, we performed a gene set enrichment analysis (GSEA) of a COVID-19 RNA-seq data set from the Gene Expression Omnibus database. Compared with the results using the standard PAGER database, PAGER-CoV allows for more sensitive matching of known immune-related gene signatures. We expect PAGER-CoV to be invaluable for biomedical researchers to find molecular biology mechanisms and tailored therapeutics to treat COVID-19 patients.
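Retrieving "enriched PAGs from an input list" is commonly done with a hypergeometric over-representation test. The abstract does not state which statistic PAGER-CoV uses, so the following is only a minimal sketch under that assumption; the function names, the toy PAG identifiers, and the background size are all hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the PAG."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def enrich(query_genes, pags, background_size):
    """Rank PAGs by over-representation p-value for the query gene list.

    `pags` maps a (hypothetical) PAG identifier to its member genes.
    Returns (pag_id, overlap_size, p_value) tuples, smallest p-value first.
    """
    query = set(query_genes)
    results = []
    for pag_id, members in pags.items():
        member_set = set(members)
        overlap = len(query & member_set)
        p = hypergeom_sf(overlap, background_size, len(member_set), len(query))
        results.append((pag_id, overlap, p))
    return sorted(results, key=lambda r: r[2])

# Toy usage: two made-up PAGs and a three-gene query.
toy_pags = {"PAG_A": ["IL6", "TNF", "IL1B"], "PAG_B": ["ACTB", "GAPDH"]}
ranked = enrich(["IL6", "TNF", "CXCL8"], toy_pags, background_size=100)
```

In practice the p-values would also be corrected for multiple testing across all PAGs, which this sketch omits.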
Affiliation(s)
- Zongliang Yue
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35223, USA
| | - Eric Zhang
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35223, USA
| | - Clark Xu
- University of Wisconsin-Madison School of Medicine and Public Health, Institute of Clinical and Translational Research, Madison, WI 53705-2221, USA
| | - Sunny Khurana
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35223, USA
| | - Nishant Batra
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35223, USA
| | - Son Do Hai Dang
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35223, USA
| | - James J Cimino
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35223, USA
| | - Jake Y Chen
- Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL 35223, USA
|
16
|
Yue Z, Yan D, Guo G, Chen JY. Biological Network Mining. Methods Mol Biol 2021; 2328:139-151. [PMID: 34251623 DOI: 10.1007/978-1-0716-1534-8_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
In this book chapter, we introduce a pipeline to mine significant biomedical entities (or bioentities) in biological networks. Our focus is on prioritizing both bioentities themselves and the associations between bioentities in order to reveal their biological functions. We will introduce three tools BEERE, WIPER, and PAGER 2.0 that can be used together for network analysis and function interpretation: (1) BEERE is a network analysis tool for "Biomedical Entity Expansion, Ranking and Explorations," (2) WIPER is an entity-to-entity association ranking tool, and (3) PAGER 2.0 is a service for gene enrichment analysis.
Affiliation(s)
- Zongliang Yue
- The University of Alabama at Birmingham, Birmingham, AL, USA
| | - Da Yan
- The University of Alabama at Birmingham, Birmingham, AL, USA.
| | - Guimu Guo
- The University of Alabama at Birmingham, Birmingham, AL, USA
| | - Jake Y Chen
- The University of Alabama at Birmingham, Birmingham, AL, USA
|
17
|
Association study based on topological constraints of protein-protein interaction networks. Sci Rep 2020; 10:10797. [PMID: 32612246 PMCID: PMC7329836 DOI: 10.1038/s41598-020-67875-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/20/2020] [Accepted: 06/15/2020] [Indexed: 12/17/2022] Open
Abstract
The non-random interaction pattern of a protein–protein interaction network (PIN) is biologically informative, but its potential has not been fully utilized in omics studies. Here, we propose a network-permutation-based association study (NetPAS) method that gauges the observed interactions between two sets of genes by comparing the empirical network with permutation null models. This enables NetPAS to evaluate relationships, constrained by network topology, between gene sets related to different phenotypes. We demonstrated the utility of NetPAS on 50 well-curated gene sets and in a comparison of association studies using Z-scores, modified Zʹ-scores, p-values and Jaccard indices. Using NetPAS, a weighted human disease network was generated from the association scores of 19 gene sets from OMIM. We also applied NetPAS to gene sets derived from gene ontology and pathway annotations and showed that NetPAS uncovered functional terms missed by DAVID and WebGestalt. Overall, we show that NetPAS can take the topological constraints of molecular networks into account and offer perspectives that existing methods do not.
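The idea of scoring two gene sets against permutation null models can be illustrated with a small sketch. It counts the edges that link the two sets in the observed network and compares that count with counts from networks whose node labels have been shuffled. Node-label shuffling is an assumption made here for brevity; the abstract does not specify the null model NetPAS actually uses, and the function names are hypothetical.

```python
import random
from statistics import mean, pstdev

def cross_edges(edges, set_a, set_b):
    """Count edges with one endpoint in set_a and the other in set_b."""
    return sum(1 for u, v in edges
               if (u in set_a and v in set_b) or (u in set_b and v in set_a))

def permutation_z(edges, set_a, set_b, n_perm=1000, seed=0):
    """Z-score and empirical p-value of the observed cross-set edge count.

    Null model (an assumption): node labels are shuffled, which keeps the
    network topology fixed while randomizing which genes sit where.
    """
    rng = random.Random(seed)
    nodes = sorted({n for e in edges for n in e})
    observed = cross_edges(edges, set_a, set_b)
    null = []
    for _ in range(n_perm):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))
        permuted = [(relabel[u], relabel[v]) for u, v in edges]
        null.append(cross_edges(permuted, set_a, set_b))
    mu, sigma = mean(null), pstdev(null)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    p = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, z, p

# Toy network: sets A and B are fully interconnected, plus an unrelated chain.
toy_edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"),
             ("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c5"), ("c5", "c6")]
obs, z, p = permutation_z(toy_edges, {"a1", "a2"}, {"b1", "b2"})
```

A large positive Z-score suggests the two sets interact more than the topology alone would predict under this null.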
|
18
|
Yue Z, Nguyen T, Zhang E, Zhang J, Chen JY. WIPER: Weighted in-Path Edge Ranking for biomolecular association networks. QUANTITATIVE BIOLOGY 2019; 7:313-326. [PMID: 38525413 PMCID: PMC10959292 DOI: 10.1007/s40484-019-0180-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/01/2019] [Revised: 08/02/2019] [Accepted: 08/08/2019] [Indexed: 10/25/2022]
Abstract
Background In network biology, researchers generate biomolecular networks with candidate genes or proteins experimentally derived from high-throughput data and known biomolecular associations. Current bioinformatics research focuses on characterizing candidate genes/proteins, or nodes, with network characteristics, e.g., betweenness centrality. However, there have been few research reports that characterize and prioritize biomolecular associations ("edges"), which can represent gene regulatory events essential to biological processes. Method We developed Weighted In-Path Edge Ranking (WIPER), a new computational algorithm that can help evaluate all biomolecular interactions/associations ("edges") in a network model and generate a rank order of every edge based on its in-path traversal score and statistical significance test result. To validate whether WIPER worked as designed, we tested the algorithm on synthetic network models. Results Our results showed that WIPER can reliably discover both critical "well traversed in-path edges", which are statistically more traversed than normal edges, and "peripheral in-path edges", which are less traversed than normal edges. Compared with other simple measures such as betweenness centrality, WIPER provides better biological interpretations. In the case study analyzing postnatal pig heart gene expression, WIPER highlighted new signaling pathways suggestive of cardiomyocyte regeneration and proliferation. In the case study of Alzheimer's disease genetic disorder association, WIPER reported the SRC:APP, AR:APP, APP:FYN, and APP:NES edges (gene-gene associations) as both statistically and biologically important based on PubMed co-citation. Conclusion We believe that WIPER will become an essential software tool to help biologists discover and validate essential signaling/regulatory events from high-throughput biology data in the context of biological networks. 
Availability The free WIPER API is described at discovery.informatics.uab.edu/wiper/.
Affiliation(s)
- Zongliang Yue
- Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL 35233, USA
| | - Thanh Nguyen
- Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL 35233, USA
| | - Eric Zhang
- Department of Biomedical Engineering, University of Alabama, Birmingham, AL 35233, USA
| | - Jianyi Zhang
- Department of Biomedical Engineering, University of Alabama, Birmingham, AL 35233, USA
| | - Jake Y. Chen
- Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL 35233, USA
- Department of Biomedical Engineering, University of Alabama, Birmingham, AL 35233, USA
- Department of Computer Science, University of Alabama, Birmingham, AL 35233, USA
|
19
|
Yue Z, Zheng Q, Neylon MT, Yoo M, Shin J, Zhao Z, Tan AC, Chen JY. PAGER 2.0: an update to the pathway, annotated-list and gene-signature electronic repository for Human Network Biology. Nucleic Acids Res 2019; 46:D668-D676. [PMID: 29126216 PMCID: PMC5753198 DOI: 10.1093/nar/gkx1040] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 08/16/2017] [Accepted: 11/03/2017] [Indexed: 12/14/2022] Open
Abstract
Integrative Gene-set, Network and Pathway Analysis (GNPA) is a powerful data analysis approach developed to help interpret high-throughput omics data. In PAGER 1.0, we demonstrated that researchers can gain unbiased and reproducible biological insights with the introduction of PAGs (Pathways, Annotated-lists and Gene-signatures) as the basic data representation elements. In PAGER 2.0, we improve the utility of integrative GNPA by significantly expanding the coverage of PAGs and PAG-to-PAG relationships in the database, defining a new metric to quantify PAG data qualities, and developing new software features to simplify online integrative GNPA. Specifically, we included 84 282 PAGs spanning 24 different data sources that cover human diseases, published gene-expression signatures, drug-gene, miRNA-gene interactions, pathways and tissue-specific gene expressions. We introduced a new normalized Cohesion Coefficient (nCoCo) score to assess the biological relevance of genes inside a PAG, and RP-score to rank genes and assign gene-specific weights inside a PAG. The companion web interface contains numerous features to help users query and navigate the database content. The database content can be freely downloaded and is compatible with third-party Gene Set Enrichment Analysis tools. We expect PAGER 2.0 to become a major resource in integrative GNPA. PAGER 2.0 is available at http://discovery.informatics.uab.edu/PAGER/.
Affiliation(s)
- Zongliang Yue
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA
| | - Qi Zheng
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA.,School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510006, China
| | - Michael T Neylon
- Indiana University School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
| | - Minjae Yoo
- Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Jimin Shin
- Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Zhiying Zhao
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA.,School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
| | - Aik Choon Tan
- Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Jake Y Chen
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA
|
20
|
Yue Z, Neylon MT, Nguyen T, Ratliff T, Chen JY. "Super Gene Set" Causal Relationship Discovery from Functional Genomics Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1991-1998. [PMID: 30040650 PMCID: PMC6380687 DOI: 10.1109/tcbb.2018.2858755] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
In this article, we present a computational framework to identify "causal relationships" among super gene sets. By "causal relationships," we refer to both stimulatory and inhibitory regulatory relationships, whether through direct or indirect mechanisms. By super gene sets, we refer to "pathways, annotated lists, and gene signatures," or PAGs. To identify causal relationships among PAGs, we extend the previous work on identifying PAG-to-PAG regulatory relationships by further requiring them to be significantly enriched with gene-to-gene co-expression pairs across the two PAGs involved. This is achieved by developing a quantitative metric based on PAG-to-PAG Co-expressions (PPC), which we use to infer the likelihood that the PAG-to-PAG relationships under examination are causal (either stimulatory or inhibitory). Since true causal relationships are unknown, we approximate the overall performance of inferring causal relationships with the performance of recalling known r-type PAG-to-PAG relationships from causal PAG-to-PAG inference, using a functional genomics benchmark dataset from the GEO database. We report an area-under-curve (AUC) performance of 0.81 for both precision and recall. By applying our framework to a myeloid-derived suppressor cells (MDSC) dataset, we further demonstrate that this framework is effective in helping build multi-scale biomolecular systems models with new insights on regulatory and causal links for downstream biological interpretations.
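To make the notion of "gene-to-gene co-expression pairs across the two PAGs" concrete, the sketch below computes the fraction of cross-PAG gene pairs whose expression profiles are strongly correlated. This is not the PPC metric itself, whose definition is not given in the abstract; the Pearson correlation, the 0.8 threshold, and all names are illustrative assumptions.

```python
from itertools import product
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def coexpressed_pair_fraction(expr, pag_a, pag_b, threshold=0.8):
    """Fraction of cross-PAG gene pairs with |r| >= threshold.

    `expr` maps gene symbol -> list of expression values across samples.
    Taking the absolute value counts both positive (possibly stimulatory)
    and negative (possibly inhibitory) co-expression.
    """
    pairs = [(g, h) for g, h in product(pag_a, pag_b)
             if g != h and g in expr and h in expr]
    if not pairs:
        return 0.0
    hits = sum(1 for g, h in pairs
               if abs(pearson(expr[g], expr[h])) >= threshold)
    return hits / len(pairs)

# Toy expression matrix with made-up genes and four samples.
toy_expr = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8],
            "G3": [4, 3, 2, 1], "G4": [1, 3, 2, 2]}
frac = coexpressed_pair_fraction(toy_expr, ["G1"], ["G2", "G3", "G4"])
```

A significance test for such a fraction, for example against randomly sampled gene pairs, would be needed before calling a PAG pair "significantly enriched"; that step is left out here.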
Affiliation(s)
- Zongliang Yue
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35233, US.
| | - Michael T. Neylon
- School of Informatics and Computing, Indiana University, Indianapolis, IN 46202, US.
| | - Thanh Nguyen
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35233, US.
| | - Timothy Ratliff
- Purdue University Center for Cancer Research, West Lafayette, IN 47906, US.
| | - Jake Y. Chen
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35233, US.
|
21
|
Abstract
Background Much effort has been devoted to the discovery of specific mechanisms between drugs and single targets to date. However, as biological systems maintain homeostasis at the level of functional networks robustly controlling the internal environment, such networks commonly contain multiple redundant mechanisms designed to counteract loss or perturbation of a single member of the network. As such, investigation of therapeutics that target dysregulated pathways or processes, rather than single targets, may identify agents that function at a level of the biological organization more relevant to the pathology of complex diseases such as Parkinson’s Disease (PD). Genome-wide association studies (GWAS) in PD have identified common variants underlying disease susceptibility, while gene expression microarray data provide genome-wide transcriptional profiles. These genomic studies can illustrate upstream perturbations causing the dysfunction in signaling pathways and downstream biochemical mechanisms leading to the PD phenotype. We hypothesize that drugs acting at the level of a gene expression module specific to PD can overcome the lack of efficacy associated with targeting a single gene in polygenic diseases. Thus, this approach represents a promising new direction for module-based drug discovery in human diseases such as PD. Results We built a framework that integrates GWAS data with gene co-expression modules from tissues representing three brain regions—the frontal gyrus, the lateral substantia, and the medial substantia in PD patients. Using weighted gene correlation network analysis (WGCNA) software package in R, we conducted enrichment analysis of data from a GWAS of PD. This led to the identification of two over-represented PD-specific gene co-expression network modules: the Brown Module (Br) containing 449 genes and the Turquoise module (T) containing 905 genes. 
Further enrichment analysis identified four functional pathways within the Br module (cellular respiration, intracellular transport, energy coupled proton transport against the electrochemical gradient, and microtubule-based movement), and one functional pathway within the T module (M-phase). Next, we utilized drug-protein regulatory relationship databases (DMAP) and developed a Drug Effect Sum Score (DESS) to evaluate all candidate drugs that might restore gene expression to normal level across the Br and T modules. Among the drugs with the 12 highest DESS scores, 5 had been reported as potential treatments for PD and 6 hold potential repositioning applications. Conclusion In this study, we present a systems pharmacology framework which draws on genetic data from GWAS and gene expression microarray data to reposition drugs for PD. Our innovative approach integrates gene co-expression modules with biomolecular interaction network analysis to identify network modules critical to the PD pathway and disease mechanism. We quantify the positive effects of drugs in a DESS score that is based on known drug-target activity profiles. Our results illustrate that this modular approach is promising for repositioning drugs for use in polygenic diseases such as PD, and is capable of addressing challenges of the hindered gene target in drug repositioning approaches to date. Electronic supplementary material The online version of this article (10.1186/s12859-017-1889-0) contains supplementary material, which is available to authorized users.
|
22
|
Suphavilai C, Zhu L, Chen JY. A method for developing regulatory gene set networks to characterize complex biological systems. BMC Genomics 2015; 16 Suppl 11:S4. [PMID: 26576648 PMCID: PMC4652563 DOI: 10.1186/1471-2164-16-s11-s4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/18/2023] Open
Abstract
Background Traditional approaches to studying molecular networks are based on linking genes or proteins. Higher-level networks linking gene sets or pathways have been proposed recently. Several types of gene set networks, such as co-membership gene set networks (M-GSNs) and co-enrichment gene set networks (E-GSNs), have been used to study complex molecular networks. Gene set networks are useful for studying the biological mechanisms of diseases and drug perturbations. Results In this study, we proposed a new approach for constructing directed, regulatory gene set networks (R-GSNs) to reveal novel relationships among gene sets or pathways. We collected several gene set collections and high-quality gene regulation data in order to construct R-GSNs in a comparative study with M-GSNs. We described a method for constructing both global and disease-specific R-GSNs and determining their significance. To demonstrate the potential applications to disease biology studies, we constructed and analysed an R-GSN specifically built for Alzheimer's disease. Conclusions R-GSNs can provide new biological insights complementary to those derived at the protein regulatory network level or from M-GSNs. When integrated properly with functional genomics data, R-GSNs can help enable future research on systems biology and translational bioinformatics.
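A co-membership gene set network links two gene sets when they share enough member genes. The sketch below builds such a network using the Jaccard index with a fixed cutoff. Both the similarity measure and the 0.25 cutoff are assumptions chosen for illustration, since the abstract does not say how co-membership is scored; the function names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two gene collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def co_membership_network(gene_sets, cutoff=0.25):
    """Return undirected edges (set_a, set_b, score) with score >= cutoff.

    `gene_sets` maps a gene set name to its member genes.
    """
    edges = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(gene_sets.items()), 2):
        score = jaccard(genes_a, genes_b)
        if score >= cutoff:
            edges.append((name_a, name_b, score))
    return edges

# Toy usage with made-up gene sets.
toy_sets = {"S1": ["A", "B", "C"], "S2": ["B", "C", "D"], "S3": ["X", "Y"]}
network = co_membership_network(toy_sets)
```

Unlike this undirected co-membership network, a regulatory gene set network as described in the abstract is directed and would need gene regulation data to orient its edges.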
|