1
Zhu Y, Shutta KH, Huang T, Balasubramanian R, Zeleznik OA, Clish CB, Ávila-Pacheco J, Hankinson SE, Kubzansky LD. Persistent PTSD symptoms are associated with plasma metabolic alterations relevant to long-term health: A metabolome-wide investigation in women. Psychol Med 2025; 55:e30. [PMID: 39924258 PMCID: PMC12017366 DOI: 10.1017/s0033291724003374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 11/22/2024] [Accepted: 11/27/2024] [Indexed: 02/11/2025]
Abstract
BACKGROUND Post-traumatic stress disorder (PTSD) is characterized by severe distress and associated with cardiometabolic diseases. Studies in military and clinical populations suggest that dysregulated metabolomic processes may be a key mechanism. Prior work identified and validated a metabolite-based distress score (MDS) linked with depression and anxiety and subsequent cardiometabolic diseases. Here, we assessed whether PTSD shares metabolic alterations with depression and anxiety and if additional metabolites are related to PTSD. METHODS We leveraged plasma metabolomics data from three subsamples nested within the Nurses' Health Study II, including 2835 women with 2950 blood samples collected across three time points (1996-2014) and 339 known metabolites assayed by mass spectrometry-based techniques. Trauma and PTSD exposures were assessed in 2008 and characterized as follows: lifetime trauma without PTSD, lifetime PTSD in remission, and persistent PTSD symptoms. Associations between the exposures and the MDS or individual metabolites were estimated within each subsample adjusting for potential confounders and combined in random-effects meta-analyses. RESULTS Persistent PTSD symptoms were associated with higher levels of the previously developed MDS. Out of 339 metabolites, we identified 29 metabolites (primarily elevated glycerophospholipids and glycerolipids) associated with persistent symptoms (false discovery rate < 0.05; adjusting for technical covariates). No metabolite associations were found with the other PTSD-related exposures. CONCLUSIONS As the first large-scale, population-based metabolomics analysis of PTSD, our study highlighted shared and distinct metabolic differences linked to PTSD versus depression or anxiety. We identified novel metabolite markers associated with PTSD symptom persistence, suggesting further connections with metabolic dysregulation that may have downstream consequences for health.
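The pooling step mentioned in the abstract above (estimates obtained within each subsample, then combined in a random-effects meta-analysis) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the abstract does not name the estimator, so the DerSimonian-Laird method is assumed here, and the function name `random_effects_pool` and all numeric inputs are hypothetical.

```python
import math

def random_effects_pool(estimates, std_errors):
    """Pool per-subsample estimates with a DerSimonian-Laird random-effects model.

    Assumption: DerSimonian-Laird is used only as a common default; the
    abstract does not state which random-effects estimator was applied.
    """
    k = len(estimates)
    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    # Cochran's Q quantifies heterogeneity of the estimates across subsamples
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-subsample variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2

# Hypothetical estimates and standard errors from three subsamples
pooled, pooled_se, tau2 = random_effects_pool([0.20, 0.35, 0.10], [0.10, 0.12, 0.08])
```

When the subsample estimates agree closely, the between-subsample variance `tau2` is truncated to zero and the result coincides with an ordinary inverse-variance weighted average.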
Affiliation(s)
- Yiwen Zhu
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Katherine H. Shutta
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Tianyi Huang
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Raji Balasubramanian
  - Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts Amherst, Amherst, MA, USA
- Oana A. Zeleznik
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Clary B. Clish
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Julián Ávila-Pacheco
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Susan E. Hankinson
  - Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts Amherst, Amherst, MA, USA
- Laura D. Kubzansky
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2
Kastendiek N, Coletti R, Gross T, Lopes MB. Exploring glioma heterogeneity through omics networks: from gene network discovery to causal insights and patient stratification. BioData Min 2024; 17:56. [PMID: 39696678 DOI: 10.1186/s13040-024-00411-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Accepted: 11/25/2024] [Indexed: 12/20/2024] Open
Abstract
Gliomas are primary malignant brain tumors with a typically poor prognosis, exhibiting significant heterogeneity across different cancer types. Each glioma type possesses distinct molecular characteristics determining patient prognosis and therapeutic options. This study aims to explore the molecular complexity of gliomas at the transcriptome level, employing a comprehensive approach grounded in network discovery. The graphical lasso method was used to estimate a gene co-expression network for each glioma type from a transcriptomics dataset. Causality was subsequently inferred from correlation networks by estimating the Jacobian matrix. The networks were then analyzed for gene importance using centrality measures and modularity detection, leading to the selection of genes that might play an important role in the disease. To explore the pathways and biological functions these genes are involved in, KEGG and Gene Ontology (GO) enrichment analyses on the disclosed gene sets were performed, highlighting the significance of the genes selected across several relevant pathways and GO terms. Spectral clustering based on patient similarity networks was applied to stratify patients into groups with similar molecular characteristics and to assess whether the resulting clusters align with the diagnosed glioma type. The results presented highlight the ability of the proposed methodology to uncover relevant genes associated with glioma intertumoral heterogeneity. Further investigation might encompass biological validation of the putative biomarkers disclosed.
Affiliation(s)
- Nina Kastendiek
  - Institute for Chemistry and Biology of the Marine Environment, University of Oldenburg, Oldenburg, 26129, Germany
- Roberta Coletti
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology (NOVA FCT), Caparica, 2829-516, Portugal
- Thilo Gross
  - Institute for Chemistry and Biology of the Marine Environment, University of Oldenburg, Oldenburg, 26129, Germany
  - Helmholtz Institute for Functional Marine Biodiversity (HIFMB), Oldenburg, 26129, Germany
  - Alfred Wegener Institute, Helmholtz Center for Polar and Marine Research, Bremerhaven, 27570, Germany
- Marta B Lopes
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology (NOVA FCT), Caparica, 2829-516, Portugal
  - UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology (NOVA FCT), Caparica, 2829-516, Portugal
3
Chen J, Murabito JM, Lunetta KL. ONDSA: a testing framework based on Gaussian graphical models for differential and similarity analysis of multiple omics networks. Brief Bioinform 2024; 26:bbae610. [PMID: 39581869 PMCID: PMC11586129 DOI: 10.1093/bib/bbae610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 10/21/2024] [Accepted: 11/08/2024] [Indexed: 11/26/2024] Open
Abstract
The Gaussian graphical model (GGM) is a statistical network approach that represents conditional dependencies among components, enabling a comprehensive exploration of disease mechanisms using high-throughput multi-omics data. Analyzing differential and similar structures in biological networks across multiple clinical conditions can reveal significant biological pathways and interactions associated with disease onset and progression. However, most existing methods for estimating group differences in sparse GGMs only apply to comparisons between two groups, and the challenging problem of multiple testing across multiple GGMs persists. This limitation hinders the ability to uncover complex biological insights that arise from comparing multiple conditions simultaneously. To address these challenges, we propose the Omics Networks Differential and Similarity Analysis (ONDSA) framework, specifically designed for continuous omics data. ONDSA tests for structural differences and similarities across multiple groups, effectively controlling the false discovery rate (FDR) at a desired level. Our approach focuses on entry-wise comparisons of precision matrices across groups, introducing two test statistics to sequentially estimate structural differences and similarities while adjusting for correlated effects in FDR control procedures. We show via comprehensive simulations that ONDSA outperforms existing methods under a range of graph structures and is a valuable tool for joint comparisons of multiple GGMs. We also illustrate our method through the detection of neuroinflammatory pathways in a multi-omics dataset from the Framingham Heart Study Offspring cohort, involving three apolipoprotein E genotype groups. It highlights ONDSA's ability to provide a more holistic view of biological interactions and disease mechanisms through multi-omics data integration.
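To make concrete what an entry-wise comparison of precision matrices looks at, the sketch below converts two precision matrices into partial correlations and takes their entry-wise difference. It relies only on the standard GGM identity that the partial correlation between variables i and j equals -ω_ij / sqrt(ω_ii ω_jj). It does not reproduce ONDSA's test statistics or its FDR procedure; the function names and the two small matrices are hypothetical.

```python
import math

def partial_correlations(precision):
    """Convert a precision matrix (list of lists) into partial correlations.

    Uses the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj).
    """
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho

def entrywise_difference(precision_a, precision_b):
    """Entry-wise difference of partial correlations between two groups."""
    rho_a = partial_correlations(precision_a)
    rho_b = partial_correlations(precision_b)
    p = len(rho_a)
    return [[rho_a[i][j] - rho_b[i][j] for j in range(p)] for i in range(p)]

# Hypothetical precision matrices for two groups (illustrative values only):
# variables 0 and 1 are conditionally dependent in group A but not in group B.
precision_a = [[2.0, -1.0, 0.0], [-1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
precision_b = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
diff = entrywise_difference(precision_a, precision_b)
```

A nonzero entry of `diff` marks a candidate differential edge. In practice the precision matrices are estimated from data, so deciding which differences are real requires a formal test with multiplicity control, which is the problem the framework above addresses.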
Affiliation(s)
- Jiachen Chen
  - Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Avenue, Crosstown, 3rd floor, Boston, MA 02218, United States
- Joanne M Murabito
  - Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University Chobanian & Avedisian School of Medicine and Boston Medical Center, 73 Mount Wayte Avenue, Framingham, MA 01702, United States
  - Department of Medicine, Section of General Internal Medicine, Boston University Chobanian & Avedisian School of Medicine and Boston Medical Center, 72 E Concord St, Suite L-516, Boston, MA 02118, United States
- Kathryn L Lunetta
  - Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Avenue, Crosstown, 3rd floor, Boston, MA 02218, United States
4
Guzzi PH, Roy A, Milano M, Veltri P. Non parametric differential network analysis: a tool for unveiling specific molecular signatures. BMC Bioinformatics 2024; 25:359. [PMID: 39558195 PMCID: PMC11575037 DOI: 10.1186/s12859-024-05969-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Accepted: 10/24/2024] [Indexed: 11/20/2024] Open
Abstract
BACKGROUND The rewiring of molecular interactions in various conditions leads to distinct phenotypic outcomes. Differential network analysis (DINA) is dedicated to exploring these rewirings within gene and protein networks. Leveraging statistical learning and graph theory, DINA algorithms scrutinize alterations in interaction patterns derived from experimental data. RESULTS Introducing a novel approach to differential network analysis, we incorporate differential gene expression based on sex and gender attributes. We hypothesize that gene expression can be accurately represented through non-Gaussian processes. Our methodology involves quantifying changes in non-parametric correlations among gene pairs and expression levels of individual genes. CONCLUSIONS Applying our method to public expression datasets concerning diabetes mellitus and atherosclerosis in liver tissue, we identify gender-specific differential networks. Results underscore the biological relevance of our approach in uncovering meaningful molecular distinctions.
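The idea of quantifying a change in non-parametric correlation for a gene pair can be sketched as below. The abstract does not say which non-parametric correlation is used, so Spearman's rank correlation is assumed here purely for illustration; the function names and the expression values are hypothetical, and the sketch omits the per-gene expression term and any significance testing.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def differential_correlation(g1_a, g2_a, g1_b, g2_b):
    """Change in rank correlation of one gene pair between conditions A and B."""
    return spearman(g1_a, g2_a) - spearman(g1_b, g2_b)

# Hypothetical expression values for one gene pair under two conditions
delta = differential_correlation(
    [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 40.0, 80.0],  # condition A: co-monotone
    [1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0],      # condition B: reversed
)
```

Because only ranks enter the computation, the measure is unchanged by monotone transformations of the expression values, which is what makes it suitable when Gaussianity is not assumed.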
Affiliation(s)
- Pietro Hiram Guzzi
  - Department of Medical and Surgical Sciences, Magna Graecia University, Catanzaro, Italy
- Marianna Milano
  - Department of Experimental and Clinical Medicine, Magna Graecia University, Catanzaro, Italy
- Pierangelo Veltri
  - Department of Computer Science, Modelling and Electronics DIMES, University of Calabria, Rende, Italy
5
Zhu Y, Shutta KH, Huang T, Balasubramanian R, Zeleznik OA, Clish CB, Ávila-Pacheco J, Hankinson SE, Kubzansky LD. Persistent PTSD symptoms are associated with plasma metabolic alterations relevant to long-term health: A metabolome-wide investigation in women. medRxiv 2024:2024.08.07.24311628. [PMID: 39148851 PMCID: PMC11326341 DOI: 10.1101/2024.08.07.24311628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Background Posttraumatic stress disorder (PTSD) is characterized by severe distress and associated with cardiometabolic diseases. Studies in military and clinical populations suggest dysregulated metabolomic processes may be a key mechanism. Prior work identified and validated a metabolite-based distress score (MDS) linked with depression and anxiety and subsequent cardiometabolic diseases. Here, we assessed whether PTSD shares metabolic alterations with depression and anxiety and also if additional metabolites are related to PTSD. Methods We leveraged plasma metabolomics data from three subsamples nested within the Nurses' Health Study II, including 2835 women with 2950 blood samples collected across three timepoints (1996-2014) and 339 known metabolites consistently assayed by mass spectrometry-based techniques. Trauma and PTSD exposures were assessed in 2008 and characterized as follows: lifetime trauma without PTSD, lifetime PTSD in remission, and persistent PTSD symptoms. Associations between the exposures and the MDS or individual metabolites were estimated within each subsample adjusting for potential confounders and combined in random-effects meta-analyses. Results Persistent PTSD symptoms were associated with higher levels of the previously developed MDS for depression and anxiety. Out of 339 metabolites, we identified nine metabolites (primarily elevated glycerophospholipids) associated with persistent symptoms (false discovery rate < 0.05). No metabolite associations were found with the other PTSD-related exposures. Conclusions As the first large-scale, population-based metabolomics analysis of PTSD, our study highlighted shared and distinct metabolic differences linked to PTSD versus depression or anxiety. We identified novel metabolite markers associated with PTSD symptom persistence, suggesting further connections with metabolic dysregulation that may have downstream consequences for health.
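The false discovery rate threshold applied across the 339 metabolite tests can be illustrated with a short sketch. The abstract does not name the FDR procedure, so the Benjamini-Hochberg step-up method is assumed here as a common default; the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag hypotheses rejected at FDR level alpha (Benjamini-Hochberg step-up).

    Assumption: Benjamini-Hochberg stands in for whichever FDR procedure
    the study actually used.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose ordered p-value satisfies p_(k) <= (k / m) * alpha
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for five metabolites (illustrative values only)
flags = benjamini_hochberg([0.001, 0.045, 0.02, 0.20, 0.012])
```

The procedure is step-up: a p-value that misses its own rank threshold is still rejected if a larger p-value further down the ordering meets its threshold.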
Affiliation(s)
- Yiwen Zhu
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Katherine H. Shutta
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Tianyi Huang
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Raji Balasubramanian
  - Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts Amherst, Amherst, MA, USA
- Oana A. Zeleznik
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Clary B. Clish
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Julián Ávila-Pacheco
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Susan E. Hankinson
  - Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts Amherst, Amherst, MA, USA
- Laura D. Kubzansky
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, USA
6
Huang X, Zhang H. Detecting responsible nodes in differential Bayesian networks. Stat Med 2024; 43:3294-3312. [PMID: 38831542 DOI: 10.1002/sim.10125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Revised: 03/25/2024] [Accepted: 05/18/2024] [Indexed: 06/05/2024]
Abstract
To study the roles that different nodes play in differentiating Bayesian networks under two states, such as control versus disease, we formulate two node-specific scores to facilitate such assessment. The first score is motivated by the prediction invariance property of a causal model. The second score results from modifying an existing score constructed for differential analysis of undirected networks. We develop strategies based on these scores to identify nodes responsible for topological differences between two Bayesian networks. Synthetic data and real-life data from designed experiments are used to demonstrate the efficacy of the proposed methods in detecting responsible nodes.
Affiliation(s)
- Xianzheng Huang
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Hongmei Zhang
  - Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, Tennessee
7
Fu Y, Lu Y, Wang Y, Zhang B, Zhang Z, Yu G, Liu C, Clarke R, Herrington DM, Wang Y. DDN3.0: determining significant rewiring of biological network structure with differential dependency networks. Bioinformatics 2024; 40:btae376. [PMID: 38902940 PMCID: PMC11199198 DOI: 10.1093/bioinformatics/btae376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 05/09/2024] [Accepted: 06/19/2024] [Indexed: 06/22/2024] Open
Abstract
MOTIVATION Complex diseases are often caused and characterized by misregulation of multiple biological pathways. Differential network analysis aims to detect significant rewiring of biological network structures under different conditions and has become an important tool for understanding the molecular etiology of disease progression and therapeutic response. With few exceptions, most existing differential network analysis tools perform differential tests on separately learned network structures that are computationally expensive and prone to collapse when grouped samples are limited or less consistent. RESULTS We previously developed an accurate differential network analysis method, differential dependency networks (DDN), that enables joint learning of common and rewired network structures under different conditions. We now introduce the DDN3.0 tool that improves this framework with three new and highly efficient algorithms, namely, unbiased model estimation with a weighted error measure applicable to imbalanced sample groups, multiple acceleration strategies to improve learning efficiency, and data-driven determination of proper hyperparameters. The comparative experimental results obtained from both realistic simulations and case studies show that DDN3.0 can help biologists more accurately identify, in a study-specific and often unknown conserved regulatory circuitry, a network of significantly rewired molecular players potentially responsible for phenotypic transitions. AVAILABILITY AND IMPLEMENTATION The Python package of DDN3.0 is freely available at https://github.com/cbil-vt/DDN3. A user's guide and a vignette are provided at https://ddn-30.readthedocs.io/.
Affiliation(s)
- Yi Fu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Yingzhou Lu
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Yizhi Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Bai Zhang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Zhen Zhang
  - Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
- Guoqiang Yu
  - Department of Automation, Tsinghua University, Beijing 100084, P.R. China
- Chunyu Liu
  - Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, United States
- Robert Clarke
  - The Hormel Institute, University of Minnesota, Austin, MN 55912, United States
- David M Herrington
  - Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, United States
- Yue Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
8
Ravichandran P, Parsana P, Keener R, Hansen KD, Battle A. Aggregation of recount3 RNA-seq data improves inference of consensus and tissue-specific gene co-expression networks. bioRxiv 2024:2024.01.20.576447. [PMID: 38328080 PMCID: PMC10849507 DOI: 10.1101/2024.01.20.576447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Background Gene co-expression networks (GCNs) describe relationships among expressed genes key to maintaining cellular identity and homeostasis. However, the small sample size of typical RNA-seq experiments, which is several orders of magnitude smaller than the number of genes, is too low to infer GCNs reliably. recount3, a publicly available dataset comprised of 316,443 uniformly processed human RNA-seq samples, provides an opportunity to improve power for accurate network reconstruction and obtain biological insight from the resulting networks. Results We compared alternate aggregation strategies to identify an optimal workflow for GCN inference by data aggregation and inferred three consensus networks: a universal network, a non-cancer network, and a cancer network in addition to 27 tissue context-specific networks. Central network genes from our consensus networks were enriched for evolutionarily constrained genes and ubiquitous biological pathways, whereas central context-specific network genes included tissue-specific transcription factors and factorization based on the hubs led to clustering of related tissue contexts. We discovered that annotations corresponding to context-specific networks inferred from aggregated data were enriched for trait heritability beyond known functional genomic annotations and were significantly more enriched when we aggregated over a larger number of samples. Conclusion This study outlines best practices for GCN inference and evaluation by data aggregation. We recommend estimating and regressing confounders in each data set before aggregation and prioritizing large sample size studies for GCN reconstruction. Increased statistical power in inferring context-specific networks enabled the derivation of variant annotations that were enriched for concordant trait heritability independent of functional genomic annotations that are context-agnostic. While we observed strictly increasing held-out log-likelihood with data aggregation, we noted diminishing marginal improvements. Future directions aimed at alternate methods for estimating confounders and integrating orthogonal information from modalities such as Hi-C and ChIP-seq can further improve GCN inference.
Affiliation(s)
- Princy Parsana
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Rebecca Keener
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Kaspar D Hansen
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biostatistics, Johns Hopkins School of Public Health, Baltimore, MD, USA
  - Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Alexis Battle
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
  - Data Science and AI Institute, Johns Hopkins University, Baltimore, MD, USA
9
Ahn S, Datta S. PRANA: an R package for differential co-expression network analysis with the presence of additional covariates. BMC Genomics 2023; 24:687. [PMID: 37974076 PMCID: PMC10652545 DOI: 10.1186/s12864-023-09787-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023] Open
Abstract
BACKGROUND Advances in sequencing technology and cost reduction have enabled the emergence of various statistical methods for RNA-sequencing data, including differential co-expression network analysis (or differential network analysis). A key benefit of this method is that it takes into consideration the interactions between or among genes and does not require established knowledge of biological pathways. As of now, no existing software can incorporate covariates that should be adjusted for if they are confounding factors while performing the differential network analysis. RESULTS We develop an R package, PRANA, in which a user can easily include multiple covariates. The main R function in this package leverages a novel pseudo-value regression approach for differential network analysis of RNA-sequencing data. The software also includes complementary R functions for extracting adjusted p-values and coefficient estimates of all or specific variables for each gene, as well as for identifying the names of genes that are differentially connected (DC, hereafter) between subjects under biologically different conditions from the output. CONCLUSION We demonstrate the application of this package to a real dataset on chronic obstructive pulmonary disease. PRANA is available through the CRAN repositories under the GPL-3 license: https://cran.r-project.org/web/packages/PRANA/index.html.
Affiliation(s)
- Seungjun Ahn
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, USA
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, USA
  - Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, USA
- Somnath Datta
  - Department of Biostatistics, University of Florida, Gainesville, USA
10
Niu Y, Ni Y, Pati D, Mallick BK. Covariate-Assisted Bayesian Graph Learning for Heterogeneous Data. J Am Stat Assoc 2023; 119:1985-1999. [PMID: 39507103 PMCID: PMC11536292 DOI: 10.1080/01621459.2023.2233744] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/04/2021] [Revised: 06/01/2023] [Accepted: 06/25/2023] [Indexed: 11/08/2024]
Abstract
In a traditional Gaussian graphical model, data homogeneity is routinely assumed with no extra variables affecting the conditional independence. In modern genomic datasets, there is an abundance of auxiliary information, which often gets under-utilized in determining the joint dependency structure. In this article, we consider a Bayesian approach to model undirected graphs underlying heterogeneous multivariate observations with additional assistance from covariates. Building on product partition models, we propose a novel covariate-dependent Gaussian graphical model that allows graphs to vary with covariates so that observations whose covariates are similar share a similar undirected graph. To efficiently embed Gaussian graphical models into our proposed framework, we explore both Gaussian likelihood and pseudo-likelihood functions. For Gaussian likelihood, a G-Wishart distribution is used as a natural conjugate prior, and for the pseudo-likelihood, a product of Gaussian conditionals is used. Moreover, the proposed model has large prior support and is flexible to approximate any v-Hölder conditional variance-covariance matrices with v ∈ (0, 1]. We further show that based on the theory of fractional likelihood, the rate of posterior contraction is minimax optimal assuming the true density to be a Gaussian mixture with a known number of components. The efficacy of the approach is demonstrated via simulation studies and an analysis of a protein network for a breast cancer dataset assisted by mRNA gene expression as covariates.
Affiliation(s)
- Yabo Niu
  - Department of Mathematics, University of Houston
- Yang Ni
  - Department of Statistics, Texas A&M University
11
Liu Y, Darville T, Zheng X, Li Q. Decomposition of variation of mixed variables by a latent mixed Gaussian copula model. Biometrics 2023; 79:1187-1200. [PMID: 35304917 PMCID: PMC10019899 DOI: 10.1111/biom.13660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Accepted: 03/03/2022] [Indexed: 11/27/2022]
Abstract
Many biomedical studies collect data of mixed types of variables from multiple groups of subjects. Some of these studies aim to find the group-specific and the common variation among all these variables. Even though similar problems have been studied by some previous works, their methods mainly rely on the Pearson correlation, which cannot handle mixed data. To address this issue, we propose a latent mixed Gaussian copula (LMGC) model that can quantify the correlations among binary, ordinal, continuous, and truncated variables in a unified framework. We also provide a tool to decompose the variation into the group-specific and the common variation over multiple groups via solving a regularized M-estimation problem. We conduct extensive simulation studies to show the advantage of our proposed method over the Pearson correlation-based methods. We also demonstrate that by jointly solving the M-estimation problem over multiple groups, our method is better than decomposing the variation group by group. We also apply our method to a Chlamydia trachomatis genital tract infection study to demonstrate how it can be used to discover informative biomarkers that differentiate patients.
Affiliation(s)
- Yutong Liu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Toni Darville
  - Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Xiaojing Zheng
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Quefeng Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| |
Collapse
|
12
|
Karaaslanli A, Saha S, Maiti T, Aviyente S. Kernelized multiview signed graph learning for single-cell RNA sequencing data. BMC Bioinformatics 2023; 24:127. [PMID: 37016281 PMCID: PMC10071725 DOI: 10.1186/s12859-023-05250-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Accepted: 03/22/2023] [Indexed: 04/06/2023] Open
Abstract
BACKGROUND Characterizing the topology of gene regulatory networks (GRNs) is a fundamental problem in systems biology. The advent of single cell technologies has made it possible to construct GRNs at finer resolutions than bulk and microarray datasets. However, cellular heterogeneity and sparsity of the single cell datasets render void the application of regular Gaussian assumptions for constructing GRNs. Additionally, most GRN reconstruction approaches estimate a single network for the entire data. This could cause potential loss of information when single cell datasets are generated from multiple treatment conditions/disease states. RESULTS To better characterize single cell GRNs under different but related conditions, we propose the joint estimation of multiple networks using multiple signed graph learning (scMSGL). The proposed method is based on recently developed graph signal processing (GSP) based graph learning, where GRNs and gene expressions are modeled as signed graphs and graph signals, respectively. scMSGL learns multiple GRNs by optimizing the total variation of gene expressions with respect to GRNs while ensuring that the learned GRNs are similar to each other through regularization with respect to a learned signed consensus graph. We further kernelize scMSGL with the kernel selected to suit the structure of single cell data. CONCLUSIONS scMSGL is shown to have superior performance over existing state of the art methods in GRN recovery on simulated datasets. Furthermore, scMSGL successfully identifies well-established regulators in a mouse embryonic stem cell differentiation study and a cancer clinical study of medulloblastoma.
Affiliation(s)
- Abdullah Karaaslanli
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, USA
- Satabdi Saha
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tapabrata Maiti
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
- Selin Aviyente
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, USA
13
Becker M, Nassar H, Espinosa C, Stelzer IA, Feyaerts D, Berson E, Bidoki NH, Chang AL, Saarunya G, Culos A, De Francesco D, Fallahzadeh R, Liu Q, Kim Y, Marić I, Mataraso SJ, Payrovnaziri SN, Phongpreecha T, Ravindra NG, Stanley N, Shome S, Tan Y, Thuraiappah M, Xenochristou M, Xue L, Shaw G, Stevenson D, Angst MS, Gaudilliere B, Aghaeepour N. Large-scale correlation network construction for unraveling the coordination of complex biological systems. Nature Computational Science 2023; 3:346-359. [PMID: 38116462 PMCID: PMC10727505 DOI: 10.1038/s43588-023-00429-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/08/2022] [Accepted: 03/10/2023] [Indexed: 12/21/2023]
Abstract
Advanced measurement and data storage technologies have enabled high-dimensional profiling of complex biological systems. For this, modern multiomics studies regularly produce datasets with hundreds of thousands of measurements per sample, enabling a new era of precision medicine. Correlation analysis is an important first step to gain deeper insights into the coordination and underlying processes of such complex systems. However, the construction of large correlation networks in modern high-dimensional datasets remains a major computational challenge owing to rapidly growing runtime and memory requirements. Here we address this challenge by introducing CorALS (Correlation Analysis of Large-scale (biological) Systems), an open-source framework for the construction and analysis of large-scale parametric as well as non-parametric correlation networks for high-dimensional biological data. It features off-the-shelf algorithms suitable for both personal and high-performance computers, enabling workflows and downstream analysis approaches. We illustrate the broad scope and potential of CorALS by exploring perspectives on complex biological processes in large-scale multiomics and single-cell studies.
Affiliation(s)
- Martin Becker
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Computer Science and Electrical Engineering, University of Rostock, Rostock, Germany
- These authors contributed equally: Martin Becker, Huda Nassar, Camilo Espinosa
- Huda Nassar
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- These authors contributed equally: Martin Becker, Huda Nassar, Camilo Espinosa
- Camilo Espinosa
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- These authors contributed equally: Martin Becker, Huda Nassar, Camilo Espinosa
- Ina A. Stelzer
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Dorien Feyaerts
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Eloise Berson
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Neda H. Bidoki
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Alan L. Chang
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Geetha Saarunya
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Anthony Culos
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Davide De Francesco
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Ramin Fallahzadeh
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Qun Liu
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Yeasul Kim
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Ivana Marić
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Samson J. Mataraso
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Seyedeh Neelufar Payrovnaziri
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Thanaphong Phongpreecha
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pathology, Stanford University School of Medicine, Palo Alto, CA, USA
- Neal G. Ravindra
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Natalie Stanley
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Sayane Shome
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Yuqi Tan
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Melan Thuraiappah
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Maria Xenochristou
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Lei Xue
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
- Gary Shaw
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- David Stevenson
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Martin S. Angst
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Brice Gaudilliere
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University School of Medicine, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, CA, USA
14
Ahn S, Grimes T, Datta S. A pseudo-value regression approach for differential network analysis of co-expression data. BMC Bioinformatics 2023; 24:8. [PMID: 36624383 PMCID: PMC9830718 DOI: 10.1186/s12859-022-05123-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/21/2022] [Accepted: 12/22/2022] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND The differential network (DN) analysis identifies changes in measures of association among genes under two or more experimental conditions. In this article, we introduce a pseudo-value regression approach for network analysis (PRANA). This is a novel method of differential network analysis that also adjusts for additional clinical covariates. We start from mutual information criteria, followed by pseudo-value calculations, which are then entered into a robust regression model. RESULTS This article assesses the model performance of PRANA in a multivariable setting, followed by a comparison to dnapath and DINGO in both univariable and multivariable settings through a variety of simulations. Performance in terms of precision, recall, and F1 score of differentially connected (DC) genes is assessed. By and large, PRANA outperformed dnapath and DINGO, neither of which is equipped to adjust for available covariates such as patient age. Lastly, we employ PRANA in a real data application from the Gene Expression Omnibus database to identify DC genes that are associated with chronic obstructive pulmonary disease to demonstrate its utility. CONCLUSION To the best of our knowledge, this is the first attempt to use regression modeling for DN analysis by collective gene expression levels between two or more groups with the inclusion of additional clinical covariates. By and large, adjusting for available covariates improves the accuracy of a DN analysis.
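The pseudo-value step named in this abstract can be illustrated with the generic jackknife construction, in which the pseudo-value for sample i is n times the full-sample statistic minus (n - 1) times the leave-one-out statistic. The Python sketch below is only an illustration under that assumption: it uses the absolute Pearson correlation between two genes as a stand-in association measure (the abstract starts from mutual information criteria), and the function names are hypothetical rather than part of the PRANA software.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple

# One sample: (expression of gene A, expression of gene B). Illustrative type only.
Sample = Tuple[float, float]


def abs_pearson(samples: Sequence[Sample]) -> float:
    """Absolute Pearson correlation, a stand-in for the association measure."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    syy = sum((y - mean_y) ** 2 for _, y in samples)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant variable
    return abs(sxy) / sqrt(sxx * syy)


def jackknife_pseudo_values(
    samples: Sequence[Sample],
    statistic: Callable[[Sequence[Sample]], float],
) -> List[float]:
    """Return one jackknife pseudo-value per sample.

    pseudo_i = n * stat(all samples) - (n - 1) * stat(all samples except i)
    """
    n = len(samples)
    full = statistic(samples)
    pseudo = []
    for i in range(n):
        leave_one_out = list(samples[:i]) + list(samples[i + 1:])
        pseudo.append(n * full - (n - 1) * statistic(leave_one_out))
    return pseudo
```

Each sample then carries its own pseudo-value, which could serve as the response in a regression on group membership and clinical covariates; the abstract states that a robust regression model is used for that step.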
Affiliation(s)
- Seungjun Ahn
- Department of Biostatistics, University of Florida, Gainesville, USA
- Tyler Grimes
- Department of Mathematics and Statistics, University of North Florida, Jacksonville, USA
- Somnath Datta
- Department of Biostatistics, University of Florida, Gainesville, USA
15
Seal S, Li Q, Basner EB, Saba LM, Kechris K. RCFGL: Rapid Condition adaptive Fused Graphical Lasso and application to modeling brain region co-expression networks. PLoS Comput Biol 2023; 19:e1010758. [PMID: 36607897 PMCID: PMC9821764 DOI: 10.1371/journal.pcbi.1010758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 11/24/2022] [Indexed: 01/07/2023] Open
Abstract
Inferring gene co-expression networks is a useful process for understanding gene regulation and pathway activity. The networks are usually undirected graphs where genes are represented as nodes and an edge represents a significant co-expression relationship. When expression data of multiple (p) genes in multiple (K) conditions (e.g., treatments, tissues, strains) are available, joint estimation of networks harnessing shared information across them can significantly increase the power of analysis. In addition, examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. Condition adaptive fused graphical lasso (CFGL) is an existing method that incorporates condition specificity in a fused graphical lasso (FGL) model for estimating multiple co-expression networks. However, with computational complexity of O(p^2 K log K), the current implementation of CFGL is prohibitively slow even for a moderate number of genes and can only be used for a maximum of three conditions. In this paper, we propose a faster alternative of CFGL named rapid condition adaptive fused graphical lasso (RCFGL). In RCFGL, we incorporate the condition specificity into another popular model for joint network estimation, known as fused multiple graphical lasso (FMGL). We use a more efficient algorithm in the iterative steps compared to CFGL, enabling faster computation with complexity of O(p^2 K) and making it easily generalizable for more than three conditions. We also present a novel screening rule to determine if the full network estimation problem can be broken down into estimation of smaller disjoint sub-networks, thereby reducing the complexity further. We demonstrate the computational advantage and superior performance of our method compared to two non-condition adaptive methods, FGL and FMGL, and one condition adaptive method, CFGL in both simulation study and real data analysis. We used RCFGL to jointly estimate the gene co-expression networks in different brain regions (conditions) using a cohort of heterogeneous stock rats. We also provide an accommodating C and Python based package that implements RCFGL.
Affiliation(s)
- Souvik Seal
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Qunhua Li
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Elle Butler Basner
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Laura M. Saba
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
16
Acharyya S, Zhou X, Baladandayuthapani V. SpaceX: gene co-expression network estimation for spatial transcriptomics. Bioinformatics 2022; 38:5033-5041. [PMID: 36179087 PMCID: PMC9665869 DOI: 10.1093/bioinformatics/btac645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 08/27/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION The analysis of spatially resolved transcriptome enables the understanding of the spatial interactions between the cellular environment and transcriptional regulation. In particular, the characterization of the gene-gene co-expression at distinct spatial locations or cell types in the tissue enables delineation of spatial co-regulatory patterns as opposed to standard differential single gene analyses. To enhance the ability and potential of spatial transcriptomics technologies to drive biological discovery, we develop a statistical framework to detect gene co-expression patterns in a spatially structured tissue consisting of different clusters in the form of cell classes or tissue domains. RESULTS We develop SpaceX (spatially dependent gene co-expression network), a Bayesian methodology to identify both shared and cluster-specific co-expression network across genes. SpaceX uses an over-dispersed spatial Poisson model coupled with a high-dimensional factor model which is based on a dimension reduction technique for computational efficiency. We show via simulations, accuracy gains in co-expression network estimation and structure by accounting for (increasing) spatial correlation and appropriate noise distributions. In-depth analysis of two spatial transcriptomics datasets in mouse hypothalamus and human breast cancer using SpaceX, detected multiple hub genes which are related to cognitive abilities for the hypothalamus data and multiple cancer genes (e.g. collagen family) from the tumor region for the breast cancer data. AVAILABILITY AND IMPLEMENTATION The SpaceX R-package is available at github.com/bayesrx/SpaceX. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Satwik Acharyya
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
17
Agamah FE, Bayjanov JR, Niehues A, Njoku KF, Skelton M, Mazandu GK, Ederveen THA, Mulder N, Chimusa ER, 't Hoen PAC. Computational approaches for network-based integrative multi-omics analysis. Front Mol Biosci 2022; 9:967205. [PMID: 36452456 PMCID: PMC9703081 DOI: 10.3389/fmolb.2022.967205] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 06/12/2022] [Accepted: 10/20/2022] [Indexed: 08/27/2023] Open
Abstract
Advances in omics technologies allow for holistic studies into biological systems. These studies rely on integrative data analysis techniques to obtain a comprehensive view of the dynamics of cellular processes, and molecular mechanisms. Network-based integrative approaches have revolutionized multi-omics analysis by providing the framework to represent interactions between multiple different omics-layers in a graph, which may faithfully reflect the molecular wiring in a cell. Here we review network-based multi-omics/multi-modal integrative analytical approaches. We classify these approaches according to the type of omics data supported, the methods and/or algorithms implemented, their node and/or edge weighting components, and their ability to identify key nodes and subnetworks. We show how these approaches can be used to identify biomarkers, disease subtypes, crosstalk, causality, and molecular drivers of physiological and pathological mechanisms. We provide insight into the most appropriate methods and tools for research questions as showcased around the aetiology and treatment of COVID-19 that can be informed by multi-omics data integration. We conclude with an overview of challenges associated with multi-omics network-based analysis, such as reproducibility, heterogeneity, (biological) interpretability of the results, and we highlight some future directions for network-based integration.
Affiliation(s)
- Francis E. Agamah
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Jumamurat R. Bayjanov
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Anna Niehues
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Kelechi F. Njoku
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Michelle Skelton
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Gaston K. Mazandu
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- African Institute for Mathematical Sciences, Cape Town, South Africa
- Thomas H. A. Ederveen
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Emile R. Chimusa
- Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle, United Kingdom
- Peter A. C. 't Hoen
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
18
Shutta KH, De Vito R, Scholtens DM, Balasubramanian R. Gaussian graphical models with applications to omics analyses. Stat Med 2022; 41:5150-5187. [PMID: 36161666 PMCID: PMC9672860 DOI: 10.1002/sim.9546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2021] [Revised: 06/06/2022] [Accepted: 07/21/2022] [Indexed: 11/06/2022]
Abstract
Gaussian graphical models (GGMs) provide a framework for modeling conditional dependencies in multivariate data. In this tutorial, we provide an overview of GGM theory and a demonstration of various GGM tools in R. The mathematical foundations of GGMs are introduced with the goal of enabling the researcher to draw practical conclusions by interpreting model results. Background literature is presented, emphasizing methods recently developed for high-dimensional applications such as genomics, proteomics, or metabolomics. The application of these methods is illustrated using a publicly available dataset of gene expression profiles from 578 participants with ovarian cancer in The Cancer Genome Atlas. Stand-alone code for the demonstration is available as an RMarkdown file at https://github.com/katehoffshutta/ggmTutorial.
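The conditional-dependence idea behind GGMs can be made concrete with the standard relation between the precision matrix (the inverse covariance) and partial correlations: the partial correlation between variables i and j, given all the others, is the negated (i, j) precision entry divided by the square root of the product of the two diagonal entries. The tutorial itself works in R; the Python sketch below is a separate, minimal illustration with hypothetical function names, and it assumes a small, well-conditioned covariance matrix. High-dimensional omics data generally need a regularized estimator instead of direct inversion.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def invert(matrix: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Choose the row with the largest absolute entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(covariance: Matrix) -> Matrix:
    """Partial correlation matrix derived from a covariance matrix.

    rho_ij = -theta_ij / sqrt(theta_ii * theta_jj), with theta the precision matrix.
    """
    precision = invert(covariance)
    n = len(precision)
    return [
        [
            1.0 if i == j
            else -precision[i][j] / sqrt(precision[i][i] * precision[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]
```

A zero partial correlation corresponds to a missing edge in the graph. For a three-variable chain in which the first and third variables are linked only through the second, the (1, 3) entry is zero even though the two variables are marginally correlated.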
Affiliation(s)
- Katherine H. Shutta
- Department of Biostatistics and Epidemiology, University of Massachusetts - Amherst, Amherst, Massachusetts, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Roberta De Vito
- Department of Biostatistics and Data Science Initiative, Brown University, Providence, Rhode Island, USA
- Denise M. Scholtens
- Division of Biostatistics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Raji Balasubramanian
- Department of Biostatistics and Epidemiology, University of Massachusetts - Amherst, Amherst, Massachusetts, USA
19
Huang YJ, Mukherjee R, Hsiao CK. Probabilistic edge inference of gene networks with Markov random field-based Bayesian learning. Front Genet 2022; 13:1034946. [DOI: 10.3389/fgene.2022.1034946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 10/24/2022] [Indexed: 11/11/2022] Open
Abstract
Current algorithms for gene regulatory network construction based on Gaussian graphical models focus on the deterministic decision of whether an edge exists. Both the probabilistic inference of edge existence and the relative strength of edges are often overlooked, either because the computational algorithms cannot account for this uncertainty or because it is not straightforward to implement. In this study, we combine the Bayesian Markov random field and the conditional autoregressive (CAR) model to tackle these two tasks simultaneously. The uncertainty of edge existence and the relative strength of edges can be measured and quantified based on a Bayesian model such as the CAR model and the spike-and-slab lasso prior. In addition, the strength of the edges can be utilized to prioritize the importance of the edges in a network graph. Simulations and a glioblastoma cancer study were carried out to assess the proposed model’s performance and to compare it with existing methods when a binary decision is of interest. The proposed approach shows stable performance and may provide novel structures with biological insights.
20
Ni Y, He J, Chalise P. Integration of differential expression and network structure for 'omics data analysis. Comput Biol Med 2022; 150:106133. [PMID: 36179515 DOI: 10.1016/j.compbiomed.2022.106133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Revised: 08/23/2022] [Accepted: 09/18/2022] [Indexed: 11/25/2022]
Abstract
Differential expression (DE) analysis has been routinely used to identify molecular features that are statistically significantly different between distinct biological groups. In recent years, differential network (DN) analysis has emerged as a powerful approach to uncover molecular network structure changes from one biological condition to the other where the molecular features with larger topological changes are selected as biomarkers. Although a large number of DE and a few DN-based methods are available, they have been usually implemented independently. DE analysis ignores the relationship among molecular features while DN analysis does not account for the expression changes at individual level. Therefore, an integrative analysis approach that accounts for both DE and DN is required to identify disease associated key features. Although, a handful of methods have been proposed, there is no method that optimizes the combination of DE and DN. We propose a novel integrative analysis method, DNrank, to identify disease-associated molecular features that leverages the strengths of both DE and DN by calculating a weight using resampling based cross validation scheme within the algorithm. First, differential expression analysis of individual molecular features is carried out. Second, a differential network structure is constructed using the differential partial correlation analysis. Third, the molecular features are ranked in the order of their significances by integrating their DE measures and DN structure using the modified Google's PageRank algorithm. In the algorithm, the optimum combination of DE and DN analyses is achieved by evaluating the prediction performance of top-ranked features utilizing support vector machine classifier with Monte Carlo cross validation. The proposed method is illustrated using both simulated data and three real data sets. 
The results show that the proposed method has a better performance in identifying important molecular features with respect to predictive discrimination. Also, as compared to existing feature selection methods, the top-ranked features selected by our method had a higher stability in selection. DNrank allows the researchers to identify the disease-associated features by utilizing both expression and network topology changes between two groups.
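The ranking step described in this abstract can be illustrated with a minimal sketch. It assumes that combining per-feature DE scores with a differential network behaves like personalized PageRank, where the DE scores set the restart distribution and the differential edge weights set the transitions. The abstract does not give DNrank's exact update rule or its cross-validated weight, so the function name `personalized_pagerank`, the fixed `damping` value, and the toy inputs are all hypothetical.

```python
def personalized_pagerank(adjacency, prior, damping=0.85, iterations=200):
    """Power iteration for PageRank with a node-specific restart distribution.

    adjacency: square list of lists of non-negative edge weights
               (here: hypothetical differential-network edge strengths).
    prior:     non-negative per-node scores (here: hypothetical DE scores).
    """
    n = len(adjacency)
    total_prior = sum(prior)
    restart = [p / total_prior for p in prior]  # normalised DE scores
    rank = restart[:]
    for _ in range(iterations):
        # Restart mass is spread according to the DE-based prior.
        new_rank = [(1.0 - damping) * r for r in restart]
        for i in range(n):
            out_weight = sum(adjacency[i])
            if out_weight == 0:
                # Node with no differential edges: send its mass back to the prior.
                for j in range(n):
                    new_rank[j] += damping * rank[i] * restart[j]
            else:
                for j in range(n):
                    new_rank[j] += damping * rank[i] * adjacency[i][j] / out_weight
        rank = new_rank
    return rank


# Toy example (assumed values): feature 0 is a hub in the differential network,
# features 1 and 2 are its neighbours, feature 3 has no differential edges.
diff_edges = [
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
de_scores = [0.2, 2.5, 0.3, 2.5]

ranks = personalized_pagerank(diff_edges, de_scores)
ordering = sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)
print(ordering, [round(r, 3) for r in ranks])
```

In this toy case features 1 and 3 have the same DE score, but feature 1 also receives rank from the hub it is connected to, so network structure breaks the tie. In DNrank the balance between DE and DN is tuned by Monte Carlo cross-validation; here `damping` plays that role only by analogy and is fixed by hand.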
Affiliation(s)
- Yonghui Ni
  - Department of Biostatistics and Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS, 66160, USA
- Jianghua He
  - Department of Biostatistics and Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS, 66160, USA
- Prabhakar Chalise
  - Department of Biostatistics and Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS, 66160, USA
|
21
|
Xue S, Rogers LR, Zheng M, He J, Piermarocchi C, Mias GI. Applying differential network analysis to longitudinal gene expression in response to perturbations. Front Genet 2022; 13:1026487. [PMID: 36324501 PMCID: PMC9618823 DOI: 10.3389/fgene.2022.1026487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 10/03/2022] [Indexed: 11/17/2022] Open
Abstract
Differential Network (DN) analysis is a method that has long been used to interpret changes in gene expression data and provide biological insights. The method identifies the rewiring of gene networks in response to external perturbations. Our study applies the DN method to the analysis of RNA-sequencing (RNA-seq) time series datasets. We focus on expression changes: (i) in saliva of a human subject after pneumococcal vaccination (PPSV23) and (ii) in primary B cells treated ex vivo with a monoclonal antibody drug (Rituximab). The DN method enabled us to identify the activation of biological pathways consistent with the mechanisms of action of the PPSV23 vaccine and target pathways of Rituximab. The community detection algorithm on the DN revealed clusters of genes characterized by collective temporal behavior. All saliva and some B cell DN communities showed characteristic time signatures, outlining a chronological order in pathway activation in response to the perturbation. Moreover, we identified early and delayed responses within network modules in the saliva dataset and three temporal patterns in the B cell data.
Affiliation(s)
- Shuyue Xue
  - Department of Physics and Astronomy, Michigan State University, East Lansing, MI, United States
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, United States
- Lavida R.K. Rogers
  - Department of Biological Sciences, University of the Virgin Islands, St Thomas, US Virgin Islands
- Minzhang Zheng
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, United States
- Jin He
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, United States
- Carlo Piermarocchi
  - Department of Physics and Astronomy, Michigan State University, East Lansing, MI, United States
- George I. Mias
  - Department of Physics and Astronomy, Michigan State University, East Lansing, MI, United States
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, United States
|
22
|
Wang P, Wang D. Gene Differential Co-Expression Networks Based on RNA-Seq: Construction and Its Applications. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2829-2841. [PMID: 34383649 DOI: 10.1109/tcbb.2021.3103280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
The gene co-expression network (GCN) has become an increasingly important tool in omics data analysis. A great challenge for GCN construction is that the sample size is far lower than the number of genes, whereas traditional methods rely on large samples. Moreover, association signals are likely weak, nonlinear and stochastic, and are difficult to identify among thousands of candidates. In this paper, the gray correlation coefficient (GCC) is introduced, and a novel method to construct gene differential co-expression networks (GDCNs) is proposed. Based on the GDCNs, three measures are proposed to explore informative genes. The proposed method can make full use of the information provided by a handful of samples and overcome the shortcomings of GCNs, and it can evaluate the changes in co-expression relationships that are possibly triggered by treatments. Based on RNA-seq data of Brassica napus, GDCNs under multiple experimental conditions are constructed and investigated. It is found that the GCC-based method is very robust to data processing. The GDCNs facilitate the inference of gene functions and the identification of informative genes that are responsible for stress responsiveness. The GDCN-based approaches integrate the 'guilt by association' and the 'guilt by rewiring' rules, which provide alternative tools for omics data analysis.
|
23
|
Al-Kuhali HA, Shan M, Hael MA, Al-Hada EA, Al-Murisi SA, Al-Kuhali AA, Aldaifl AAQ, Amin ME. Multiview clustering of multi-omics data integration by using a penalty model. BMC Bioinformatics 2022; 23:288. [PMID: 35864439 PMCID: PMC9306064 DOI: 10.1186/s12859-022-04826-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2021] [Accepted: 06/20/2022] [Indexed: 11/10/2022] Open
Abstract
Background Methods for the multiview clustering and integration of multi-omics data have been developed recently to solve problems caused by data noise or limited sample size and to integrate multi-omics data with consistent (common) and differential cluster patterns. However, the integration of such data still suffers from limited performance and low accuracy. Results In this study, a computational framework for multiview clustering based on a penalty model is presented to overcome the challenges of low accuracy and limited performance when integrating multi-omics data with consistent (common) and differential cluster patterns. The performance of the proposed method was evaluated on synthetic data and four real multi-omics datasets and then compared with approaches presented in the literature under different scenarios. The results imply that our method exhibits competitive performance compared with recently developed techniques when the underlying clusters in the synthetic data are consistent. In the case of differential clusters, the proposed method also shows enhanced performance. In addition, with regard to real omics data, the developed method exhibits better performance, demonstrating its ability to provide more detailed information within each data type and to better integrate multi-omics data with consistent (common) and differential cluster patterns. This study shows that the proposed method yields more significant differences in survival times across all types of cancer. Conclusions A new multiview clustering method is proposed in this study and evaluated on synthetic and real data. This method performs better than other techniques previously presented in the literature in terms of integrating multi-omics data with consistent and differential cluster patterns and determining the significance of differences in survival times.
Affiliation(s)
- Hamas A Al-Kuhali
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, China
- Ma Shan
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, China
- Eman A Al-Hada
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, China
- Ammar A Q Aldaifl
  - School of Information Engineering, Wuhan University of Technology, Wuhan, China
- Mohammed Elmustafa Amin
  - Department of Mathematics, Faculty of Science and Technology, Omdurman Islamic University, Khartoum, Sudan
|
24
|
Balasubramanian R, Hu J, Guasch-Ferre M, Li J, Sorond F, Zhao Y, Shutta KH, Salas-Salvado J, Hu F, Clish CB, Rexrode KM. Metabolomic Profiles Associated With Incident Ischemic Stroke. Neurology 2022; 98:e483-e492. [PMID: 34853177 PMCID: PMC8826464 DOI: 10.1212/wnl.0000000000013129] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/11/2021] [Accepted: 11/16/2021] [Indexed: 02/03/2023] Open
Abstract
BACKGROUND AND OBJECTIVES Women have higher lifetime risk of stroke than men, and metabolic factors seem more strongly associated with stroke for women than men. However, few studies in either men or women have evaluated metabolomic profiles and incident stroke. METHODS We applied liquid chromatography-tandem mass spectrometry to measure 519 plasma metabolites in a discovery set of women in the Nurses' Health Study (NHS; 454 incident ischemic stroke cases, 454 controls) with validation in 2 independent, prospective cohorts: Prevención con Dieta Mediterránea (PREDIMED; 118 stroke cases, 791 controls) and Nurses' Health Study 2 (NHS2; 49 ischemic stroke cases, 49 controls). We applied logistic regression models with stroke as the outcome to adjust for multiple risk factors; the false discovery rate was controlled through the q value method. RESULTS Twenty-three metabolites were significantly associated with incident stroke in NHS after adjustment for traditional risk factors (q < 0.05). Of these, 14 metabolites were available in PREDIMED and 3 were significantly associated with incident stroke: methionine sulfoxide, N6-acetyllysine, and sucrose (q < 0.05). In NHS2, one of the 23 metabolites (glucuronate) was significantly associated with incident stroke (q < 0.05). For all 4 metabolites, higher levels were associated with increased risk. These 4 metabolites were used to create a stroke metabolite score (SMS) in the NHS and tested in PREDIMED. Per unit of standard deviation of SMS, the odds ratio for incident stroke was 4.12 (95% confidence interval [CI] 2.26-7.51) in PREDIMED, after adjustment for risk factors. In PREDIMED, the area under the receiver operating characteristic curve (AUC) for the model including SMS and traditional risk factors was 0.70 (95% CI 0.75-0.79) vs the AUC for the model including the traditional risk factors only of 0.65 (95% CI 0.70-0.75), corresponding to a 5% improvement in risk prediction with SMS (p < 0.005). 
DISCUSSION Metabolites associated with stroke included 2 amino acids, a carboxylic acid, and sucrose. A composite SMS including these metabolites was associated with ischemic stroke and showed improvement in risk prediction beyond traditional risk factors. CLASSIFICATION OF EVIDENCE This study provides Class II evidence that a SMS accurately predicts incident ischemic stroke risk.
Affiliation(s)
- Raji Balasubramanian, Jie Hu, Marta Guasch-Ferre, Jun Li, Farzaneh Sorond, Yibai Zhao, Katherine H Shutta, Jordi Salas-Salvado, Frank Hu, Clary B Clish, Kathryn M Rexrode
  - From the Department of Biostatistics and Epidemiology (R.B., Y.Z., K.H.S.), University of Massachusetts-Amherst; Division of Women's Health (J.H., K.M.R.) and Channing Division of Network Medicine, Department of Medicine (M.G.-F., F.H.), Brigham and Women's Hospital, Harvard Medical School; Departments of Nutrition (M.G.-F., J.L., F.H.) and Epidemiology (J.L., F.H.), Harvard T.H. Chan School of Public Health, Boston, MA; Davee Department of Neurology, Division of Stroke and Neurocritical Care (F.S.), Northwestern Feinberg School of Medicine, Chicago, IL; Departament de Bioquímica i Biotecnologia, Unitat de Nutrició (J.S.-S.), Universitat Rovira i Virgili, Reus; Centro de Investigación Biomédica en Red Fisiopatología de la Obesidad y la Nutrición (CIBEROBN) (J.S.-S.), Institute of Health Carlos III, Madrid; Nutrition Unit, Pere Virgili Research Institute (IISPV) (J.S.-S.), University Hospital of Sant Joan de Reus, Spain; and Broad Institute of the Massachusetts Institute of Technology and Harvard University (C.B.C.), Cambridge.
|
25
|
Abstract
DNA microarrays are widely used to investigate gene expression. Even though the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns of the genes in a biological system. Moreover, it finds wide application in differential coexpression analysis between different systems. Network-based coexpression studies have, for example, been used in (complex) disease gene prioritization, disease subtyping, and patient stratification. In this chapter we provide an overview of the methods and tools used to create networks from microarray data and describe multiple methods for analyzing a single network or a group of networks. The described methods range from topological metrics and functional group identification to data integration strategies, topological pathway analysis, and graphical models.
Affiliation(s)
- Alisa Pavel
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Angela Serra
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Luca Cattelani
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Antonio Federico
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Dario Greco
  - Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - BioMediTech Institute, Tampere University, Tampere, Finland
  - Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
|
26
|
Tan YT, Ou-Yang L, Jiang X, Yan H, Zhang XF. Identifying Gene Network Rewiring Based on Partial Correlation. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:513-521. [PMID: 32750866 DOI: 10.1109/tcbb.2020.3002906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
It is an important task to learn how gene regulatory networks change under different conditions. Several Gaussian graphical model-based methods have been proposed to deal with this task by inferring differential networks from gene expression data. However, most existing methods define the differential networks as the difference of precision matrices, which may include false differential edges caused by the change of conditional variances. In addition, prior information about the condition-specific networks and the differential networks can be obtained from other domains. It is useful to incorporate prior information into differential network analysis. In this study, we propose a new differential network analysis method to address the above challenges. Instead of using the precision matrices, we define the differential networks as the difference of partial correlations, which can exclude the spurious differential edges due to the variants of conditional variances. Furthermore, prior information from multiple hypothesis testing is incorporated using a weighted fused penalty. Simulation studies show that our method outperforms the competing methods. We also apply our method to identify the differential network between luminal A and basal-like subtypes of breast cancers and the differential network between acute myeloid leukemia tumors and normal samples. The hub genes in the differential networks identified by our method carry out important biological functions.
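The distinction this abstract draws between precision-matrix differences and partial-correlation differences can be checked on a toy case. The sketch below relies on the standard identity that the partial correlation between variables i and j equals -ω_ij / sqrt(ω_ii ω_jj), where ω denotes entries of the precision matrix. It is not the authors' estimator (which uses a weighted fused penalty); the matrices and helper names are made up for illustration.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (list of lists) into partial correlations."""
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -precision[i][j] / math.sqrt(
                    precision[i][i] * precision[j][j]
                )
    return rho


def rescale(precision, scales):
    """Precision matrix obtained after multiplying variable i by scales[i].

    Rescaling changes conditional variances but not conditional dependence.
    """
    p = len(precision)
    return [
        [precision[i][j] / (scales[i] * scales[j]) for j in range(p)]
        for i in range(p)
    ]


def difference(a, b):
    """Element-wise difference a - b of two square matrices."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Condition A: an assumed 3-variable precision matrix.
omega_a = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -0.5],
    [0.0, -0.5, 1.5],
]
# Condition B: same dependence structure, but variable 0 is twice as variable.
omega_b = rescale(omega_a, [2.0, 1.0, 1.0])

precision_diff = difference(omega_b, omega_a)
partial_diff = difference(
    partial_correlations(omega_b), partial_correlations(omega_a)
)

print("precision difference:", precision_diff)
print("partial-correlation difference:", partial_diff)
```

Here the precision matrices differ in the first row and column even though nothing about the conditional dependence between variables has changed, while the partial-correlation difference is zero everywhere. This is the kind of spurious differential edge the abstract attributes to changes in conditional variances.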
|
27
|
Zhu F, Li J, Liu J, Min W. Network-based cancer genomic data integration for pattern discovery. BMC Genom Data 2021; 22:54. [PMID: 34886811 PMCID: PMC8662848 DOI: 10.1186/s12863-021-01004-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Since genes involved in the same biological modules usually present correlated expression profiles, many computational methods have been proposed to identify gene functional modules based on expression profile data. Recently, the Sparse Singular Value Decomposition (SSVD) method has been proposed to bicluster gene expression data to identify gene modules. However, this model can only handle gene expression data into which no gene interaction information is integrated. Ignoring prior gene interaction information may make the identified gene modules hard to interpret biologically. RESULTS In this paper, we develop a Sparse Network-regularized SVD (SNSVD) method that integrates a prior gene interaction network from a protein-protein interaction network and gene expression data to identify underlying gene functional modules. The results on a set of simulated data show that SNSVD is more effective than the traditional SVD-based methods. Further experimental results on real cancer genomic data show that most co-expressed modules are not only significantly enriched in GO/KEGG pathways, but also correspond to dense sub-networks in the prior gene interaction network. In addition, we use our method to identify ten differentially co-expressed miRNA-gene modules by integrating matched miRNA and mRNA expression data of breast cancer from The Cancer Genome Atlas (TCGA). Several important breast cancer related miRNA-gene modules are discovered. CONCLUSIONS All the results demonstrate that SNSVD can overcome the drawbacks of SSVD and capture more biologically relevant functional modules by incorporating a prior gene interaction network. These identified functional modules may provide a new perspective for understanding the diagnostics, occurrence and progression of cancer.
Affiliation(s)
- Fangfang Zhu
  - State Key Laboratory of Nuclear Resources and Environment and School of Water Resources and Environmental Engineering, East China University of Technology, Nanchang, 330013, China
  - State Key Laboratory of Nuclear Resources and Environment and School of Chemistry, Biology and Materials Science, East China University of Technology, Nanchang, 330013, China
- Jiang Li
  - State Key Laboratory of Nuclear Resources and Environment and School of Chemistry, Biology and Materials Science, East China University of Technology, Nanchang, 330013, China
- Juan Liu
  - School of Computer Science, Wuhan University, Wuhan, 430072, China
- Wenwen Min
  - School of Mathematics and Computer Science, Jiangxi Science and Technology Normal University, Nanchang, 330038, China
  - Information School, Yunnan University, Kunming, 650091, China
|
28
|
Liu C, Cai D, Zeng W, Huang Y. Inferring Differential Networks by Integrating Gene Expression Data With Additional Knowledge. Front Genet 2021; 12:760155. [PMID: 34858477 PMCID: PMC8632038 DOI: 10.3389/fgene.2021.760155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 10/13/2021] [Indexed: 11/23/2022] Open
Abstract
Evidence increasingly indicates the involvement of gene network rewiring in disease development and cell differentiation. With the accumulation of high-throughput gene expression data, it is now possible to infer the changes of gene networks between two different states or cell types via computational approaches. However, the distribution diversity of multi-platform gene expression data and the sparseness and high noise rate of single-cell RNA sequencing (scRNA-seq) data raise new challenges for existing differential network estimation methods. Furthermore, most existing methods rely purely on gene expression data and ignore the additional information provided by existing biological knowledge. In this study, to address these challenges, we propose a general framework, named weighted joint sparse penalized D-trace model (WJSDM), to infer differential gene networks by integrating multi-platform gene expression data and multiple sources of prior biological knowledge. First, a non-paranormal graphical model is employed to tackle gene expression data with missing values. Then we propose a weighted group bridge penalty to integrate multi-platform gene expression data and various sources of existing biological knowledge. Experimental results on synthetic data demonstrate the effectiveness of our method in inferring differential networks. We apply our method to the gene expression data of ovarian cancer and the scRNA-seq data of circulating tumor cells of prostate cancer, and infer the differential networks associated with platinum resistance of ovarian cancer and anti-androgen resistance of prostate cancer. By analyzing the estimated differential networks, we gain some important biological insights into the mechanisms underlying platinum resistance of ovarian cancer and anti-androgen resistance of prostate cancer.
Affiliation(s)
- Chen Liu
  - Department of Chemotherapy, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Dehan Cai
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
- WuCha Zeng
  - Department of Chemotherapy, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Yun Huang
  - Department of Geriatric Medicine, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
|
29
|
Grimes T, Datta S. A novel probabilistic generator for large-scale gene association networks. PLoS One 2021; 16:e0259193. [PMID: 34767561 PMCID: PMC8589155 DOI: 10.1371/journal.pone.0259193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2021] [Accepted: 10/14/2021] [Indexed: 11/18/2022]
Abstract
MOTIVATION Gene expression data provide an opportunity for reverse-engineering gene-gene associations using network inference methods. However, it is difficult to assess the performance of these methods because the true underlying network is unknown in real data. Current benchmarks address this problem by subsampling a known regulatory network to conduct simulations. But the topology of regulatory networks can vary greatly across organisms or tissues, and reference-based generators, such as GeneNetWeaver, are not designed to capture this heterogeneity. This means, for example, that benchmark results from the E. coli regulatory network will not carry over to other organisms or tissues. In contrast, probabilistic generators do not require a reference network, and they have the potential to capture a rich distribution of topologies. This makes probabilistic generators an ideal approach for obtaining a robust benchmarking of network inference methods. RESULTS We propose a novel probabilistic network generator that (1) provides an alternative that addresses the inherent limitation of reference-based generators, (2) is able to create realistic gene association networks, and (3) captures the heterogeneity found across gold-standard networks better than existing generators used in practice. Eight organism-specific and 12 human tissue-specific gold-standard association networks are considered. Several measures of global topology are used to determine the similarity of generated networks to the gold standards. Along with demonstrating the variability of network structure across organisms and tissues, we show that the commonly used "scale-free" model is insufficient for replicating these structures. AVAILABILITY This generator is implemented in the R package "SeqNet" and is available on CRAN (https://cran.r-project.org/web/packages/SeqNet/index.html).
Affiliation(s)
- Tyler Grimes
  - Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
- Somnath Datta
  - Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
|
30
|
Shutta KH, Balasubramanian R, Huang T, Jha SC, Zeleznik OA, Kroenke CH, Tinker LF, Smoller JW, Casanova R, Tworoger SS, Manson JE, Clish CB, Rexrode KM, Hankinson SE, Kubzansky LD. Plasma metabolomic profiles associated with chronic distress in women. Psychoneuroendocrinology 2021; 133:105420. [PMID: 34597898 PMCID: PMC8547060 DOI: 10.1016/j.psyneuen.2021.105420] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/18/2021] [Revised: 09/10/2021] [Accepted: 09/12/2021] [Indexed: 11/19/2022]
Abstract
Several forms of chronic distress including anxiety and depression are associated with adverse cardiometabolic outcomes. Metabolic alterations may underlie these associations. Whether these forms of distress are associated with metabolic alterations even after accounting for comorbid conditions and other factors remains unclear. Using an agnostic approach, this study examines a broad range of metabolites in relation to chronic distress among women. For this cross-sectional study of chronic distress and 577 plasma metabolites, data are from different substudies within the Women's Health Initiative (WHI) and Nurses' Health Studies (NHSI, NHSII). Chronic distress was characterized by depressive symptoms and other depression indicators in the WHI and NHSII substudies, and by combined indicators of anxiety and depressive symptoms in the NHSI substudy. We used a two-phase discovery-validation framework, with WHI (N = 1317) and NHSII (N = 218) substudies in the discovery phase (identifying metabolites associated with distress) and NHSI (N = 558) substudy in the validation phase. A differential network analysis provided a systems-level assessment of metabolomic alterations under chronic distress. Analyses adjusted for potential confounders and mediators (demographics, comorbidities, medications, lifestyle factors). In the discovery phase, 46 metabolites were significantly associated with depression measures. In validation, six of these metabolites demonstrated significant associations with chronic distress after adjustment for potential confounders. Among women with high distress, we found lower gamma-aminobutyric acid (GABA), threonine, biliverdin, and serotonin and higher C16:0 ceramide and 3-methylxanthine. Our findings suggest chronic distress is associated with metabolomic alterations and provide specific targets for future study of biological pathways in chronic diseases.
Affiliation(s)
- Katherine H Shutta
  - Department of Biostatistics and Epidemiology, University of Massachusetts Amherst, 010 Arnold House, 715 North Pleasant Street, Amherst, MA 01003, USA.
- Raji Balasubramanian
  - Department of Biostatistics and Epidemiology, University of Massachusetts Amherst, 010 Arnold House, 715 North Pleasant Street, Amherst, MA 01003, USA.
- Tianyi Huang
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
- Shaili C Jha
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Oana A Zeleznik
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
- Candyce H Kroenke
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA.
- Lesley F Tinker
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
- Jordan W Smoller
  - Department of Psychiatry and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Shelley S Tworoger
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, FL, USA.
- JoAnn E Manson
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA.
- Clary B Clish
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA.
- Kathryn M Rexrode
  - Harvard Medical School, Boston, MA, USA; Division of Women's Health, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA.
- Susan E Hankinson
  - Department of Biostatistics and Epidemiology, University of Massachusetts Amherst, 010 Arnold House, 715 North Pleasant Street, Amherst, MA 01003, USA.
- Laura D Kubzansky
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
|
31
|
Leng J, Wu LY. Importance-Penalized Joint Graphical Lasso (IPJGL): differential network inference via GGMs. Bioinformatics 2021; 38:770-777. [PMID: 34718410 PMCID: PMC8756181 DOI: 10.1093/bioinformatics/btab751] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/12/2021] [Revised: 10/03/2021] [Accepted: 10/27/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Differential network inference is a fundamental and challenging problem to reveal gene interactions and regulation relationships under different conditions. Many algorithms have been developed for this problem; however, they do not consider the differences between the importance of genes, which may not fit the real-world situation. Different genes have different mutation probabilities, and the vital genes associated with basic life activities have less fault tolerance to mutation. Equally treating all genes may bias the results of differential network inference. Thus, it is necessary to consider the importance of genes in the models of differential network inference. RESULTS Based on the Gaussian graphical model with adaptive gene importance regularization, we develop a novel Importance-Penalized Joint Graphical Lasso method (IPJGL) for differential network inference. The presented method is validated by the simulation experiments as well as the real datasets. Furthermore, to precisely evaluate the results of differential network inference, we propose a new metric named APC2 for the differential levels of gene pairs. We apply IPJGL to analyze the TCGA colorectal and breast cancer datasets and find some candidate cancer genes with significant survival analysis results, including SOST for colorectal cancer and RBBP8 for breast cancer. We also conduct further analysis based on the interactions in the Reactome database and confirm the utility of our method. AVAILABILITY AND IMPLEMENTATION R source code of Importance-Penalized Joint Graphical Lasso is freely available at https://github.com/Wu-Lab/IPJGL. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiacheng Leng
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
|
32
|
Kosvyra A, Ntzioni E, Chouvarda I. Network analysis with biological data of cancer patients: A scoping review. J Biomed Inform 2021; 120:103873. [PMID: 34298154 DOI: 10.1016/j.jbi.2021.103873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/08/2020] [Revised: 06/30/2021] [Accepted: 07/18/2021] [Indexed: 12/25/2022]
Abstract
BACKGROUND & OBJECTIVE Network Analysis (NA) is a mathematical method that allows exploring relations between units and representing them as a graph. Although NA was initially associated with the social sciences, over the past two decades it has been introduced into bioinformatics. The recent growth in the use of networks in biological data analysis reveals the need to further investigate this area. In this work, we attempt to characterize the use of NA with biological data, and specifically: (a) what types of data are used and whether they are integrated or not, (b) what the purpose of the analysis is, predictive or descriptive, and (c) the outcome of such analyses, specifically in cancer. METHODS & MATERIALS The literature review was conducted on two databases, PubMed & IEEE, and was restricted to journal articles of the last decade (January 2010 - December 2019). At a first level, all articles were screened by title and abstract, and at a second level the screening was conducted by reading the full-text article, following the predefined inclusion & exclusion criteria, leading to 131 articles of interest. A table was created with the information of interest and was used for the classification of the articles. The articles were initially classified into analysis studies and studies that propose a new algorithm or methodology. Each of these categories was further screened by the following clustering criteria: (a) data used, (b) study purpose, (c) study outcome. Specifically for the studies proposing a new algorithm, the novelty presented in each one was identified. RESULTS & CONCLUSIONS In the past five years, researchers have focused on creating new algorithms and methodologies to enhance this field. The classification of the articles revealed that only 25% of the analyses integrate multi-omics data, although 50% of the newly developed algorithms follow this integrative direction. Moreover, only 20% of the analyses and 10% of the newly developed methodologies have a predictive purpose. Regarding the results of the works reviewed, 75% of the studies focus on identifying gene signatures, prognostic or not. In conclusion, this review revealed the need for deploying predictive and multi-omics integrative algorithms and methodologies that can be used to enhance cancer diagnosis, prognosis and treatment.
Affiliation(s)
- A Kosvyra
  - Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- E Ntzioni
  - Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- I Chouvarda
  - Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
33
|
Tu JJ, Ou-Yang L, Zhu Y, Yan H, Qin H, Zhang XF. Differential network analysis by simultaneously considering changes in gene interactions and gene expression. Bioinformatics 2021; 37:4414-4423. [PMID: 34245246 DOI: 10.1093/bioinformatics/btab502] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/14/2021] [Revised: 06/13/2021] [Accepted: 07/05/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Differential network analysis is an important tool to investigate the rewiring of gene interactions under different conditions. Several computational methods have been developed to estimate differential networks from gene expression data, but most of them do not consider that gene network rewiring may be driven by the differential expression of individual genes. New differential network analysis methods that simultaneously take account of the changes in gene interactions and changes in expression levels are needed. RESULTS In this paper, we propose a differential network analysis method that considers the differential expression of individual genes when identifying differential edges. First, two hypothesis test statistics are used to quantify changes in partial correlations between gene pairs and changes in expression levels for individual genes. Then, an optimization framework is proposed to combine the two test statistics so that the resulting differential network has a hierarchical property, where a differential edge can be considered only if at least one of the two involved genes is differentially expressed. Simulation results indicate that our method outperforms current state-of-the-art methods. We apply our method to identify the differential networks between the luminal A and basal-like subtypes of breast cancer and those between acute myeloid leukemia and normal samples. Hub nodes in the differential networks estimated by our method, including both differentially and non-differentially expressed genes, have important biological functions. AVAILABILITY The source code is available at https://github.com/Zhangxf-ccnu/chNet. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
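The hierarchical property described in this abstract (a differential edge is admissible only if at least one of its two genes is differentially expressed) can be illustrated with a small mask operation. This is only a sketch of the constraint, not the optimization framework of the paper or the chNet code; the function name `hierarchical_filter` and its boolean inputs are hypothetical and assume that edge-level and gene-level test decisions have already been made.

```python
import numpy as np

def hierarchical_filter(edge_changed, gene_changed):
    """Keep a candidate differential edge (i, j) only if gene i or gene j
    is flagged as differentially expressed.

    edge_changed: (p, p) boolean matrix of candidate differential edges
                  (hypothetical output of an edge-level test).
    gene_changed: length-p boolean vector of differentially expressed genes
                  (hypothetical output of a gene-level test).
    """
    edge_changed = np.asarray(edge_changed, dtype=bool)
    gene_changed = np.asarray(gene_changed, dtype=bool)
    # allowed[i, j] is True when at least one endpoint is differentially expressed
    allowed = gene_changed[:, None] | gene_changed[None, :]
    kept = edge_changed & allowed
    # Self-loops are not meaningful edges, so clear the diagonal
    np.fill_diagonal(kept, False)
    return kept

# Three genes, every pair is a candidate edge, only gene 0 is differentially expressed
candidates = np.ones((3, 3), dtype=bool)
flags = [True, False, False]
print(hierarchical_filter(candidates, flags))
```

In this toy input only the edges touching gene 0 survive; the edge between genes 1 and 2 is dropped because neither endpoint is flagged.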
Affiliation(s)
- Jia-Juan Tu
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
- Le Ou-Yang
  - College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Yuan Zhu
  - School of Automation, China University of Geosciences, Wuhan, 430074, China; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, China University of Geosciences, Wuhan, 430074, China
- Hong Yan
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
- Hong Qin
  - Department of Statistics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
|
34
|
Grimes T, Datta S. SeqNet: An R Package for Generating Gene-Gene Networks and Simulating RNA-Seq Data. J Stat Softw 2021; 98:10.18637/jss.v098.i12. [PMID: 34321962 PMCID: PMC8315007 DOI: 10.18637/jss.v098.i12] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/12/2022]
Abstract
Gene expression data provide an abundant resource for inferring connections in gene regulatory networks. While methodologies developed for this task have shown success, a challenge remains in comparing the performance among methods. Gold-standard datasets are scarce and limited in use. And while tools for simulating expression data are available, they are not designed to resemble the data obtained from RNA-seq experiments. SeqNet is an R package that provides tools for generating a rich variety of gene network structures and simulating RNA-seq data from them. This produces in silico RNA-seq data for benchmarking and assessing gene network inference methods. The package is available on CRAN and on GitHub at https://github.com/tgrimes/SeqNet.
Affiliation(s)
- Tyler Grimes
  - University of Florida, Department of Biostatistics
|
35
|
Xu T, Ou-Yang L, Yan H, Zhang XF. Time-Varying Differential Network Analysis for Revealing Network Rewiring over Cancer Progression. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1632-1642. [PMID: 31647444 DOI: 10.1109/tcbb.2019.2949039] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
To reveal how gene regulatory networks change over cancer development, multiple time-varying differential networks between adjacent cancer stages should be estimated simultaneously. Since the network rewiring may be driven by the perturbation of certain individual genes, there may be some hub nodes shared by these differential networks. Although several methods have been developed to estimate differential networks from gene expression data, most of them are designed for estimating a single differential network, which neglects the similarities between different differential networks. In this article, we propose a new Gaussian graphical model-based method to jointly estimate multiple time-varying differential networks for identifying network rewiring over cancer development. A D-trace loss is used to determine the differential networks. A tree-structured group Lasso penalty is designed to identify the common hub nodes shared by different differential networks and the specific hub nodes unique to individual differential networks. Simulation results demonstrate that our method outperforms other state-of-the-art techniques in most cases. We also apply our method to The Cancer Genome Atlas data to explore gene network rewiring over different breast cancer stages. Hub nodes in the estimated differential networks rediscover well-known genes associated with the development and progression of breast cancer.
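The abstract names the D-trace loss without stating it. As a hedged illustration, the sketch below uses the form commonly given in the differential network literature, L(Δ) = ¼(⟨Σ_X Δ, Δ Σ_Y⟩ + ⟨Σ_Y Δ, Δ Σ_X⟩) − ⟨Δ, Σ_X − Σ_Y⟩, whose unpenalized minimizer is Δ = Σ_Y⁻¹ − Σ_X⁻¹. The formula and sign convention are assumptions here (the abstract gives neither), and the tree-structured group Lasso penalty of the paper is not included; function names are hypothetical.

```python
import numpy as np

def d_trace_loss(delta, sigma_x, sigma_y):
    """D-trace loss for a candidate differential network delta
    (assumed form; <A, B> denotes the trace inner product sum(A * B))."""
    quad = 0.25 * (np.sum((sigma_x @ delta) * (delta @ sigma_y))
                   + np.sum((sigma_y @ delta) * (delta @ sigma_x)))
    lin = np.sum(delta * (sigma_x - sigma_y))
    return quad - lin

def d_trace_gradient(delta, sigma_x, sigma_y):
    """Gradient of the loss above with respect to delta."""
    return (0.5 * (sigma_x @ delta @ sigma_y + sigma_y @ delta @ sigma_x)
            - (sigma_x - sigma_y))

def random_spd(rng, p):
    """Random well-conditioned symmetric positive definite matrix (toy covariance)."""
    a = rng.normal(size=(p, p))
    return a @ a.T + p * np.eye(p)

rng = np.random.default_rng(0)
sigma_x, sigma_y = random_spd(rng, 4), random_spd(rng, 4)

# Difference of precision matrices: the quantity the loss is designed to recover
delta_true = np.linalg.inv(sigma_y) - np.linalg.inv(sigma_x)

# The gradient vanishes at delta_true, so it is a stationary point of the loss
print(np.abs(d_trace_gradient(delta_true, sigma_x, sigma_y)).max())
```

The appeal of this loss is that it targets the precision-matrix difference directly, so neither individual precision matrix has to be estimated or assumed sparse; in practice a sparsity-inducing penalty is added and the sample covariances replace Σ_X and Σ_Y.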
|
36
|
Arbet J, Zhuang Y, Litkowski E, Saba L, Kechris K. Comparing Statistical Tests for Differential Network Analysis of Gene Modules. Front Genet 2021; 12:630215. [PMID: 34093641 PMCID: PMC8170128 DOI: 10.3389/fgene.2021.630215] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/17/2020] [Accepted: 04/19/2021] [Indexed: 11/13/2022]
Abstract
Genes often work together to perform complex biological processes, and "networks" provide a versatile framework for representing the interactions between multiple genes. Differential network analysis (DiNA) quantifies how this network structure differs between two or more groups/phenotypes (e.g., disease subjects and healthy controls), with the goal of determining whether differences in network structure can help explain differences between phenotypes. In this paper, we focus on gene co-expression networks, although in principle, the methods studied can be used for DiNA for other types of features (e.g., metabolome, epigenome, microbiome, proteome, etc.). Three common applications of DiNA involve (1) testing whether the connections to a single gene differ between groups, (2) testing whether the connection between a pair of genes differs between groups, or (3) testing whether the connections within a "module" (a subset of 3 or more genes) differ between groups. This article focuses on the last of these, as there is a lack of studies comparing statistical methods for identifying differentially co-expressed modules (DCMs). Through extensive simulations, we compare several previously proposed test statistics and a new p-norm difference test (PND). We demonstrate that the true positive rate of the proposed PND test is competitive with and often higher than the other methods, while controlling the false positive rate. The R package discoMod (differentially co-expressed modules) implements the proposed method and provides a full pipeline for identifying DCMs: clustering tools to derive gene modules, tests to identify DCMs, and methods for visualizing the results.
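To make the idea of a module-level test concrete, the sketch below computes a p-norm of the difference between two groups' correlation matrices for one module and calibrates it with a permutation null. The abstract does not define the PND statistic, so this particular form, the permutation scheme, and all function names are assumptions for illustration; it is not the discoMod implementation.

```python
import numpy as np

def pnorm_difference(x_a, x_b, p=2.0):
    """p-norm of the difference between two groups' correlation matrices.

    x_a, x_b: (samples, genes) expression matrices for the same module.
    Only the upper triangle is used, so each gene pair counts once.
    """
    r_a = np.corrcoef(x_a, rowvar=False)
    r_b = np.corrcoef(x_b, rowvar=False)
    iu = np.triu_indices_from(r_a, k=1)
    diff = np.abs(r_a[iu] - r_b[iu])
    return float(np.sum(diff ** p) ** (1.0 / p))

def permutation_pvalue(x_a, x_b, p=2.0, n_perm=500, seed=0):
    """Permutation p-value: shuffle group labels and recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = pnorm_difference(x_a, x_b, p)
    pooled = np.vstack([x_a, x_b])
    n_a = x_a.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = pnorm_difference(pooled[idx[:n_a]], pooled[idx[n_a:]], p)
        exceed += stat >= observed
    # Add-one correction keeps the p-value strictly positive
    return observed, (exceed + 1) / (n_perm + 1)

# Toy module of 4 genes: group A is uncorrelated, group B shares a common factor
rng = np.random.default_rng(1)
group_a = rng.normal(size=(40, 4))
group_b = rng.normal(size=(40, 4)) + rng.normal(size=(40, 1))
stat, pval = permutation_pvalue(group_a, group_b, n_perm=200)
print(stat, pval)
```

The exponent `p` controls sensitivity: larger values emphasize a few strongly changed gene pairs, smaller values emphasize many small changes spread across the module.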
Affiliation(s)
- Jaron Arbet
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Yaxu Zhuang
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Elizabeth Litkowski
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Laura Saba
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Katerina Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
|
37
|
Sudhakar P, Verstockt B, Cremer J, Verstockt S, Sabino J, Ferrante M, Vermeire S. Understanding the Molecular Drivers of Disease Heterogeneity in Crohn's Disease Using Multi-omic Data Integration and Network Analysis. Inflamm Bowel Dis 2021; 27:870-886. [PMID: 33313682 PMCID: PMC8128416 DOI: 10.1093/ibd/izaa281] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 03/10/2020] [Indexed: 12/12/2022]
Abstract
Crohn's disease (CD), a form of inflammatory bowel disease (IBD), is characterized by heterogeneity along multiple clinical axes, which in turn impacts disease progression and treatment modalities. Using advanced data integration approaches and systems biology tools, we studied the contribution of CD susceptibility variants and gene expression in distinct peripheral immune cell subsets (CD14+ monocytes and CD4+ T cells) to relevant clinical traits. Our analyses revealed that most clinical traits capturing CD heterogeneity could be associated with CD14+ and CD4+ gene expression rather than disease susceptibility variants. By disentangling the sources of variation, we identified molecular features that could potentially be driving the heterogeneity of various clinical traits of CD patients. Further downstream analyses identified contextual hub proteins such as genes encoding barrier functions, antimicrobial peptides, chemokines, and their receptors, which are either targeted by drugs used in CD or other inflammatory diseases or are relevant to the biological functions implicated in disease pathology. These hubs could be used as cell type-specific targets to treat specific subtypes of CD patients in a more individualized approach based on the underlying biology driving their disease subtypes. Our study highlights the importance of data integration and systems approaches to investigate complex and heterogeneous diseases such as IBD.
Affiliation(s)
- Padhmanand Sudhakar
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID)
- Bram Verstockt
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID)
  - University Hospitals Leuven, Department of Gastroenterology and Hepatology
- Jonathan Cremer
  - Department of Microbiology and Immunology, Laboratory of Clinical Immunology, Katholieke Universiteit Leuven, Leuven, Belgium
- Sare Verstockt
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID)
- João Sabino
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID)
  - University Hospitals Leuven, Department of Gastroenterology and Hepatology
- Marc Ferrante
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID)
  - University Hospitals Leuven, Department of Gastroenterology and Hepatology
- Séverine Vermeire
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID)
  - University Hospitals Leuven, Department of Gastroenterology and Hepatology
|
38
|
Ou-Yang L, Cai D, Zhang XF, Yan H. WDNE: an integrative graphical model for inferring differential networks from multi-platform gene expression data with missing values. Brief Bioinform 2021; 22:6272792. [PMID: 33975339 DOI: 10.1093/bib/bbab086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/03/2021] [Revised: 02/14/2021] [Accepted: 02/23/2021] [Indexed: 11/14/2022]
Abstract
The mechanisms controlling biological processes, such as the development of disease or cell differentiation, can be investigated by examining changes in the networks of gene dependencies between states in the process. High-throughput experimental methods, like microarray and RNA sequencing, have been widely used to gather gene expression data, which paves the way to infer gene dependencies based on computational methods. However, most differential network analysis methods are designed to deal with fully observed data, whereas missing values, such as the dropout events in single-cell RNA-sequencing data, are frequent. New methods are needed to take account of these missing values. Moreover, since the changes of gene dependencies may be driven by certain perturbed genes, considering the changes in gene expression levels may promote the identification of gene network rewiring. In this study, a novel weighted differential network estimation (WDNE) model is proposed to handle multi-platform gene expression data with missing values and take account of changes in gene expression levels. Simulation studies demonstrate that WDNE outperforms state-of-the-art differential network estimation methods. When WDNE is applied to infer differential gene networks associated with drug resistance in ovarian tumors, cell differentiation and breast tumor heterogeneity, the hub genes in the estimated differential gene networks can provide important insights into the underlying mechanisms. Furthermore, a Matlab toolbox, differential network analysis toolbox, was developed to implement the WDNE model and visualize the estimated differential networks.
Affiliation(s)
- Le Ou-Yang
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Dehan Cai
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, 999077, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
- Hong Yan
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, 999077, China
|
39
|
Sharma R, Kumar S, Song M. Fundamental gene network rewiring at the second order within and across mammalian systems. Bioinformatics 2021; 37:3293-3301. [PMID: 33950233 DOI: 10.1093/bioinformatics/btab240] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/23/2020] [Revised: 02/24/2021] [Accepted: 04/09/2021] [Indexed: 12/13/2022]
Abstract
MOTIVATION Genetic or epigenetic events can rewire molecular networks to induce extraordinary phenotypical divergences. Among the many network rewiring approaches, no model-free statistical methods can differentiate gene-gene pattern changes not attributed to marginal changes. This may obscure fundamental rewiring from superficial changes. RESULTS Here we introduce a model-free Sharma-Song test to determine if patterns differ in the second order, meaning that the deviation of the joint distribution from the product of marginal distributions is unequal across conditions. We prove an asymptotic chi-squared null distribution for the test statistic. Simulation studies demonstrate its advantage over alternative methods in detecting second-order differential patterns. Applying the test to three independent mammalian developmental transcriptome datasets, we report a lower frequency of co-expression network rewiring between human and mouse for the same tissue group than the frequency of rewiring between tissue groups within the same species. We also find second-order differential patterns between microRNA promoters and genes contrasting cerebellum and liver development in mice. These patterns are enriched in the spliceosome pathway regulating tissue specificity. Complementary to previous mammalian comparative studies mostly driven by first-order effects, our findings contribute an understanding of system-wide second-order gene network rewiring within and across mammalian systems. Second-order differential patterns constitute evidence for fundamentally rewired biological circuitry due to evolution, environment, or disease. AVAILABILITY The generic Sharma-Song test is available from the R package 'DiffXTables' at https://cran.r-project.org/package=DiffXTables. Other code and data are described in Methods. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ruby Sharma
  - Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
- Sajal Kumar
  - Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
- Mingzhou Song
  - Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA; Molecular Biology and Interdisciplinary Life Science Graduate Program, New Mexico State University, Las Cruces, NM 88003, USA
|
40
|
Lopes MB, Martins EP, Vinga S, Costa BM. The Role of Network Science in Glioblastoma. Cancers (Basel) 2021; 13:1045. [PMID: 33801334 PMCID: PMC7958335 DOI: 10.3390/cancers13051045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/04/2021] [Revised: 02/19/2021] [Accepted: 02/22/2021] [Indexed: 12/13/2022]
Abstract
Network science has long been recognized as a well-established discipline across many biological domains. In the particular case of cancer genomics, network discovery is challenged by the multitude of available high-dimensional heterogeneous views of data. Glioblastoma (GBM) is an example of such a complex and heterogeneous disease that can be tackled by network science. Identifying the architecture of molecular GBM networks is essential to understanding the information flow and better informing drug development and pre-clinical studies. Here, we review network-based strategies that have been used in the study of GBM, along with the available software implementations for reproducibility and further testing on newly coming datasets. Promising results have been obtained from both bulk and single-cell GBM data, placing network discovery at the forefront of developing a molecularly-informed-based personalized medicine.
Affiliation(s)
- Marta B. Lopes
  - Center for Mathematics and Applications (CMA), FCT, UNL, 2829-516 Caparica, Portugal
  - NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, 2829-516 Caparica, Portugal
- Eduarda P. Martins
  - Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
  - ICVS/3B’s—PT Government Associate Laboratory, 4710-057/4805-017 Braga/Guimarães, Portugal
- Susana Vinga
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, 1000-029 Lisbon, Portugal
  - IDMEC, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
- Bruno M. Costa
  - Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
  - ICVS/3B’s—PT Government Associate Laboratory, 4710-057/4805-017 Braga/Guimarães, Portugal
|
41
|
Oh VKS, Li RW. Temporal Dynamic Methods for Bulk RNA-Seq Time Series Data. Genes (Basel) 2021; 12:352. [PMID: 33673721 PMCID: PMC7997275 DOI: 10.3390/genes12030352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/26/2021] [Revised: 02/19/2021] [Accepted: 02/22/2021] [Indexed: 02/06/2023]
Abstract
Dynamic studies in time course experimental designs and clinical approaches have been widely used by the biomedical community. These applications are particularly relevant in stimuli-response models under environmental conditions, characterization of gradient biological processes in developmental biology, identification of therapeutic effects in clinical trials, disease progressive models, cell-cycle, and circadian periodicity. Despite their feasibility and popularity, sophisticated dynamic methods that are well validated in large-scale comparative studies, in terms of statistical and computational rigor, are less benchmarked compared to their static counterparts. To date, a number of novel methods in bulk RNA-Seq data have been developed for the various time-dependent stimuli, circadian rhythms, cell-lineage in differentiation, and disease progression. Here, we comprehensively review a key set of representative dynamic strategies and discuss current issues associated with the detection of dynamically changing genes. We also provide recommendations for future directions for studying non-periodical, periodical time course data, and meta-dynamic datasets.
Affiliation(s)
- Vera-Khlara S. Oh
  - Animal Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA
  - Department of Computer Science and Statistics, College of Natural Sciences, Jeju National University, Jeju City 63243, Korea
- Robert W. Li
  - Animal Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA
|
42
|
Zhang XF, Ou-Yang L, Yan T, Hu XT, Yan H. A Joint Graphical Model for Inferring Gene Networks Across Multiple Subpopulations and Data Types. IEEE Trans Cybern 2021; 51:1043-1055. [PMID: 31794418 DOI: 10.1109/tcyb.2019.2952711] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/10/2023]
Abstract
Reconstructing gene networks from gene expression data is a long-standing challenge. In most applications, the observations can be divided into several distinct but related subpopulations and the gene expression measurements can be collected from multiple data types. Most existing methods are designed to estimate a single gene network from a single dataset. These methods may be suboptimal since they do not exploit the similarities and differences among different subpopulations and data types. In this article, we propose a joint graphical model to estimate the multiple gene networks simultaneously. Our model decomposes each subpopulation-specific gene network as a sum of common and unique components and imposes a group lasso penalty on gene networks corresponding to different data types. The gene network variations across subpopulations can be learned automatically by the decompositions of networks, and the similarities and differences among data types can be captured by the group lasso penalty. The simulation studies demonstrate that our method outperforms the state-of-the-art methods. We also apply our method to the cancer genome atlas breast cancer datasets to reconstruct subtype-specific gene networks. Hub nodes in the estimated subnetworks unique to individual cancer subtypes rediscover well-known genes associated with breast cancer subtypes and provide interesting predictions.
|
43
|
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 PMCID: PMC8296984 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
  - School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
  - Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
  - Department of Statistics, University of California, Berkeley
|
44
|
Savino A, Provero P, Poli V. Differential Co-Expression Analyses Allow the Identification of Critical Signalling Pathways Altered during Tumour Transformation and Progression. Int J Mol Sci 2020; 21:E9461. [PMID: 33322692 PMCID: PMC7764314 DOI: 10.3390/ijms21249461] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/30/2020] [Revised: 12/02/2020] [Accepted: 12/09/2020] [Indexed: 02/02/2023]
Abstract
Biological systems respond to perturbations through the rewiring of molecular interactions, organised in gene regulatory networks (GRNs). Among these, the increasingly high availability of transcriptomic data makes gene co-expression networks the most exploited ones. Differential co-expression networks are useful tools to identify changes in response to an external perturbation, such as mutations predisposing to cancer development, and leading to changes in the activity of gene expression regulators or signalling. They can help explain the robustness of cancer cells to perturbations and identify promising candidates for targeted therapy, moreover providing higher specificity with respect to standard co-expression methods. Here, we comprehensively review the literature about the methods developed to assess differential co-expression and their applications to cancer biology. Via the comparison of normal and diseased conditions and of different tumour stages, studies based on these methods led to the definition of pathways involved in gene network reorganisation upon oncogenes' mutations and tumour progression, often converging on immune system signalling. A relevant implementation still lagging behind is the integration of different data types, which would greatly improve network interpretability. Most importantly, performance and predictivity evaluation of the large variety of mathematical models proposed would urgently require experimental validations and systematic comparisons. We believe that future work on differential gene co-expression networks, complemented with additional omics data and experimentally tested, will considerably improve our insights into the biology of tumours.
Affiliation(s)
- Aurora Savino
  - Molecular Biotechnology Center, Department of Molecular Biotechnology and Health Sciences, University of Turin, Via Nizza 52, 10126 Turin, Italy
- Paolo Provero
  - Department of Neurosciences “Rita Levi Montalcini”, University of Turin, Corso Massimo D’Azeglio 52, 10126 Turin, Italy
  - Center for Omics Sciences, Ospedale San Raffaele IRCCS, Via Olgettina 60, 20132 Milan, Italy
- Valeria Poli
  - Molecular Biotechnology Center, Department of Molecular Biotechnology and Health Sciences, University of Turin, Via Nizza 52, 10126 Turin, Italy
|
45
|
Zhang Q, Sun L, Zhang Q, Zhang W, Tian W, Liu M, Wang Y. Construction of a disease-specific lncRNA-miRNA-mRNA regulatory network reveals potential regulatory axes and prognostic biomarkers for hepatocellular carcinoma. Cancer Med 2020; 9:9219-9235. [PMID: 33232580 PMCID: PMC7774738 DOI: 10.1002/cam4.3526] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/17/2020] [Revised: 08/14/2020] [Accepted: 09/21/2020] [Indexed: 01/04/2023]
Abstract
Hepatocellular carcinoma (HCC) is a heterogeneous malignancy with a high incidence and poor prognosis. Exploration of the underlying mechanisms and effective prognostic indicators is conducive to clinical management and optimization of treatment. The RNA‐seq and clinical phenotype data of HCC were retrieved from The Cancer Genome Atlas (TCGA), and differential expression analysis was performed. Then, a differential lncRNA‐miRNA‐mRNA regulatory network was constructed, and the key genes were further identified and validated. By integrating this network with the online tool‐based ceRNA network, an HCC‐specific ceRNA network was obtained, and lncRNA‐miRNA‐mRNA regulatory axes were extracted. RNAs associated with prognosis were further obtained, and multivariate Cox regression models were established to identify the prognostic signature and nomogram. As a result, 198 DElncRNAs, 120 DEmiRNAs, and 2827 DEmRNAs were identified, and 30 key genes identified from the differential network were enriched in four cancer‐related pathways. Four HCC‐specific lncRNA‐miRNA‐mRNA regulatory axes were extracted, and SNHG11, CRNDE, MYLK‐AS1, E2F3, and CHEK1 were found to be related with HCC prognosis. Multivariate Cox regression analysis identified a prognostic signature, comprised of CRNDE, MYLK‐AS1, and CHEK1, for overall survival (OS) of HCC. A nomogram comprising the prognostic signature and pathological stage was established and showed some net clinical benefits. The AUC of the prognostic signature and nomogram for 1‐year, 3‐year, and 5‐year survival was 0.777 (0.657‐0.865), 0.722 (0.640‐0.848), and 0.630 (0.528‐0.823), and 0.751 (0.664‐0.870), 0.773 (0.707‐0.849), and 0.734 (0.638‐0.845), respectively. These results provided clues for the study of potential biomarkers and therapeutic targets for HCC. In addition, the obtained 30 key genes and 4 regulatory axes might also help elucidate the underlying mechanism of HCC.
Affiliation(s)
- Qi Zhang
  - Department of Biostatistics, Harbin Medical University, Harbin, Heilongjiang, China
- Lin Sun
  - Department of Biostatistics, Harbin Medical University, Harbin, Heilongjiang, China
- Qiuju Zhang
  - Department of Biostatistics, Harbin Medical University, Harbin, Heilongjiang, China
- Wei Zhang
  - Department of Biostatistics, Harbin Medical University, Harbin, Heilongjiang, China
- Wei Tian
  - Department of Biostatistics, Harbin Medical University, Harbin, Heilongjiang, China
- Meina Liu
  - Department of Biostatistics, Harbin Medical University, Harbin, Heilongjiang, China
- Yupeng Wang
  - Department of Biostatistics, Harbin Medical University, Harbin, Heilongjiang, China
|
46
|
Ou-Yang L, Zhang XF, Hu X, Yan H. Differential Network Analysis via Weighted Fused Conditional Gaussian Graphical Model. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:2162-2169. [PMID: 31247559 DOI: 10.1109/tcbb.2019.2924418] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
The development and prognosis of complex diseases usually involves changes in regulatory relationships among biomolecules. Understanding how the regulatory relationships change with genetic alterations can help to reveal the underlying biological mechanisms for complex diseases. Although several models have been proposed to estimate the differential network between two different states, they are not suitable to deal with situations where the molecules of interest are affected by other covariates. Nor can they make use of prior information that provides insights about the structures of biomolecular networks. In this study, we introduce a novel weighted fused conditional Gaussian graphical model to jointly estimate two state-specific biomolecular regulatory networks and their difference between two different states. Unlike previous differential network estimation methods, our model can take into account the related covariates and the prior network information when inferring differential networks. The effectiveness of our proposed model is first evaluated based on simulation studies. Experiment results demonstrate that our model outperforms other state-of-the-art differential networks estimation models in all cases. We then apply our model to identify the differential gene network between two subtypes of glioblastoma based on gene expression and miRNA expression data. Our model is able to discover known mechanisms of glioblastoma and provide interesting predictions.
|
47
|
Yu Y, Zhang LH, Zhang S. Simultaneous clustering of multiview biomedical data using manifold optimization. Bioinformatics 2020; 35:4029-4037. [PMID: 30918942 DOI: 10.1093/bioinformatics/btz217] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/29/2018] [Revised: 12/26/2018] [Accepted: 03/26/2019] [Indexed: 02/07/2023]
Abstract
MOTIVATION Multiview clustering has attracted much attention in recent years. Several models and algorithms have been proposed for finding the clusters. However, these methods are developed either to find the consistent/common clusters across different views, or to identify the differential clusters among different views. In reality, both consistent and differential clusters may exist in multiview datasets. Thus, development of simultaneous clustering methods such that both the consistent and the differential clusters can be identified is of great importance. RESULTS In this paper, we proposed one method for simultaneous clustering of multiview data based on manifold optimization. The binary optimization model for finding the clusters is relaxed to a real value optimization problem on the Stiefel manifold, which is solved by the line-search algorithm on manifold. We applied the proposed method to both simulation data and four real datasets from TCGA. Both studies show that when the underlying clusters are consistent, our method performs competitively with the state-of-the-art algorithms. When there are differential clusters, our method performs much better. In the real data study, we performed experiments on cancer stratification and differential cluster (module) identification across multiple cancer subtypes. For the patients of different subtypes, both consistent clusters and differential clusters are identified at the same time. The proposed method identifies more clusters that are enriched by gene ontology and KEGG pathways. The differential clusters could be used to explain the different mechanisms for the cancer development in the patients of different subtypes. AVAILABILITY AND IMPLEMENTATION Codes can be downloaded from: http://homepage.fudan.edu.cn/sqzhang/files/2018/12/MVCMOcode.zip. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yun Yu
  - School of Mathematical Sciences, Fudan University, Shanghai, China
- Lei-Hong Zhang
  - School of Mathematical Sciences, Soochow University, Suzhou, China
- Shuqin Zhang
  - School of Mathematical Sciences, Fudan University, Shanghai, China; Center for Computational Systems Biology, Fudan University, Shanghai, China; Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence of Ministry of Education, Fudan University, Shanghai, China
|
48
|
Chowdhury HA, Bhattacharyya DK, Kalita JK. (Differential) Co-Expression Analysis of Gene Expression: A Survey of Best Practices. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1154-1173. [PMID: 30668502 DOI: 10.1109/tcbb.2019.2893170] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/09/2023]
Abstract
Analysis of gene expression data is widely used in transcriptomic studies to understand functions of molecules inside a cell and interactions among molecules. Differential co-expression analysis studies diseases and phenotypic variations by finding modules of genes whose co-expression patterns vary across conditions. We review the best practices in gene expression data analysis in terms of analysis of (differential) co-expression, co-expression network, differential networking, and differential connectivity considering both microarray and RNA-seq data along with comparisons. We highlight hurdles in RNA-seq data analysis using methods developed for microarrays. We include discussion of necessary tools for gene expression analysis throughout the paper. In addition, we shed light on scRNA-seq data analysis by including preprocessing and scRNA-seq in co-expression analysis along with useful tools specific to scRNA-seq. To get insights, biological interpretation and functional profiling is included. Finally, we provide guidelines for the analyst, along with research issues and challenges which should be addressed.
|
49
|
Pan Y, Mai Q. Efficient computation for differential network analysis with applications to quadratic discriminant analysis. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2019.106884] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/18/2023]
|
50
|
Yuan R, Ou-Yang L, Hu X, Zhang XF. Identifying Gene Network Rewiring Using Robust Differential Graphical Model with Multivariate t-Distribution. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:712-718. [PMID: 30802872 DOI: 10.1109/tcbb.2019.2901473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Identifying gene network rewiring under different biological conditions is important for understanding the mechanisms underlying complex diseases. Gaussian graphical models, which assume the data follow the multivariate normal distribution, are widely used to identify gene network rewiring. However, the normality assumption often fails in reality since the data are contaminated by extreme outliers in general. In this study, we propose a new robust differential graphical model to identify gene network rewiring between two conditions based on the multivariate t-distribution. The multivariate t-distribution is more robust to outliers than the normal distribution since it has heavy tails and allows values far from the mean. A fused lasso penalty is used to borrow information across conditions to improve the results. We develop an expectation maximization algorithm to solve the optimization model. Experiment results on simulated data show that our method outperforms the state-of-the-art methods. Our method is also applied to identify gene network rewiring between luminal A and basal-like subtypes of breast cancer, and gene network rewiring between the proneural and mesenchymal subtypes of glioblastoma. Several key genes which drive gene network rewiring are discovered.
|