1
Khayer N, Jalessi M, Farhadi M, Azad Z. S100a9 might act as a modulator of the Toll-like receptor 4 transduction pathway in chronic rhinosinusitis with nasal polyps. Sci Rep 2024; 14:9722. [PMID: 38678138 PMCID: PMC11055867 DOI: 10.1038/s41598-024-60205-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 04/19/2024] [Indexed: 04/29/2024] [Open Access]
Abstract
Chronic rhinosinusitis with nasal polyps (CRSwNP) is a highly prevalent disorder characterized by persistent inflammation of the nasal and sinus mucosa. Despite significant morbidity and decreased quality of life, effective treatment options for the disease are limited. Therefore, identifying causal genes and dysregulated pathways paves the way for novel therapeutic interventions. In the current study, a three-way interaction approach was used to detect dynamic co-expression interactions involved in CRSwNP. In this approach, the internal evolution of the co-expression relation between a pair of genes (X, Y) was captured under a change in the expression profile of a third gene (Z), named the switch gene. Subsequently, the biological relevance of the statistically significant triplets was confirmed using both gene set enrichment analysis and gene regulatory network reconstruction. Finally, the importance of the identified switch genes was confirmed using a random forest model. The results suggested four dysregulated pathways in CRSwNP: "positive regulation of intracellular signal transduction", "arachidonic acid metabolic process", "spermatogenesis" and "negative regulation of cellular protein metabolic process". Additionally, S100a9, as a switch gene, together with the gene pair {Cd14, Tpd52l1}, forms a biologically relevant triplet. More specifically, we suggested that S100a9 might act as a potential upstream modulator of the Toll-like receptor 4 transduction pathway in the major CRSwNP pathologies.
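One common way to score such (X, Y, Z) triplets is the liquid association of Li (2002), the method named in entries 7 and 8 of this list. The following is a minimal Python sketch under that assumption, not the authors' pipeline: each variable is mapped to normal scores and the score is estimated as the sample mean of the product X·Y·Z. The function names and the simulated data are hypothetical and serve only as an illustration.

```python
import random
from statistics import NormalDist, fmean


def normal_scores(values):
    """Replace each value by the standard-normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


def standardize(values):
    """Center to mean 0 and scale to unit (population) variance."""
    mean = fmean(values)
    sd = fmean([(v - mean) ** 2 for v in values]) ** 0.5
    return [(v - mean) / sd for v in values]


def liquid_association(x, y, z):
    """Estimate LA(X, Y | Z) as the mean of X*Y*Z on the normal-score scale."""
    xs, ys, zs = (standardize(normal_scores(v)) for v in (x, y, z))
    return fmean([a * b * c for a, b, c in zip(xs, ys, zs)])


def simulate_switch_triplet(n=2000, seed=0):
    """Hypothetical data: X and Y are positively correlated when Z > 0
    and negatively correlated when Z < 0, so Z behaves like a switch."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [(1 if zi > 0 else -1) * xi + rng.gauss(0, 0.5) for xi, zi in zip(x, z)]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate_switch_triplet()
    print("LA score:", round(liquid_association(x, y, z), 3))
```

A score near zero suggests that Z does not modulate the X-Y co-expression, while a large positive or negative value flags Z as a candidate switch gene. Statistical significance would still need to be assessed, for example by permutation, which this sketch omits.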
Affiliation(s)
- Nasibeh Khayer
  - Skull Base Research Center, The Five Senses Health Institute, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Maryam Jalessi
  - Skull Base Research Center, The Five Senses Health Institute, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
  - ENT and Head and Neck Research Center and Department, The Five Senses Health Institute, Rasoul Akram Hospital, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Mohammad Farhadi
  - ENT and Head and Neck Research Center and Department, The Five Senses Health Institute, Rasoul Akram Hospital, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Zahra Azad
  - Skull Base Research Center, The Five Senses Health Institute, School of Medicine, Iran University of Medical Sciences, Tehran, Iran

2
Zhang W, Ma Z, Wang L, Fan D, Ho YY. Genome-wide search algorithms for identifying dynamic gene co-expression via Bayesian variable selection. Stat Med 2023; 42:5616-5629. [PMID: 37806971 DOI: 10.1002/sim.9928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Revised: 08/08/2023] [Accepted: 09/19/2023] [Indexed: 10/10/2023]
Abstract
A wealth of gene expression data generated by high-throughput techniques provides exciting opportunities for studying gene-gene interactions systematically. Gene-gene interactions in a biological system are tightly regulated and are often highly dynamic. The interactions can change flexibly under various internal cellular signals or external stimuli. Previous studies have developed statistical methods to examine these dynamic changes in gene-gene interactions. However, due to the massive number of possible gene combinations that need to be considered in a typical genomic dataset, intensive computation is a common challenge for exploring gene-gene interactions. On the other hand, oftentimes only a small proportion of gene combinations exhibit dynamic co-expression changes. To solve this problem, we propose Bayesian variable selection approaches based on spike-and-slab priors. The proposed algorithms reduce the computational intensity by focusing on identifying subsets of promising gene combinations in the search space. We also adopt a Bayesian multiple hypothesis testing procedure to identify strong dynamic gene co-expression changes. Simulation studies are performed to compare the proposed approaches with existing exhaustive search heuristics. We demonstrate the implementation of our proposed approach to study the association between gene co-expression patterns and overall survival using the RNA-sequencing dataset from The Cancer Genome Atlas breast cancer BRCA-US project.
Affiliation(s)
- Wenda Zhang
  - Walmart Global Tech, Sunnyvale, California, USA
- Zichen Ma
  - Department of Mathematics, Colgate University, Hamilton, New York, USA
- Lianming Wang
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Daping Fan
  - Department of Cell Biology and Anatomy, University of South Carolina, Columbia, South Carolina, USA
- Yen-Yi Ho
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, USA

3
Tu D, Mahony B, Moore TM, Bertolero MA, Alexander-Bloch AF, Gur R, Bassett DS, Satterthwaite TD, Raznahan A, Shinohara RT. CoCoA: conditional correlation models with association size. Biostatistics 2023; 25:154-170. [PMID: 35939558 PMCID: PMC10724258 DOI: 10.1093/biostatistics/kxac032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Revised: 07/14/2022] [Accepted: 07/18/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Many scientific questions can be formulated as hypotheses about conditional correlations. For instance, in tests of cognitive and physical performance, the trade-off between speed and accuracy motivates study of the two variables together. A natural question is whether speed-accuracy coupling depends on other variables, such as sustained attention. Classical regression techniques, which posit models in terms of covariates and outcomes, are insufficient to investigate the effect of a third variable on the symmetric relationship between speed and accuracy. In response, we propose a conditional correlation model with association size, a likelihood-based statistical framework to estimate the conditional correlation between speed and accuracy as a function of additional variables. We propose novel measures of the association size, which are analogous to effect sizes on the correlation scale while adjusting for confound variables. In simulation studies, we compare likelihood-based estimators of conditional correlation to semiparametric estimators adapted from genomic studies and find that the former achieves lower bias and variance under both ideal settings and model assumption misspecification. Using neurocognitive data from the Philadelphia Neurodevelopmental Cohort, we demonstrate that greater sustained attention is associated with stronger speed-accuracy coupling in a complex reasoning task while controlling for age. By highlighting conditional correlations as the outcome of interest, our model provides complementary insights to traditional regression modeling and partitioned correlation analyses.
Affiliation(s)
- Danni Tu
  - The Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, USA
- Bridget Mahony
  - Section on Developmental Neurogenomics, National Institutes of Mental Health, 10 Center Drive, Bethesda, MD, 20892, USA
- Tyler M Moore
  - Department of Psychiatry, Perelman School of Medicine, 3400 Spruce Street, Philadelphia, PA, 19104, USA
- Maxwell A Bertolero
  - Department of Psychiatry, Perelman School of Medicine, Philadelphia, PA, USA and Penn Lifespan Informatics and Neuroimaging Center, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Ruben Gur
  - Department of Psychiatry, Perelman School of Medicine, Philadelphia, PA, USA
- Dani S Bassett
  - Department of Bioengineering, University of Pennsylvania, 209 South 33rd Street, Philadelphia, PA, 19104, USA, Department of Physics and Astronomy, University of Pennsylvania, 209 South 33rd Street, Philadelphia, PA, 19104, USA, Department of Electrical and Systems Engineering, University of Pennsylvania, 200 South 33rd Street, Philadelphia, PA, 19104, USA and Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, USA
- Theodore D Satterthwaite
  - Department of Psychiatry, Perelman School of Medicine, Philadelphia, PA, USA and Penn Lifespan Informatics and Neuroimaging Center, Philadelphia, PA, USA
- Armin Raznahan
  - Section on Developmental Neurogenomics, National Institutes of Mental Health, Bethesda, MD, USA
- Russell T Shinohara
  - The Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA

4
Barton S, Broad Z, Ortiz-Barrientos D, Donovan D, Lefevre J. Hypergraphs and centrality measures identifying key features in gene expression data. Math Biosci 2023; 366:109089. [PMID: 37914024 DOI: 10.1016/j.mbs.2023.109089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 10/16/2023] [Accepted: 10/18/2023] [Indexed: 11/03/2023]
Abstract
Multidisciplinary approaches can significantly advance our understanding of complex systems. For instance, gene co-expression networks align prior knowledge of biological systems with studies in graph theory, emphasising pairwise gene-to-gene interactions. In this paper, we extend these ideas, promoting hypergraphs as an investigative tool for studying multi-way interactions in gene expression data. Additional freedoms are achieved by representing individual genes with hyperedges, and simultaneously testing each gene against many features/vertices. Further gene/hyperedge interactions can be captured and explored using line graph representations, a technique that reduces the complexity of dense hypergraphs. Such an approach provides access to graph centrality measures, which identify salient features within a data set, for instance dominant or hub-like hyperedges, leading to key knowledge on gene expression. The validity of this approach is established through the study of gene expression data for the plant species Senecio lautus, and the results are interpreted within this biological setting.
Affiliation(s)
- Samuel Barton
  - School of Mathematics and Physics, ARC Centre of Excellence, Plant Success in Nature and Agriculture, University of Queensland, Brisbane, 4072, Australia
- Zoe Broad
  - School of the Environment, ARC Centre of Excellence, Plant Success in Nature and Agriculture, University of Queensland, Brisbane, 4072, Australia
- Daniel Ortiz-Barrientos
  - School of the Environment, ARC Centre of Excellence, Plant Success in Nature and Agriculture, University of Queensland, Brisbane, 4072, Australia
- Diane Donovan
  - School of Mathematics and Physics, ARC Centre of Excellence, Plant Success in Nature and Agriculture, University of Queensland, Brisbane, 4072, Australia
- James Lefevre
  - School of Mathematics and Physics, ARC Centre of Excellence, Plant Success in Nature and Agriculture, University of Queensland, Brisbane, 4072, Australia

5
Ma Z, Davis SW, Ho YY. Flexible copula model for integrating correlated multi-omics data from single-cell experiments. Biometrics 2022. [PMID: 35622236 DOI: 10.1111/biom.13701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 05/18/2022] [Indexed: 11/27/2022]
Abstract
With recent advances in technologies to profile multi-omics data at the single-cell level, integrative multi-omics data analysis has become increasingly popular. It is increasingly common for information such as methylation changes, chromatin accessibility, and gene expression to be jointly collected in a single-cell experiment. In biomedical studies, it is often of interest to study the associations between various data types and to examine how these associations might change according to other factors such as cell types and gene regulatory components. However, since each data type usually has a distinct marginal distribution, joint analysis of these changes in association using multi-omics data is statistically challenging. In this paper, we propose a flexible copula-based framework to model covariate-dependent correlation structures independent of their marginals. In addition, the proposed approach can jointly combine a wide variety of univariate marginal distributions, either discrete or continuous, including the class of zero-inflated distributions. The performance of the proposed framework is demonstrated through a series of simulation studies. Finally, it is applied to a set of experimental data to investigate the dynamic relationship between single-cell RNA-sequencing, chromatin accessibility, and DNA methylation at different germ layers during mouse gastrulation.
Affiliation(s)
- Zichen Ma
  - Department of Public Health Sciences, Clemson University, Clemson, SC, USA
- Shannon W Davis
  - Department of Biological Sciences, University of South Carolina, Columbia, SC, USA
- Yen-Yi Ho
  - Department of Statistics, University of South Carolina, Columbia, SC, USA

6
Li L, Zeng J, Zhang X. Generalized Liquid Association Analysis for Multimodal Data Integration. J Am Stat Assoc 2022; 118:1984-1996. [PMID: 38099062 PMCID: PMC10720690 DOI: 10.1080/01621459.2021.2024437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2021] [Accepted: 12/27/2021] [Indexed: 10/19/2022]
Abstract
Multimodal data are now prevailing in scientific research. One of the central questions in multimodal integrative analysis is to understand how two data modalities associate and interact with each other given another modality or demographic variables. The problem can be formulated as studying the associations among three sets of random variables, a question that has received relatively less attention in the literature. In this article, we propose a novel generalized liquid association analysis method, which offers a new and unique angle to this important class of problems of studying three-way associations. We extend the notion of liquid association of Li (2002) from the univariate setting to the sparse, multivariate, and high-dimensional setting. We establish a population dimension reduction model, transform the problem to sparse Tucker decomposition of a three-way tensor, and develop a higher-order orthogonal iteration algorithm for parameter estimation. We derive the non-asymptotic error bound and asymptotic consistency of the proposed estimator, while allowing the variable dimensions to be larger than and diverge with the sample size. We demonstrate the efficacy of the method through both simulations and a multimodal neuroimaging application for Alzheimer's disease research.
Affiliation(s)
- Lexin Li
  - University of California at Berkeley

7
Shokati Eshkiki Z, Khayer N, Talebi A, Karbalaei R, Akbari A. Novel insight into pancreatic adenocarcinoma pathogenesis using liquid association analysis. BMC Med Genomics 2022; 15:30. [PMID: 35180880 PMCID: PMC8855560 DOI: 10.1186/s12920-022-01174-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2021] [Accepted: 02/01/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Pancreatic ductal adenocarcinoma (PDAC) is a lethal malignancy associated with a poor prognosis. High-throughput disease-related gene expression data provide valuable information on gene interactions, which in turn leads to deeper insight into pathogenesis. Co-expression analysis is a common approach for investigating gene interactions. However, such an approach alone is inadequate to reveal the complexity of gene interactions. The three-way interaction model is a novel approach applied to decode the complex relationships between genes. METHODS In the current study, the liquid association method was used to capture the statistically significant triplets involved in PDAC pathogenesis. Subsequently, gene set enrichment and gene regulatory network analyses were performed to trace the biological relevance of the statistically significant triplets. RESULTS The results of the current study suggest that "response to estradiol" and "regulation of T-cell proliferation" are two critical biological processes that may be associated with PDAC pathogenesis. Additionally, we introduced six switch genes, namely Lamc2, Klk1, Nqo1, Aox1, Tspan1, and Cxcl12, which might be involved in PDAC triggering. CONCLUSION In the current study, for the first time, the critical genes and pathways involved in PDAC pathogenesis were investigated using the three-way interaction approach. As a result, two critical biological processes, as well as six potential biomarkers, were suggested that might be involved in PDAC triggering. Strong evidence for the biological relevance of our results can be found in the literature.
Affiliation(s)
- Zahra Shokati Eshkiki
  - Alimentary Tract Research Center, Clinical Sciences Research Institute, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
- Nasibeh Khayer
  - Skull Base Research Center, The Five Senses Health Institute, Iran University of Medical Sciences, Tehran, Iran
- Atefeh Talebi
  - Colorectal Research Center, Iran University of Medical Sciences, Tehran, Iran
- Reza Karbalaei
  - Department of Psychology and Neuroscience Program, Temple University, Philadelphia, PA, USA
- Abolfazl Akbari
  - Colorectal Research Center, Iran University of Medical Sciences, Tehran, Iran

8
Khayer N, Jalessi M, Jahanbakhshi A, Tabib Khooei A, Mirzaie M. Nkx3-1 and Fech genes might be switch genes involved in pituitary non-functioning adenoma invasiveness. Sci Rep 2021; 11:20943. [PMID: 34686726 PMCID: PMC8536755 DOI: 10.1038/s41598-021-00431-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Accepted: 10/12/2021] [Indexed: 12/12/2022] [Open Access]
Abstract
Non-functioning pituitary adenomas (NFPAs) are typical pituitary macroadenomas in adults associated with increased mortality and morbidity. Although pituitary adenomas are commonly considered slow-growing benign brain tumors, many of them are invasive. Such tumors destroy the sella turcica and invade adjacent tissues such as the cavernous sinus and sphenoid sinus. In these cases, the most critical obstacle to complete surgical removal is the high risk of damaging adjacent vital structures. Therefore, the development of novel strategies, either for early diagnosis through biomarkers or for medical therapies that reduce the recurrence rate of NFPAs, is imperative. Identification of gene interactions has paved the way for decoding complex molecular mechanisms, including disease-related pathways, and for identifying the most important genes involved in a specific disease. Currently, our knowledge of pituitary adenoma invasion at the molecular level is insufficient. The current study aimed to identify critical biomarkers and biological pathways associated with invasiveness in NFPAs using a three-way interaction model for the first time. The liquid association method was applied to capture the statistically significant triplets involved in NFPA invasiveness. Subsequently, random forest analysis was applied to select the most important switch genes. Finally, gene set enrichment (GSE) and gene regulatory network (GRN) analyses were applied to trace the biological relevance of the statistically significant triplets. The results of this study suggest that the "mRNA processing" and "spindle organization" biological processes are important in NFPA invasiveness. Specifically, our results suggest Nkx3-1 and Fech as two switch genes in NFPA invasiveness that may be potential biomarkers or target genes in this pathology.
Affiliation(s)
- Nasibeh Khayer
  - Skull Base Research Center, The Five Senses Health Institute, Iran University of Medical Sciences, Tehran, Iran
- Maryam Jalessi
  - Skull Base Research Center, The Five Senses Health Institute, Iran University of Medical Sciences, Tehran, Iran
  - ENT and Head & Neck Research Center and Department, Hazrat Rasoul Hospital, Iran University of Medical Sciences, Tehran, Iran
- Amin Jahanbakhshi
  - Skull Base Research Center, The Five Senses Health Institute, Iran University of Medical Sciences, Tehran, Iran
  - Neurology Department, Hazrat Rasoul Hospital, Iran University of Medical Sciences, Tehran, Iran
- Alireza Tabib Khooei
  - Neurology Department, Hazrat Rasoul Hospital, Iran University of Medical Sciences, Tehran, Iran
- Mehdi Mirzaie
  - Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Tehran, Iran

9
Cao X, Pounds S. Gene-set distance analysis (GSDA): a powerful tool for gene-set association analysis. BMC Bioinformatics 2021; 22:207. [PMID: 33882829 PMCID: PMC8059024 DOI: 10.1186/s12859-021-04110-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2020] [Accepted: 03/30/2021] [Indexed: 11/23/2022] [Open Access]
Abstract
Background Identifying sets of related genes (gene sets) that are empirically associated with a treatment or phenotype often yields valuable biological insights. Several methods effectively identify gene sets in which individual genes have simple monotonic relationships with categorical, quantitative, or censored event-time variables. Some distance-based methods, such as distance correlations, may detect complex non-monotone associations of a gene-set with a quantitative variable that elude other methods. However, the distance correlations have yet to be generalized to associate gene-sets with categorical and censored event-time endpoints. Also, there is a need to determine which genes empirically drive the significance of an association of a gene set with an endpoint. Results We develop gene-set distance analysis (GSDA) by generalizing distance correlations to evaluate the association of a gene set with categorical and censored event-time variables. We also develop a backward elimination procedure to identify a subset of genes that empirically drive significant associations. In simulation studies, GSDA more effectively identified complex non-monotone gene-set associations than did six other published methods. In the analysis of a pediatric acute myeloid leukemia (AML) data set, GSDA was the only method to discover that event-free survival (EFS) was associated with the 56-gene AML pathway gene-set, narrow that result down to 5 genes, and confirm the association of those 5 genes with EFS in a separate validation cohort. These results indicate that GSDA effectively identifies and characterizes complex non-monotonic gene-set associations that are missed by other methods. Conclusion GSDA is a powerful and flexible method to detect gene-set association with categorical, quantitative, or censored event-time variables, especially to detect complex non-monotonic gene-set associations. Available at https://CRAN.R-project.org/package=GSDA. 
Supplementary information The online version contains supplementary material available at 10.1186/s12859-021-04110-x.
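GSDA builds on distance correlation, which can detect non-monotone dependence that Pearson correlation misses. The sketch below is a plain-Python illustration of the classical sample distance correlation between two univariate variables. It is not the GSDA package and does not cover the categorical or censored event-time endpoints that the paper adds; the helper names are hypothetical.

```python
def _centered_distances(values):
    """Pairwise absolute distances, double-centered by row, column and grand means."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in dist]  # equals column means (symmetric matrix)
    grand_mean = sum(row_mean) / n
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(p, q):
    """Average of the element-wise product of two square matrices."""
    n = len(p)
    return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form), a value in [0, 1]."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = max(_mean_product(a, b), 0.0)  # guard against tiny negative rounding
    dvar2 = _mean_product(a, a) * _mean_product(b, b)
    if dvar2 <= 0.0:
        return 0.0  # at least one variable is constant
    return (dcov2 / dvar2 ** 0.5) ** 0.5


if __name__ == "__main__":
    # Symmetric grid: y = x**2 has zero Pearson correlation with x,
    # yet the two variables are clearly dependent.
    x = [-1 + 2 * i / 100 for i in range(101)]
    y = [v * v for v in x]
    print("dCor(x, x^2):", round(distance_correlation(x, y), 3))
```

This version builds full n-by-n distance matrices, so it is quadratic in the number of samples and suitable only for small illustrations.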
Affiliation(s)
- Xueyuan Cao
  - Department of Acute and Tertiary Care, University of Tennessee Health Science Center, Memphis, 38163, USA
- Stan Pounds
  - Department of Biostatistics, St Jude Children's Research Hospital, Memphis, 38105, USA

10
Yang Z, Ho YY. Modeling dynamic correlation in zero-inflated bivariate count data with applications to single-cell RNA sequencing data. Biometrics 2021; 78:766-776. [PMID: 33720414 PMCID: PMC8477913 DOI: 10.1111/biom.13457] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/28/2019] [Revised: 03/03/2021] [Accepted: 03/08/2021] [Indexed: 12/13/2022]
Abstract
Interactions between biological molecules in a cell are tightly coordinated and often highly dynamic. As a result of these varying signaling activities, changes in gene coexpression patterns could often be observed. The advancements in next‐generation sequencing technologies bring new statistical challenges for studying these dynamic changes of gene coexpression. In recent years, methods have been developed to examine genomic information from individual cells. Single‐cell RNA sequencing (scRNA‐seq) data are count‐based, and often exhibit characteristics such as overdispersion and zero inflation. To explore the dynamic dependence structure in scRNA‐seq data and other zero‐inflated count data, new approaches are needed. In this paper, we consider overdispersion and zero inflation in count outcomes and propose a ZEro‐inflated negative binomial dynamic COrrelation model (ZENCO). The observed count data are modeled as a mixture of two components: success amplifications and dropout events in ZENCO. A latent variable is incorporated into ZENCO to model the covariate‐dependent correlation structure. We conduct simulation studies to evaluate the performance of our proposed method and to compare it with existing approaches. We also illustrate the implementation of our proposed approach using scRNA‐seq data from a study of minimal residual disease in melanoma.
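The two-component mixture described in this abstract (dropout events versus amplified counts) can be written as a zero-inflated negative binomial probability mass function. The sketch below is a generic ZINB pmf under a mean/size parameterization. It is an illustration under stated assumptions and does not implement ZENCO's latent-variable correlation structure; the parameter names `pi_dropout`, `mu`, and `size` are hypothetical.

```python
from math import exp, lgamma, log


def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean `mu` > 0 and dispersion `size` > 0
    (variance = mu + mu**2 / size), evaluated on the log scale for stability."""
    log_p = (
        lgamma(k + size) - lgamma(size) - lgamma(k + 1)
        + size * log(size / (size + mu))
        + k * log(mu / (size + mu))
    )
    return exp(log_p)


def zinb_pmf(k, pi_dropout, mu, size):
    """Zero-inflated NB: with probability `pi_dropout` the count is a
    structural zero (dropout); otherwise it is drawn from NB(mu, size)."""
    base = (1.0 - pi_dropout) * nb_pmf(k, mu, size)
    return pi_dropout + base if k == 0 else base


if __name__ == "__main__":
    pi_dropout, mu, size = 0.3, 5.0, 2.0
    for k in range(4):
        print(k, round(zinb_pmf(k, pi_dropout, mu, size), 4))
```

Under this parameterization the mean of the mixture is `(1 - pi_dropout) * mu`, so dropout lowers the expected count and adds zeros beyond those the negative binomial component already produces.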
Affiliation(s)
- Zhen Yang
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Yen-Yi Ho
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, USA

11
Wu G, Ge L, Zhao N, Liu F, Shi Z, Zheng N, Zhou D, Jiang X, Halverson L, Xie B. Environment dependent microbial co-occurrences across a cyanobacterial bloom in a freshwater lake. Environ Microbiol 2020; 23:327-339. [PMID: 33185973 DOI: 10.1111/1462-2920.15315] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/03/2020] [Revised: 10/28/2020] [Accepted: 11/09/2020] [Indexed: 11/29/2022]
Abstract
Microbial taxon-taxon co-occurrences may directly or indirectly reflect the potential relationships between the members within a microbial community. However, to what extent and the specificity by which these co-occurrences are influenced by environmental factors remains unclear. In this report, we evaluated how the dynamics of microbial taxon-taxon co-occurrence is associated with the changes of environmental factors in Nan Lake at Wuhan city, China with a Modified Liquid Association method. We were able to detect more than 1000 taxon-taxon co-occurrences highly correlated with one or more environmental factors across a phytoplankton bloom using 16S rRNA gene amplicon community profiles. These co-occurrences, referred to as environment dependent co-occurrences (ED_co-occurrences), delineate a unique network in which a taxon-taxon pair exhibits specific, and potentially dynamic correlations with an environmental parameter, while the individual relative abundance of each may not. Microcystis involved ED_co-occurrences are in important topological positions in the network, suggesting relationships between the bloom dominant species and other taxa could play a role in the interplay of microbial community and environment across various bloom stages. Our results may broaden our understanding of the response of a microbial community to the environment, particularly at the level of microbe-microbe associations.
Affiliation(s)
- Gang Wu
  - School of Life Sciences, Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, 430079, China
- Leixin Ge
  - School of Life Sciences, Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, 430079, China
- Na Zhao
  - School of Life Sciences, Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, 430079, China
- Fei Liu
  - School of Life Sciences, Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, 430079, China
- Zunji Shi
  - School of Life Sciences, Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, 430079, China
- Ningning Zheng
  - School of Life Sciences, Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, 430079, China
- Dan Zhou
  - School of Life Sciences, Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, 430079, China
  - School of Biological Sciences, Guizhou Normal College, Guiyang, Guizhou, 550018, China
- Xingpeng Jiang
  - School of Computer, Central China Normal University, Wuhan, 430079, China
- Larry Halverson
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, Iowa, USA
- Bo Xie
  - School of Life Sciences, Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, 430079, China

12
Rps27a might act as a controller of microglia activation in triggering neurodegenerative diseases. PLoS One 2020; 15:e0239219. [PMID: 32941527 PMCID: PMC7498011 DOI: 10.1371/journal.pone.0239219] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/21/2020] [Accepted: 09/01/2020] [Indexed: 01/10/2023] [Open Access]
Abstract
Neurodegenerative diseases (NDDs) have become increasingly serious threats to human health in recent years. Despite exhibiting different clinical phenotypes and selective neuronal loss, these disorders share certain common features, suggesting the presence of commonly dysregulated pathways. Identifying causal genes and dysregulated pathways can help in providing effective treatment for these diseases. Interestingly, in spite of considerable research on NDDs, to the best of our knowledge, no dysregulated genes and/or pathways have so far been reported in common across all the major NDDs. In this study, for the first time, we applied the three-way interaction model, an approach for unraveling sophisticated gene interactions, to trace switch genes and significant pathways involved in six major NDDs. Subsequently, a gene regulatory network was constructed to investigate the regulatory communication of statistically significant triplets. Finally, KEGG pathway enrichment analysis was applied to find possible common pathways. Because of the central role of neuroinflammation and immune system responses in both pathogenic and protective mechanisms in the NDDs, we focused on immune genes in this study. Our results suggest that the "cytokine-cytokine receptor interaction" pathway is enriched in all six of the studied NDDs, while the "osteoclast differentiation" and "natural killer cell mediated cytotoxicity" pathways are each enriched in five of them. Additionally, our analysis showed that Rps27a, as a switch gene, together with the gene pair {Il-18, Cx3cl1}, forms a statistically significant and biologically relevant triplet in the major NDDs. More specifically, we suggested that Cx3cl1 might act as a potential upstream regulator of Il-18 in microglia activation and, in turn, might be controlled by Rps27a in triggering NDDs.

13
Lu J, Lu Y, Ding Y, Xiao Q, Liu L, Cai Q, Kong Y, Bai Y, Yu T. DNLC: differential network local consistency analysis. BMC Bioinformatics 2019; 20:489. [PMID: 31874600 PMCID: PMC6929334 DOI: 10.1186/s12859-019-3046-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/15/2019] [Accepted: 08/21/2019] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND The biological network is highly dynamic. Functional relations between genes can be activated or deactivated depending on the biological conditions. On the genome-scale network, subnetworks that gain or lose local expression consistency may shed light on the regulatory mechanisms related to the changing biological conditions, such as disease status or tissue developmental stages. RESULTS In this study, we develop a new method to select genes and modules on the existing biological network, in which local expression consistency changes significantly between clinical conditions. The method is called DNLC: Differential Network Local Consistency. In simulations, our algorithm detected artificially created local consistency changes effectively. We applied the method to two publicly available datasets, and the method detected novel genes and network modules that were biologically plausible. CONCLUSIONS The new method is effective in finding modules in which the gene expression consistency changes between clinical conditions. It is a useful tool that complements traditional differential expression analyses to make discoveries from gene expression data. The R package is available at https://cran.r-project.org/web/packages/DNLC.
Affiliation(s)
- Jianwei Lu
  - School of Software Engineering, Tongji University, Shanghai, China
  - Institute of Advanced Translational Medicine, Tongji University, Shanghai, China
- Yao Lu
  - School of Software Engineering, Tongji University, Shanghai, China
- Yusheng Ding
  - School of Software Engineering, Tongji University, Shanghai, China
- Qingyang Xiao
  - Department of Environmental Health, Emory University, Atlanta, GA USA
- Linqing Liu
  - School of Software Engineering, Tongji University, Shanghai, China
- Qingpo Cai
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA USA
- Yunchuan Kong
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA USA
- Yun Bai
  - Department of Pharmaceutical Sciences, School of Pharmacy, Philadelphia College of Osteopathic Medicine, Georgia Campus, Suwanee, GA USA
- Tianwei Yu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA USA

14
A hypergraph-based method for large-scale dynamic correlation study at the transcriptomic scale. BMC Genomics 2019; 20:397. [PMID: 31117943 PMCID: PMC6530038 DOI: 10.1186/s12864-019-5787-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/30/2018] [Accepted: 05/09/2019] [Indexed: 12/22/2022] Open
Abstract
Background The biological regulatory system is highly dynamic. Correlations between functionally related genes change over different biological conditions, which are often unobserved in the data. At the gene level, the dynamic correlations result in three-way gene interactions involving a pair of genes that change correlation, and a third gene that reflects the underlying cellular conditions. This type of ternary relation can be quantified by the Liquid Association statistic. Studying these three-way interactions at the gene triplet level has revealed important regulatory mechanisms in the biological system. Currently, due to the extremely large number of possible combinations of triplets within a high-throughput gene expression dataset, no method is available to examine the ternary relationship at the biological system level and formally address the false discovery issue. Results Here we propose a new method, Hypergraph for Dynamic Correlation (HDC), to construct module-level three-way interaction networks. The method is able to present integrative uniform hypergraphs to reflect the global dynamic correlation pattern in the biological system, providing guidance to down-stream gene triplet-level analyses. To validate the method’s ability, we conducted two real data experiments using a melanoma RNA-seq dataset from The Cancer Genome Atlas (TCGA) and a yeast cell cycle dataset. The resulting hypergraphs are clearly biologically plausible, and suggest novel relations relevant to the biological conditions in the data. Conclusions We believe the new approach provides a valuable alternative method to analyze omics data that can extract higher order structures. The software is at https://github.com/yunchuankong/HypergraphDynamicCorrelation.
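For orientation, the Liquid Association statistic mentioned in this abstract is usually defined in the LA literature as sketched below. This is background notation rather than part of the HDC method itself; the function name g and the assumption that Z has been transformed to a standard normal variable are conventions assumed here.

```latex
% Conditional co-expression of the gene pair (X, Y) as a function of the third gene Z
\[
  g(z) = \mathbb{E}[XY \mid Z = z], \qquad
  \mathrm{LA}(X, Y \mid Z) = \mathbb{E}\bigl[g'(Z)\bigr].
\]
% If Z is standard normal, Stein's lemma (E[g'(Z)] = E[Z g(Z)]) gives
\[
  \mathrm{LA}(X, Y \mid Z) = \mathbb{E}\bigl[g(Z)\,Z\bigr] = \mathbb{E}[XYZ].
\]
```

Under these assumptions a positive LA means the X-Y co-expression tends to increase as Z increases, and a negative LA means it tends to decrease.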

15
Ai D, Li X, Pan H, Chen J, Cram JA, Xia LC. Explore mediated co-varying dynamics in microbial community using integrated local similarity and liquid association analysis. BMC Genomics 2019; 20:185. [PMID: 30967122 PMCID: PMC6456937 DOI: 10.1186/s12864-019-5469-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Discovering the key microbial species and environmental factors of microbial community and characterizing their relationships with other members are critical to ecosystem studies. The microbial co-occurrence patterns across a variety of environmental settings have been extensively characterized. However, previous studies were limited by their restriction toward pairwise relationships, while there was ample evidence of third-party mediated co-occurrence in microbial communities. METHODS We implemented and applied the triplet-based liquid association analysis in combination with the local similarity analysis procedure to microbial ecology data. We developed an intuitive scheme to visualize those complex triplet associations along with pairwise correlations. Using a time series from the marine microbial ecosystem as example, we identified pairs of operational taxonomic units (OTUs) where the strength of their associations appeared to relate to the values of a third "mediator" variable. These "mediator" variables appear to modulate the associations between pairs of bacteria. RESULTS Using this analysis, we were able to assess the OTUs' ability to regulate its functional partners in the community, typically not manifested in the pairwise correlation patterns. For example, we identified Flavobacteria as a multifaceted player in the marine microbial ecosystem, and its clades were involved in mediating other OTU pairs. By contrast, SAR11 clades were not active mediators of the community, despite being abundant and highly correlated with other OTUs. Our results suggested that Flavobacteria are more likely to respond to situations where particles and unusual sources of dissolved organic material are prevalent, such as after a plankton bloom. On the other hand, SAR11s are oligotrophic chemoheterotrophs with inflexible metabolisms, and their relationships with other organisms may be less governed by environmental or biological factors. 
CONCLUSIONS By integrating liquid association with local similarity analysis to explore the mediated co-varying dynamics, we presented a novel perspective and a useful toolkit to analyze and interpret time series data from microbial community. Our augmented association network analysis is thus more representative of the true underlying dynamic structure of the microbial community. The analytic software in this study was implemented as new functionalities of the ELSA (Extended local similarity analysis) tool, which is available for free download ( http://bitbucket.org/charade/elsa ).
Affiliation(s)
- Dongmei Ai
  - School of Mathematics and Physics, University of Science and Technology Beijing, Xueyuan Road, Haidian District, Beijing, 100001 China
- Xiaoxin Li
  - School of Mathematics and Physics, University of Science and Technology Beijing, Xueyuan Road, Haidian District, Beijing, 100001 China
- Hongfei Pan
  - School of Mathematics and Physics, University of Science and Technology Beijing, Xueyuan Road, Haidian District, Beijing, 100001 China
- Jiamin Chen
  - Department of Medicine, Stanford University School of Medicine, 269 Campus Dr., Stanford, CA 94305 USA
- Jacob A. Cram
  - Center for Environmental Science, University of Maryland, Cambridge, MD 21613 USA
- Li C. Xia
  - Department of Medicine, Stanford University School of Medicine, 269 Campus Dr., Stanford, CA 94305 USA

16
Kinzy TG, Starr TK, Tseng GC, Ho YY. Meta-analytic framework for modeling genetic coexpression dynamics. Stat Appl Genet Mol Biol 2019; 18. [DOI: 10.1515/sagmb-2017-0052] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
Methods for exploring genetic interactions have been developed in an attempt to move beyond single gene analyses. Because biological molecules frequently participate in different processes under various cellular conditions, investigating the changes in gene coexpression patterns under various biological conditions could reveal important regulatory mechanisms. One of the methods for capturing gene coexpression dynamics, named liquid association (LA), quantifies the relationship where the coexpression between two genes is modulated by a third “coordinator” gene. This LA measure offers a natural framework for studying gene coexpression changes and has been applied increasingly to study regulatory networks among genes. With a wealth of publicly available gene expression data, there is a need to develop a meta-analytic framework for LA analysis. In this paper, we incorporated mixed effects when modeling correlation to account for between-studies heterogeneity. For statistical inference about LA, we developed a Markov chain Monte Carlo (MCMC) estimation procedure through a Bayesian hierarchical framework. We evaluated the proposed methods in a set of simulations and illustrated their use in two collections of experimental data sets. The first data set combined 10 pancreatic ductal adenocarcinoma gene expression studies to determine the role of possible coordinator gene USP9X in the Hippo pathway. The second experimental data set consisted of 907 gene expression microarray Escherichia coli experiments from multiple studies publicly available through the Many Microbe Microarray Database website (http://m3d.bu.edu/) and examined genes that coexpress with serA in the presence of coordinator gene Lrp.
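As a rough illustration of the LA measure described in this abstract, the sketch below computes a simple single-study moment estimate: each variable is converted to rank-based normal scores and the three scores are multiplied and averaged. It is not the mixed-effects or Bayesian MCMC procedure of the paper; the function names, the synthetic data, and the choice of standard-library Python are all assumptions made for the example.

```python
import random
from statistics import NormalDist


def normal_scores(values):
    """Map values to standard-normal quantiles by rank (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    inv_cdf = NormalDist().inv_cdf
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = inv_cdf(rank / (n + 1))
    return scores


def liquid_association(x, y, z):
    """Moment estimate of LA(X, Y | Z): mean product of the three normal scores."""
    xs, ys, zs = normal_scores(x), normal_scores(y), normal_scores(z)
    return sum(a * b * c for a, b, c in zip(xs, ys, zs)) / len(xs)


# Hypothetical data: the coordinator z flips the sign of the x-y relationship.
rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]
y_switch = [zi * xi + 0.5 * rng.gauss(0, 1) for zi, xi in zip(z, x)]
y_null = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to x and z

print("LA with a coordinator effect:", liquid_association(x, y_switch, z))
print("LA without any effect:       ", liquid_association(x, y_null, z))
```

In the synthetic example the first value should be clearly positive and the second close to zero, which is the contrast the LA score is meant to capture.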

17
Khayer N, Mirzaie M, Marashi SA, Rezaei-Tavirani M, Goshadrou F. Three-way interaction model with switching mechanism as an effective strategy for tracing functionally-related genes. Expert Rev Proteomics 2018; 16:161-169. [PMID: 30556756 DOI: 10.1080/14789450.2019.1559734] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/21/2023]
Abstract
Introduction: Identification of functionally-related genes is an important step in understanding biological systems. The most popular strategy to infer functional dependence is to study pairwise correlations between gene expression levels. However, certain functionally-related genes may have a low expression correlation due to their nonlinear interactions. The use of a three-way interaction (3WI) model with switching mechanism (SM) is a relatively new strategy to trace functionally-related genes. The 3WI model traces the dynamic and nonlinear nature of the co-expression relationship of two genes by introducing their link to the expression level of a third gene. Areas covered: In this paper, we review a variety of existing methods for tracing 3WIs. Furthermore, we provide a comprehensive review of the previous biological studies based on 3WI models. Expert commentary: Comparison of the features of these methods indicates that the modified liquid association algorithm is the most efficient of them for tracing 3WIs. The limited number of biological studies based on the 3WI suggests that the high computational demand of the available algorithms is a major challenge in applying this approach to high-throughput omics data.
Affiliation(s)
- Nasibeh Khayer
  - Department of Basic Sciences, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mehdi Mirzaie
  - Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Tehran, Iran
- Sayed-Amir Marashi
  - Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran
- Mostafa Rezaei-Tavirani
  - Proteomics Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Fatemeh Goshadrou
  - Department of Basic Sciences, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

18
Yu T. A new dynamic correlation algorithm reveals novel functional aspects in single cell and bulk RNA-seq data. PLoS Comput Biol 2018; 14:e1006391. [PMID: 30080856 PMCID: PMC6095616 DOI: 10.1371/journal.pcbi.1006391] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/02/2018] [Revised: 08/16/2018] [Accepted: 07/24/2018] [Indexed: 01/21/2023] Open
Abstract
Dynamic correlations are pervasive in high-throughput data. Large numbers of gene pairs can change their correlation patterns in response to observed/unobserved changes in physiological states. Finding changes in correlation patterns can reveal important regulatory mechanisms. Currently there is no method that can effectively detect global dynamic correlation patterns in a dataset. Given the challenging nature of the problem, the currently available methods use genes as surrogate measurements of physiological states, which cannot faithfully represent true underlying biological signals. In this study we develop a new method that directly identifies strong latent dynamic correlation signals from the data matrix, named DCA: Dynamic Correlation Analysis. At the center of the method is a new metric for the identification of pairs of variables that are highly likely to be dynamically correlated, without knowing the underlying physiological states that govern the dynamic correlation. We validate the performance of the method with extensive simulations. We applied the method to three real datasets: a single cell RNA-seq dataset, a bulk RNA-seq dataset, and a microarray gene expression dataset. In all three datasets, the method reveals novel latent factors with clear biological meaning, bringing new insights into the data. Dynamic correlation is an important area in expression data. However it hasn’t received much attention because of the lack of effective methods that can unravel the complex relationship. Here we describe a new method that represents a substantial improvement over existing approaches. It achieves the goal of efficiently finding patterns of dynamic correlation in RNA-seq data, as well as detecting biological functions associated with the dynamic correlation patterns. Unlike traditional methods that focus on first-order structures, linear or nonlinear, our method finds second-order patterns that bring insights into the regulations of the complex system. 
Some of the interesting discoveries by the new method, such as immunological functions of some intestinal epithelial cells, are validated by recent biological publications.
Affiliation(s)
- Tianwei Yu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States of America

19
Xu X, Wang M, Li L, Che R, Li P, Pei L, Li H. Genome-wide trait-trait dynamics correlation study dissects the gene regulation pattern in maize kernels. BMC Plant Biol 2017; 17:163. [PMID: 29037150 PMCID: PMC5644097 DOI: 10.1186/s12870-017-1119-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/02/2017] [Accepted: 10/09/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND Dissecting the genetic basis and regulatory mechanisms for the biosynthesis and accumulation of nutrients in maize could lead to the improved nutritional quality of this crop. Gene expression is regulated at the genomic, transcriptional, and post-transcriptional levels, all of which can produce diversity among traits. However, the expression of most genes connected with a particular trait usually does not have a direct association with the variation of that trait. In addition, expression profiles of genes involved in a single pathway may vary as the intrinsic cellular state changes. To work around these issues, we utilized a statistical method, liquid association (LA), to investigate the complex pattern of gene regulation in maize kernels. RESULTS We applied LA to the expression profiles of 28,769 genes to dissect dynamic trait-trait correlation patterns in maize kernels. Among the 1000 LA pairs (LAPs) with the largest LA scores, 686 LAPs were identified as showing conditional correlation. We also identified 830 and 215 LA-scouting leaders based on the positive and negative LA scores, which were significantly enriched for some biological processes and molecular functions. Our analysis of the dynamic co-expression patterns in the carotene biosynthetic pathway clearly indicated the important role of lcyE, CYP97A, ZEP1, and VDE in this pathway, which may change the direction of carotene biosynthesis by controlling the influx and efflux of the substrate. The dynamic trait-trait correlation patterns between gene expression and oil concentration in the fatty acid metabolic pathway and its complex regulatory network were also assessed. Twenty-three of 26 oil-associated genes were correlated with oil concentration conditioning on 580 LA-scouting genes, and 5% of these LA-scouting genes were annotated as enzymes in the oil metabolic pathway.
CONCLUSIONS By focusing on the carotenoid and oil biosynthetic pathways in maize, we showed that a genome-wide LA analysis provides a novel and effective way to detect transcriptional regulatory relationships. This method will help us understand the biological role of maize kernel genes and will benefit maize breeding programs.
Affiliation(s)
- Xiuqin Xu
  - School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
- Min Wang
  - National Maize Improvement Center of China, Key Laboratory of Crop Genomics and Genetic Improvement, China Agricultural University, Beijing, 100193 China
- Lianbo Li
  - School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
- Ronghui Che
  - School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
- Peng Li
  - School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
- Laming Pei
  - School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
- Hui Li
  - School of Biological and Science Technology, University of Jinan, Jinan, 250022 China

20
Khayer N, Marashi SA, Mirzaie M, Goshadrou F. Three-way interaction model to trace the mechanisms involved in Alzheimer's disease transgenic mice. PLoS One 2017; 12:e0184697. [PMID: 28934252 PMCID: PMC5608283 DOI: 10.1371/journal.pone.0184697] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 05/16/2017] [Accepted: 08/29/2017] [Indexed: 11/19/2022] Open
Abstract
Alzheimer's disease (AD) is the most common cause of dementia in humans. Currently, more than 46 million people in the world suffer from AD, and it is estimated that by 2050 this number will increase to more than 131 million. AD is considered a complex disease. Therefore, understanding the mechanism of AD is a universal challenge. Nowadays, a huge number of disease-related high-throughput “omics” datasets are freely available. Such datasets contain valuable information about disease-related pathways and their corresponding gene interactions. In the present work, a three-way interaction model is used as a novel approach to understand AD-related mechanisms. This model can trace the dynamic nature of the co-expression relationship between two genes by introducing their link to a third gene. Such relationships cannot be traced by the classical two-way interaction model. The liquid association method was applied to capture the statistically significant triplets that are involved in three-way interactions. Subsequently, gene set enrichment analysis (GSEA) and gene regulatory network (GRN) inference were applied to analyze the biological relevance of the statistically significant triplets. The results of this study suggest that the innate immunity processes are important in AD. Specifically, our results suggest that H2-Ob as the switching gene and the gene pair {Csf1r, Milr1} form a statistically significant and biologically relevant triplet, which may play an important role in AD. We propose that the homeostasis-related link between mast cells and microglia is presumably controlled by the expression level of H2-Ob as a switching gene.
Affiliation(s)
- Nasibeh Khayer
  - Department of Basic Sciences, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Sayed-Amir Marashi
  - Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Mehdi Mirzaie
  - Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Tehran, Iran
- Fatemeh Goshadrou
  - Department of Basic Sciences, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

21
Wang L, Liu S, Ding Y, Yuan SS, Ho YY, Tseng GC. Meta-analytic framework for liquid association. Bioinformatics 2017; 33:2140-2147. [PMID: 28334340 PMCID: PMC6044323 DOI: 10.1093/bioinformatics/btx138] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/20/2016] [Revised: 02/11/2017] [Accepted: 03/09/2017] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Although coexpression analysis via pair-wise expression correlation is popularly used to elucidate gene-gene interactions at the whole-genome scale, many complicated multi-gene regulations require more advanced detection methods. Liquid association (LA) is a powerful tool to detect the dynamic correlation of two gene variables depending on the expression level of a third variable (LA scouting gene). LA detection from a single transcriptomic study, however, is often unstable and not generalizable due to cohort bias, biological variation and limited sample size. With the rapid development of microarray and NGS technology, LA analysis combining multiple gene expression studies can provide more accurate and stable results. RESULTS In this article, we proposed two meta-analytic approaches for LA analysis (MetaLA and MetaMLA) to combine multiple transcriptomic studies. To offset the heavy computational demand, we also proposed a two-step fast screening algorithm for more efficient genome-wide screening: bootstrap filtering and sign filtering. We applied the methods to five Saccharomyces cerevisiae datasets related to environmental changes. The fast screening algorithm reduced running time by 98%. When compared with single study analysis, MetaLA and MetaMLA provided stronger detection signals and more consistent and stable results. The top triplets are highly enriched in fundamental biological processes related to environmental changes. Our method can help biologists understand underlying regulatory mechanisms under different environmental exposures or disease states. AVAILABILITY AND IMPLEMENTATION A MetaLA R package, data and code for this article are available at http://tsenglab.biostat.pitt.edu/software.htm. CONTACT ctseng@pitt.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lin Wang
  - School of Statistics, Capital University of Economics and Business, Fengtai, Beijing, China
- Silvia Liu
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Ying Ding
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Shin-sheng Yuan
  - Institute of Statistical Science, Academia Sinica, Nankang, Taipei, Taiwan
- Yen-Yi Ho
  - Department of Statistics, College of Arts and Sciences, University of South Carolina, Columbia, SC, USA
- George C Tseng
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

22
Cao X, Crews KR, Downing J, Lamba J, Pounds SB. CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints. BMC Bioinformatics 2016; 17:382. [PMID: 27766934 PMCID: PMC5073973 DOI: 10.1186/s12859-016-1217-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND As new technologies allow investigators to collect multiple forms of molecular data (genomic, epigenomic, transcriptomic, etc) and multiple endpoints on a clinical trial cohort, it will become necessary to effectively integrate all these data in a way that reliably identifies biologically important genes. METHODS We introduce CC-PROMISE as an integrated data analysis method that combines components of canonical correlation (CC) and projection onto the most interesting evidence (PROMISE). For each gene, CC-PROMISE first uses CC to compute scores that represent the association of two forms of molecular data with each other. Next, these scores are substituted into PROMISE to evaluate the statistical evidence that the molecular data show a biologically meaningful relationship with the endpoints. RESULTS CC-PROMISE shows outstanding performance in simulation studies and an example application involving pediatric leukemia. In simulation studies, CC-PROMISE controls the type I error (misleading significance) rate very near the nominal level across 100 distinct null settings in which no molecular-endpoint association exists. Also, CC-PROMISE has better statistical power than three other methods that control type I error in 396 of 400 (99 %) alternative settings for which a molecular-endpoint association is present; the power advantage of CC-PROMISE exceeds 30 % in 127 of the 400 (32 %) alternative settings. These advantages of CC-PROMISE are also observed in an example application. CONCLUSION CC-PROMISE very effectively identifies genes for which some form of molecular data shows a biologically meaningful association with multiple related endpoints. AVAILABILITY The R package CCPROMISE is currently available from www.stjuderesearch.org/site/depts/biostats/software .
Affiliation(s)
- Xueyuan Cao
  - Department of Biostatistics, St. Jude Children’s Research Hospital, 262 Danny Thomas Place, Memphis, 38105 USA
- Kristine R. Crews
  - Department of Pharmaceutical Sciences, St. Jude Children’s Research Hospital, 262 Danny Thomas Place, Memphis, 38105 USA
- James Downing
  - Department of Pathology, St. Jude Children’s Research Hospital, 262 Danny Thomas Place, Memphis, 38105 USA
- Jatinder Lamba
  - Department of Pharmacotherapy and Translational Research, University of Florida, 1333 Center Drive, Gainesville, 32610 USA
- Stanley B. Pounds
  - Department of Biostatistics, St. Jude Children’s Research Hospital, 262 Danny Thomas Place, Memphis, 38105 USA

23
Yuan H, Li Z, Tang NLS, Deng M. A network based covariance test for detecting multivariate eQTL in Saccharomyces cerevisiae. BMC Syst Biol 2016; 10 Suppl 1:8. [PMID: 26818242 PMCID: PMC4895706 DOI: 10.1186/s12918-015-0245-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/31/2022]
Abstract
Background Expression quantitative trait locus (eQTL) analysis has been widely used to understand how genetic variations affect gene expression in biological systems. Traditional eQTL analysis is performed in a pair-wise manner, in which one SNP affects the expression of one gene. In this way, some associated markers found in GWAS have been related to disease mechanisms by eQTL studies. In practice, however, a biological process is usually carried out by a group of genes. Although some methods have been proposed to identify a group of SNPs that affect the mean of gene expression in a network, changes in the co-expression pattern have not been considered. We therefore propose a procedure and algorithm to identify markers that affect the co-expression pattern of a pathway. Because two genes may have different correlations under different isoforms, which is hard to detect with a linear test, we also consider a nonlinear test. Results When we applied our method to a yeast eQTL dataset profiled under both glucose and ethanol conditions, we identified a total of 166 modules, each consisting of a group of genes and one eQTL that regulates the co-expression pattern of that group of genes. We found that many of these modules have biological significance. Conclusions We propose a network based covariance test to identify SNPs that affect the structure of a pathway. We also consider a nonlinear test, because two genes may have different correlations under different isoforms, which is hard to detect with a linear test.
Affiliation(s)
- Huili Yuan
  - LMAM, School of Mathematical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China
- Zhenye Li
  - LMAM, School of Mathematical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China
- Nelson L S Tang
  - Department of Chemical Pathology, Prince of Wales Hospital, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Minghua Deng
  - LMAM, School of Mathematical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Yiheyuan Road, Beijing, 100871, China
  - Center for Statistical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China

24
Gunderson T, Ho YY. An efficient algorithm to explore liquid association on a genome-wide scale. BMC Bioinformatics 2014; 15:371. [PMID: 25431229 PMCID: PMC4255454 DOI: 10.1186/s12859-014-0371-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/03/2014] [Accepted: 10/30/2014] [Indexed: 01/04/2023] Open
Abstract
Background: The growing wealth of publicly available gene expression data has made systematic studies of how genes interact in a cell more feasible. Liquid association (LA) describes the extent to which the coexpression of two genes varies with the expression level of a third gene (the controller gene). However, genome-wide application has been difficult and resource-intensive. We propose a new screening algorithm for more efficient LA estimation on a genome-wide scale and apply it to a Saccharomyces cerevisiae data set.
Results: On a test subset of the data, the fast screening algorithm achieved >99.8% agreement with the exhaustive search of LA values while reducing run time by 81–93%. Using a well-known yeast cell-cycle data set with 6,178 genes, we identified triplet combinations with significantly large LA values. In an exploratory gene set enrichment analysis, the top terms for the controller genes in these triplets are involved in some of the most fundamental processes in yeast, such as energy regulation, transportation, and sporulation.
Conclusion: We propose a novel, efficient algorithm to explore LA on a genome-wide scale and use it to identify triplets of interest in cell cycle pathways in a yeast data set. A software package named fastLiquidAssociation that implements the algorithm is available through http://www.bioconductor.org.
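To make the LA quantity concrete, here is a minimal sketch of a direct (exhaustive-style) LA estimate for a single triplet. It assumes the commonly used form in which X and Y are standardized, Z is converted to normal scores, and LA is estimated as the mean of the product X·Y·Z. That estimator is brought in for illustration and is not stated in the abstract; the code is not the screening algorithm of the paper and does not use the fastLiquidAssociation package. All function names and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def normal_scores(values):
    """Rank-based transform to approximately standard normal scores."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    inv = NormalDist().inv_cdf
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = inv(rank / (n + 1))
    return scores


def standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def liquid_association(x, y, z):
    """Estimate LA(X, Y | Z) as the mean of X*Y*Z after transformation."""
    xs, ys, zs = standardize(x), standardize(y), normal_scores(z)
    return mean(a * b * c for a, b, c in zip(xs, ys, zs))


# Simulated (not real) triplet: x and y are positively related when the
# controller z is high and negatively related when z is low.
rng = random.Random(1)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]
y = [math.copysign(1.0, zi) * xi + 0.5 * rng.gauss(0, 1)
     for xi, zi in zip(x, z)]
la_switch = liquid_association(x, y, z)

# Control triplet with no dependence on the controller.
y_null = [rng.gauss(0, 1) for _ in range(n)]
la_null = liquid_association(x, y_null, z)

print(f"LA with controller effect: {la_switch:.2f}; without: {la_null:.2f}")
```

Evaluating every triplet this way scales cubically with the number of genes, which is the cost the abstract's screening step is meant to reduce.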
Affiliation(s)
- Tina Gunderson
  - Division of Biostatistics, School of Public Health, University of Minnesota, 420 Delaware St. S.E., MMC 303, Minneapolis, 55455, MN, USA.
- Yen-Yi Ho
  - Division of Biostatistics, School of Public Health, University of Minnesota, 420 Delaware St. S.E., MMC 303, Minneapolis, 55455, MN, USA.
|
25
|
Wang L, Zheng W, Zhao H, Deng M. Statistical analysis reveals co-expression patterns of many pairs of genes in yeast are jointly regulated by interacting loci. PLoS Genet 2013; 9:e1003414. [PMID: 23555313 PMCID: PMC3610942 DOI: 10.1371/journal.pgen.1003414] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 07/17/2012] [Accepted: 02/11/2013] [Indexed: 11/30/2022] Open
Abstract
Expression quantitative trait loci (eQTL) studies have generated large amounts of data in different organisms. Analyses of these data have led to many novel findings and biological insights on expression regulation. However, the role of epistasis in the joint regulation of multiple genes has not been explored, largely because of the computational complexity of an exhaustive search when multiple traits are considered simultaneously against multiple markers. In this article, we propose a computationally feasible approach to identify pairs of chromosomal regions that interact to regulate the co-expression patterns of pairs of genes. Our approach is built on a bivariate model whose covariance matrix depends on the joint genotypes at the candidate loci. We also propose a filtering process to reduce the computational burden. When we applied our method to a yeast eQTL dataset profiled under both the glucose and ethanol conditions, we identified a total of 225 and 224 modules, respectively, with each module consisting of two genes and two eQTLs that epistatically regulate the co-expression pattern of the two genes. We found that many of these modules have biological interpretations. Under the glucose condition, ribosome biogenesis was co-regulated with the signaling and carbohydrate catabolic processes, whereas under the ethanol condition silencing and aging related genes were co-regulated, with the eQTLs containing genes involved in the oxidative stress response process.

eQTL studies collect both gene expression and genotype data, and they are highly informative as to how genes regulate expression. Although much progress has been made in the analysis of such data, most studies have considered one marker at a time. As a result, markers with weak marginal yet strong interactive effects may not be detected by single-marker analyses. In this article, using the joint expression pattern of two genes (rather than one gene) as the primary phenotype, we propose a novel statistical method to conduct an exhaustive search for joint marker analysis. When our method was applied to a well-studied dataset, we identified many novel features that were overlooked by existing methods. Our strategy is generally applicable to other scientific problems.
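The point about weak marginal yet strong interactive effects can be sketched with simulated data. The example below is a simplified stand-in, not the bivariate covariance model of the paper: it tests whether the correlation of one gene pair is homogeneous across groups using Fisher z-transformed correlations, first grouping by one locus alone and then by the joint genotype at two loci. The haploid two-allele coding, the XOR-style interaction, and all names are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_homogeneity(x, y, labels):
    """Chi-square statistic and df for equal corr(x, y) across label groups."""
    groups = {}
    for xi, yi, g in zip(x, y, labels):
        gx, gy = groups.setdefault(g, ([], []))
        gx.append(xi)
        gy.append(yi)
    zs, ws = [], []
    for gx, gy in groups.values():
        zs.append(math.atanh(pearson(gx, gy)))  # Fisher z per group
        ws.append(len(gx) - 3)                  # inverse variance of z
    zbar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    stat = sum(w * (z - zbar) ** 2 for w, z in zip(ws, zs))
    return stat, len(groups) - 1


# Simulated (not real) segregants with two biallelic loci a and b.
# The sign of the x-y relationship depends on whether a and b match,
# so neither locus alone shifts the correlation.
rng = random.Random(2)
n = 400
locus_a = [i % 2 for i in range(n)]
locus_b = [(i // 2) % 2 for i in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]
y = [(xi if a == b else -xi) + 0.5 * rng.gauss(0, 1)
     for xi, a, b in zip(x, locus_a, locus_b)]

stat_marginal, df_marginal = correlation_homogeneity(x, y, locus_a)
stat_joint, df_joint = correlation_homogeneity(
    x, y, list(zip(locus_a, locus_b)))

# 7.815 is the 0.95 quantile of the chi-square distribution with 3 df.
print(f"single locus: stat={stat_marginal:.2f} (df={df_marginal})")
print(f"joint loci:   stat={stat_joint:.2f} (df={df_joint}), critical 7.815")
```

Scanning all locus pairs against all gene pairs in this way is what makes the exhaustive search expensive, which is why the abstract pairs the model with a filtering step.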
Affiliation(s)
- Lin Wang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Wei Zheng
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
  - * E-mail: (HZ); (MD)
- Minghua Deng
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - LMAM, School of Mathematical Sciences, Peking University, Beijing, China
  - Center for Statistical Science, Peking University, Beijing, China
  - * E-mail: (HZ); (MD)
|
26
|
Qiu P, Zhang L. Identification of markers associated with global changes in DNA methylation regulation in cancers. BMC Bioinformatics 2012; 13 Suppl 13:S7. [PMID: 23320390 PMCID: PMC3426805 DOI: 10.1186/1471-2105-13-s13-s7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022] Open
Abstract
DNA methylation exhibits different patterns in different cancers. DNA methylation rates at different genomic loci appear to be highly correlated in some samples but not in others. We call such phenomena conditional concordant relationships (CCRs). In this study, we explored DNA methylation patterns in 12 common cancers using data from 2434 patient samples collected by The Cancer Genome Atlas project. We developed an exploratory method to characterize CCRs in the methylation data and identified the 200 gene markers whose on-and-off DNA methylation statuses are most significantly associated with drastic changes in CCRs throughout the genome. Clustering analysis of the methylation data for the 200 markers showed that they are tightly associated with cancer subtypes. We also generated a library of the significant CCRs that may be of interest to future studies of the regulatory network of DNA methylation in cancer.
Affiliation(s)
- Peng Qiu
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
|