1
Mary-Huard T, Das S, Mukhopadhyay I, Robin S. Querying multiple sets of P-values through composed hypothesis testing. Bioinformatics 2021; 38:141-148. [PMID: 34478490 DOI: 10.1093/bioinformatics/btab592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2020] [Revised: 07/16/2021] [Accepted: 07/27/2021] [Indexed: 02/05/2023] Open
Abstract
MOTIVATION: Combining the results of different experiments to exhibit complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis is often a set of P-values resulting from previous analyses, which need to be combined flexibly to explore complex hypotheses while guaranteeing a low proportion of false discoveries.
RESULTS: We introduce the generic concept of a composed hypothesis, which corresponds to an arbitrarily complex combination of simple hypotheses. We rephrase the problem of testing a composed hypothesis as a classification task and show that finding items for which the composed null hypothesis is rejected boils down to fitting a mixture model and classifying the items according to their posterior probabilities. We show that inference can be performed efficiently and provide a thorough classification rule to control the type I error. The performance and usefulness of the approach are illustrated in simulations and on two different applications. The method is scalable, does not require any parameter tuning, and provided valuable biological insight into the application cases considered.
AVAILABILITY AND IMPLEMENTATION: The QCH methodology is available in the qch package hosted on CRAN. Additionally, R code to reproduce the Einkorn example is available on the personal webpage of the first author: https://www6.inrae.fr/mia-paris/Equipes/Membres/Tristan-Mary-Huard.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
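The classification step described in this abstract can be illustrated with a minimal Python sketch. It assumes that a mixture model has already been fitted and that, for each item, posterior probabilities over the configurations of simple hypotheses are available. The function names, the 0/1 configuration encoding, the "at least `min_h1` alternatives" composed hypothesis, and the averaging rule used for selection are illustrative assumptions; this is not the API of the qch package and not necessarily the authors' exact classification rule.

```python
def composed_null_posterior(config_posteriors, min_h1):
    """Posterior probability of the composed null for one item.

    config_posteriors maps each configuration (a tuple of 0/1 flags, one per
    P-value set, 1 meaning the simple alternative holds) to its posterior
    probability. The composed alternative assumed here is "at least min_h1
    simple alternatives hold"; every other configuration belongs to the null.
    """
    return sum(p for config, p in config_posteriors.items()
               if sum(config) < min_h1)


def select_items(null_posteriors, alpha):
    """Return indices of the largest set of items whose average
    composed-null posterior stays at or below alpha (assumed rule)."""
    # Rank items from the least to the most likely to be null.
    order = sorted(range(len(null_posteriors)),
                   key=lambda i: null_posteriors[i])
    rejected, running_sum = [], 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += null_posteriors[i]
        # The running mean is non-decreasing, so we can stop at the
        # first item that pushes it above alpha.
        if running_sum / rank > alpha:
            break
        rejected.append(i)
    return rejected


# Toy example with Q = 2 P-value sets (hypothetical numbers).
item = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
null_prob = composed_null_posterior(item, min_h1=2)
selected = select_items([0.01, 0.9, 0.03, 0.2], alpha=0.05)
```

Ranking by the posterior of the composed null and thresholding its running average is one common way to turn posterior probabilities into a list of discoveries with a bounded expected proportion of false discoveries; other rules are possible.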
Affiliation(s)
- Tristan Mary-Huard
  - Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France
  - Génétique Quantitative et Evolution (GQE)-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Gif-sur-Yvette 91190, France
- Sarmistha Das
  - Human Genetics Unit, Indian Statistical Institute, Kolkata 700108, India
- Stéphane Robin
  - Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France
  - Centre d'Écologie et des Sciences de la Conservation (CESCO), MNHN, CNRS, Sorbonne Université, Paris 75005, France
2
Gaudinier A, Rodriguez-Medina J, Zhang L, Olson A, Liseron-Monfils C, Bågman AM, Foret J, Abbitt S, Tang M, Li B, Runcie DE, Kliebenstein DJ, Shen B, Frank MJ, Ware D, Brady SM. Transcriptional regulation of nitrogen-associated metabolism and growth. Nature 2018; 563:259-264. [PMID: 30356219 DOI: 10.1038/s41586-018-0656-3] [Citation(s) in RCA: 192] [Impact Index Per Article: 27.4] [Received: 12/11/2017] [Accepted: 08/22/2018] [Indexed: 11/09/2022]
Abstract
Nitrogen is an essential macronutrient for plant growth and basic metabolic processes. The application of nitrogen-containing fertilizer increases yield, which has been a substantial factor in the green revolution. Ecologically, however, excessive application of fertilizer has disastrous effects such as eutrophication. A better understanding of how plants regulate nitrogen metabolism is critical to increase plant yield and reduce fertilizer overuse. Here we present a transcriptional regulatory network and twenty-one transcription factors that regulate the architecture of root and shoot systems in response to changes in nitrogen availability. Genetic perturbation of a subset of these transcription factors revealed coordinate transcriptional regulation of enzymes involved in nitrogen metabolism. Transcriptional regulators in the network are transcriptionally modified by feedback via genetic perturbation of nitrogen metabolism. The network, genes and gene-regulatory modules identified here will prove critical to increasing agricultural productivity.
Affiliation(s)
- Allison Gaudinier
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA, USA
- Joel Rodriguez-Medina
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA, USA
- Lifang Zhang
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Andrew Olson
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Anne-Maarit Bågman
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA, USA
- Jessica Foret
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA, USA
- Michelle Tang
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA, USA
  - Department of Plant Sciences, University of California, Davis, Davis, CA, USA
- Baohua Li
  - Department of Plant Sciences, University of California, Davis, Davis, CA, USA
- Daniel E Runcie
  - Department of Plant Sciences, University of California, Davis, Davis, CA, USA
- Daniel J Kliebenstein
  - Department of Plant Sciences, University of California, Davis, Davis, CA, USA
  - DynaMo Center of Excellence, University of Copenhagen, Frederiksberg C, Denmark
- Bo Shen
  - DuPont Pioneer, Johnston, IA, USA
- Doreen Ware
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
  - US Department of Agriculture, Agricultural Research Service, Ithaca, NY, USA
- Siobhan M Brady
  - Department of Plant Biology and Genome Center, University of California, Davis, Davis, CA, USA
3
Lemaire K, Thorrez L, Schuit F. Disallowed and Allowed Gene Expression: Two Faces of Mature Islet Beta Cells. Annu Rev Nutr 2016; 36:45-71. [DOI: 10.1146/annurev-nutr-071715-050808] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 11/09/2022]
Affiliation(s)
- Lieven Thorrez
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, Faculty of Medicine, KU Leuven, Leuven B3000, Belgium
- Frans Schuit
  - Gene Expression Unit, Department of Cellular and Molecular Medicine, Faculty of Medicine, KU Leuven, Leuven B3000, Belgium
4
Not Just a Sum? Identifying Different Types of Interplay between Constituents in Combined Interventions. PLoS One 2015; 10:e0125334. [PMID: 25965065 PMCID: PMC4429013 DOI: 10.1371/journal.pone.0125334] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/23/2014] [Accepted: 03/23/2015] [Indexed: 12/27/2022] Open
Abstract
Motivation: Experiments in which the effect of combined manipulations is compared with the effects of their pure constituents have received a great deal of attention. Examples include the study of combination therapies and the comparison of double and single knockout model organisms. Often the effect of the combined manipulation is not a mere addition of the effects of its constituents, with quite different forms of interplay between the constituents being possible. Yet, a well-formalized taxonomy of possible forms of interplay is lacking, let alone a statistical methodology to test for their presence in empirical data.
Results: Starting from a taxonomy of a broad range of forms of interplay between constituents of a combined manipulation, we propose a sound statistical hypothesis testing framework to test for the presence of each particular form of interplay. We illustrate the framework with analyses of public gene expression data on the combined treatment of dendritic cells with curdlan and GM-CSF and show that these lead to valuable insights into the mode of action of the constituent treatments and their combination.
Availability and Implementation: R code implementing the statistical testing procedure for microarray gene expression data is available as supplementary material. The data are available from the Gene Expression Omnibus with accession number GSE32986.
5
Combining evidence of preferential gene-tissue relationships from multiple sources. PLoS One 2013; 8:e70568. [PMID: 23950964 PMCID: PMC3741196 DOI: 10.1371/journal.pone.0070568] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/14/2013] [Accepted: 06/21/2013] [Indexed: 11/19/2022] Open
Abstract
An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. genes showing considerably higher expression in those tissues than in the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree, and it is not evident how to retrieve these genes or how to distinguish true biological findings from those that are due to the choice of method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim of eliminating method- and study-specific biases and improving the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the highest-scoring genes with the public databases PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results overlap 85% with PaGenBase, 71% with TiGER and only 28% with HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases at identifying drug targets and biomarkers with known tissue specificity.
6
Bandyopadhyay N, Somaiya M, Ranka S, Kahveci T. CMRF: analyzing differential gene regulation in two group perturbation experiments. BMC Genomics 2012; 13 Suppl 2:S2. [PMID: 22537297 PMCID: PMC3394417 DOI: 10.1186/1471-2164-13-s2-s2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Microarray experiments often measure the expression of genes taken from sample tissues in the presence of external perturbations such as medication, radiation, or disease. The external perturbation can change the expression of some genes directly or indirectly through the gene interaction network. In this paper, we focus on an important class of such microarray experiments that inherently have two groups of tissue samples. When such different groups exist, the changes in expression of some genes after the perturbation can differ between the two groups. It is important not only to identify the genes that respond differently across the two groups, but also to uncover the reason behind this differential response. In this paper, we aim to identify the cause of this differential behavior of genes, whether it is the perturbation itself or interactions with other genes.
RESULTS: We propose a new probabilistic Bayesian method, CMRF, based on a Markov Random Field to identify such genes. CMRF uses the information about gene interactions as the prior of the model. We compare the accuracy of CMRF with SSEM, Student's t test and our earlier method SMRF on a semi-synthetic dataset generated from microarray data. CMRF obtains high accuracy and outperforms all three other methods. We also conduct a statistical significance test using a parametric noise-based experiment to evaluate the accuracy of our method. In this experiment, CMRF generates significant regions of confidence for various parameter settings.
CONCLUSIONS: In this paper, we solved the problem of finding primarily differentially regulated genes in the presence of external perturbations when the data are sampled from two groups. The probabilistic Bayesian method CMRF, based on a Markov Random Field, incorporates the dependency structure of the gene networks as the prior of the model. Experimental results on synthetic and real datasets demonstrated the superiority of CMRF over the other, simpler techniques.
Affiliation(s)
- Nirmalya Bandyopadhyay
  - Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32603, USA
7
Hendrickx DM, Hoefsloot HC, Hendriks MM, Canelas AB, Smilde AK. Global test for metabolic pathway differences between conditions. Anal Chim Acta 2012; 719:8-15. [DOI: 10.1016/j.aca.2011.12.051] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 08/22/2011] [Revised: 11/30/2011] [Accepted: 12/20/2011] [Indexed: 10/14/2022]
8
Van Deun K, Wilderjans TF, van den Berg RA, Antoniadis A, Van Mechelen I. A flexible framework for sparse simultaneous component based data integration. BMC Bioinformatics 2011; 12:448. [PMID: 22085701 PMCID: PMC3283562 DOI: 10.1186/1471-2105-12-448] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 07/14/2011] [Accepted: 11/15/2011] [Indexed: 12/05/2022] Open
Abstract
Background: High-throughput data are complex, and methods that reveal the structure underlying the data are most useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Nowadays the challenge is often to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities under study. Simultaneous component methods are most promising in this respect. However, the interpretation of the principal and simultaneous components is often daunting because the contributions of each of the biomolecules (transcripts, proteins) have to be taken into account.
Results: We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties can be tuned that account in different ways for the block structure present in the integrated data. This yields known sparse approaches such as the lasso, the ridge penalty, the elastic net, the group lasso, the sparse group lasso, and the elitist lasso. In addition, the algorithmic results can be easily transposed to the context of regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of different penalties with respect to sparseness across and within data blocks.
Conclusion: Sparse simultaneous component analysis is a useful method for data integration: first, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses, and second, interpretation of the results is greatly facilitated by their sparseness. The approach offered is flexible and allows the block structure to be taken into account in different ways. As such, structures can be found that are exclusively tied to one data platform (group lasso approach) as well as structures that involve all data platforms (elitist lasso approach).
Availability: The additional file contains a MATLAB implementation of the sparse simultaneous component method.
Affiliation(s)
- Katrijn Van Deun
  - Center for Computational Systems Biology SymBioSys, Katholieke Universiteit Leuven, 3000 Leuven, Belgium
9
Thorrez L, Laudadio I, Van Deun K, Quintens R, Hendrickx N, Granvik M, Lemaire K, Schraenen A, Van Lommel L, Lehnert S, Aguayo-Mazzucato C, Cheng-Xue R, Gilon P, Van Mechelen I, Bonner-Weir S, Lemaigre F, Schuit F. Tissue-specific disallowance of housekeeping genes: the other face of cell differentiation. Genome Res 2010; 21:95-105. [PMID: 21088282 DOI: 10.1101/gr.109173.110] [Citation(s) in RCA: 144] [Impact Index Per Article: 9.6] [Indexed: 11/24/2022]
Abstract
We report on a hitherto poorly characterized class of genes that are expressed in all tissues, except in one. Often, these genes have been classified as housekeeping genes, based on their nearly ubiquitous expression. However, the specific repression in one tissue defines a special class of "disallowed genes." In this paper, we used the intersection-union test to screen for such genes in a multi-tissue panel of genome-wide mRNA expression data. We propose that disallowed genes need to be repressed in the specific target tissue to ensure correct tissue function. We provide mechanistic data of repression with two metabolic examples, exercise-induced inappropriate insulin release and interference with ketogenesis in liver. Developmentally, this repression is established during tissue maturation in the early postnatal period involving epigenetic changes in histone methylation. In addition, tissue-specific expression of microRNAs can further diminish these repressed mRNAs. Together, we provide a systematic analysis of tissue-specific repression of housekeeping genes, a phenomenon that has not been studied so far on a genome-wide basis and, when perturbed, can lead to human disease.
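The intersection-union test mentioned in this abstract can be sketched in a few lines of Python. The sketch assumes that, for each gene, a one-sided P-value has already been computed for the comparison of the candidate "disallowed" tissue against every other tissue; the gene names, P-values and function names are hypothetical and are not taken from the paper. Under an intersection-union test the composite null is rejected only if every individual comparison rejects, so the combined P-value is the largest of the individual ones.

```python
def intersection_union_pvalue(pvalues):
    """Combined P-value of an intersection-union test.

    The composite null ("at least one individual null holds") is rejected
    at level alpha only when every individual test rejects at level alpha,
    which is equivalent to comparing the largest P-value with alpha.
    """
    if not pvalues:
        raise ValueError("at least one P-value is required")
    return max(pvalues)


def screen_genes(pvalues_by_gene, alpha):
    """Return the genes (sorted by name) whose intersection-union
    P-value is at or below alpha."""
    return sorted(
        gene for gene, pvalues in pvalues_by_gene.items()
        if intersection_union_pvalue(pvalues) <= alpha
    )


# Hypothetical one-sided P-values: target tissue vs. each other tissue.
example = {
    "geneA": [0.001, 0.010, 0.004],  # lower in the target tissue everywhere
    "geneB": [0.001, 0.200, 0.003],  # one comparison is not significant
}
candidates = screen_genes(example, alpha=0.05)
```

In a genome-wide screen the resulting P-values would still need a multiple-testing correction across genes, which is left out of this sketch.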
Affiliation(s)
- Lieven Thorrez
  - Gene Expression Unit, Department of Molecular Cell Biology, Katholieke Universiteit Leuven, 3000 Leuven, Belgium