1. Li R, Yi H, Ma S. A Selective Review of Network Analysis Methods for Gene Expression Data. Methods Mol Biol 2025; 2880:293-307. [PMID: 39900765 DOI: 10.1007/978-1-0716-4276-4_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2025]
Abstract
With the development of high-throughput profiling techniques, gene expressions have drawn significant attention due to their important biological implications, widespread data availability, and promising biological findings. The complex interactions and regulations among genes naturally lead to a network structure, which can provide a global view of molecular mechanisms and biological processes. This chapter provides a selective overview of constructing gene expression networks and utilizing them in downstream analysis. It also includes a demonstrating example.
Affiliation(s)
- Rong Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Huangdi Yi
  - Servier Pharmaceuticals, Boston, MA, USA
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
2. Unger Avila P, Padvitski T, Leote AC, Chen H, Saez-Rodriguez J, Kann M, Beyer A. Gene regulatory networks in disease and ageing. Nat Rev Nephrol 2024; 20:616-633. [PMID: 38867109 DOI: 10.1038/s41581-024-00849-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/15/2024] [Indexed: 06/14/2024]
Abstract
The precise control of gene expression is required for the maintenance of cellular homeostasis and proper cellular function, and the declining control of gene expression with age is considered a major contributor to age-associated changes in cellular physiology and disease. The coordination of gene expression can be represented through models of the molecular interactions that govern gene expression levels, so-called gene regulatory networks. Gene regulatory networks can represent interactions that occur through signal transduction, those that involve regulatory transcription factors, or statistical models of gene-gene relationships based on the premise that certain sets of genes tend to be coexpressed across a range of conditions and cell types. Advances in experimental and computational technologies have enabled the inference of these networks on an unprecedented scale and at unprecedented precision. Here, we delineate different types of gene regulatory networks and their cell-biological interpretation. We describe methods for inferring such networks from large-scale, multi-omics datasets and present applications that have aided our understanding of cellular ageing and disease mechanisms.
Affiliation(s)
- Paula Unger Avila
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Tsimafei Padvitski
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Ana Carolina Leote
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- He Chen
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
  - Department II of Internal Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Julio Saez-Rodriguez
  - Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
- Martin Kann
  - Department II of Internal Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - Center for Molecular Medicine Cologne, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Andreas Beyer
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
  - Center for Molecular Medicine Cologne, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - Institute for Genetics, Faculty of Mathematics and Natural Sciences, University of Cologne, Cologne, Germany
3. Chen S, Lin Z, Shen X, Li L, Pan W. Inference of causal metabolite networks in the presence of invalid instrumental variables with GWAS summary data. Genet Epidemiol 2023; 47:585-599. [PMID: 37573486 PMCID: PMC10840616 DOI: 10.1002/gepi.22535] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/24/2022] [Revised: 06/19/2023] [Accepted: 08/01/2023] [Indexed: 08/14/2023]
Abstract
We propose structural equation models (SEMs) as a general framework to infer causal networks for metabolites and other complex traits. Traditionally SEMs are used only for individual-level data under the assumption that all instrumental variables (IVs) are valid. To overcome these limitations, we propose both one- and two-sample approaches for causal network inference based on SEMs that can: (1) perform causal analysis and discover causal relationships among multiple traits; (2) account for the possible presence of some invalid IVs; (3) allow for data analysis using only genome-wide association studies (GWAS) summary statistics when individual-level data are not available; (4) consider the possibility of bidirectional relationships between traits. Our method employs a simple stepwise selection to identify invalid IVs, thus avoiding false positives while possibly increasing true discoveries based on two-stage least squares (2SLS). We use both real GWAS data and simulated data to demonstrate the superior performance of our method over the standard 2SLS/SEMs. For real data analysis, our proposed approach is applied to a human blood metabolite GWAS summary data set to uncover putative causal relationships among the metabolites; we also identify some metabolites (putative) causal to Alzheimer's disease (AD), which, along with the inferred causal metabolite network, suggest some possible pathways of metabolites involved in AD.
Affiliation(s)
- Siyi Chen
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455
- Zhaotong Lin
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455
- Xiaotong Shen
  - School of Statistics, University of Minnesota, Minneapolis, MN 55455
- Ling Li
  - Department of Experimental and Clinical Pharmacology, College of Pharmacy, University of Minnesota, Minneapolis, MN 55455
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455
4. Champion M, Chiquet J, Neuvial P, Elati M, Radvanyi F, Birmelé E. Identification of deregulation mechanisms specific to cancer subtypes. J Bioinform Comput Biol 2021; 19:2140003. [PMID: 33653235 DOI: 10.1142/s0219720021400035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
In many cancers, mechanisms of gene regulation can be severely altered. Identification of deregulated genes, which do not follow the regulation processes that exist between transcription factors and their target genes, is of importance to better understand the development of the disease. We propose a methodology to detect deregulation mechanisms with a particular focus on cancer subtypes. This strategy is based on the comparison between tumoral and healthy cells. First, we use gene expression data from healthy cells to infer a reference gene regulatory network. Then, we compare it with gene expression levels in tumor samples to detect deregulated target genes. We finally measure the ability of each transcription factor to explain these deregulations. We apply our method on a public bladder cancer data set derived from The Cancer Genome Atlas project and confirm that it captures hallmarks of cancer subtypes. We also show that it enables the discovery of new potential biomarkers.
Affiliation(s)
- Julien Chiquet
  - Université Paris Saclay, AgroParisTech, INRAE, UMR MIA-Paris, Paris, France
- Pierre Neuvial
  - Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, France
- Mohamed Elati
  - CANTHER, University of Lille, CNRS UMR 1277, Inserm U9020, 59045 Lille cedex, France
- François Radvanyi
  - Institut Curie, PSL Research University, CNRS, UMR144, Paris, France
- Etienne Birmelé
  - Université de Paris, CNRS, MAP5 UMR8145, Paris, France
  - Institut de Recherche Mathématique Avancée, UMR 7501 Université de Strasbourg, CNRS, Strasbourg, France
5. Igolkina AA, Meshcheryakov G, Gretsova MV, Nuzhdin SV, Samsonova MG. Multi-trait multi-locus SEM model discriminates SNPs of different effects. BMC Genomics 2020; 21:490. [PMID: 32723302 PMCID: PMC7385891 DOI: 10.1186/s12864-020-06833-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/16/2019] [Accepted: 06/16/2020] [Indexed: 11/21/2022]
Abstract
Background: There is a plethora of methods for genome-wide association studies. However, only a few of them may be classified as multi-trait and multi-locus, i.e., they consider the influence of multiple genetic variants on several correlated phenotypes. Results: We propose a multi-trait multi-locus model which employs structural equation modeling (SEM) to describe complex associations between SNPs and traits: multi-trait multi-locus SEM (mtmlSEM). The structure of our model makes it possible to discriminate pleiotropic and single-trait SNPs of direct and indirect effect. We also propose an automatic procedure to construct the model using factor analysis and the maximum likelihood method. For estimating a large number of parameters in the model, we performed Bayesian inference and implemented Gibbs sampling. An important feature of the model is that it correctly copes with non-normally distributed variables, such as some traits and variants. Conclusions: We applied the model to Vavilov’s collection of 404 chickpea (Cicer arietinum L.) accessions with 20-fold cross-validation. We analyzed 16 phenotypic traits which we organized into five groups and found around 230 SNPs associated with traits, 60 of which were of pleiotropic effect. The model demonstrated high accuracy in predicting trait values.
6. Everett LJ, Huang W, Zhou S, Carbone MA, Lyman RF, Arya GH, Geisz MS, Ma J, Morgante F, St Armour G, Turlapati L, Anholt RRH, Mackay TFC. Gene expression networks in the Drosophila Genetic Reference Panel. Genome Res 2020; 30:485-496. [PMID: 32144088 PMCID: PMC7111517 DOI: 10.1101/gr.257592.119] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 09/26/2019] [Accepted: 02/28/2020] [Indexed: 01/02/2023]
Abstract
A major challenge in modern biology is to understand how naturally occurring variation in DNA sequences affects complex organismal traits through networks of intermediate molecular phenotypes. This question is best addressed in a genetic mapping population in which all molecular polymorphisms are known and for which molecular endophenotypes and complex traits are assessed on the same genotypes. Here, we performed deep RNA sequencing of 200 Drosophila Genetic Reference Panel inbred lines with complete genome sequences and for which phenotypes of many quantitative traits have been evaluated. We mapped expression quantitative trait loci for annotated genes, novel transcribed regions, transposable elements, and microbial species. We identified host variants that affect expression of transposable elements, independent of their copy number, as well as microbiome composition. We constructed sex-specific expression quantitative trait locus regulatory networks. These networks are enriched for novel transcribed regions and target genes in heterochromatin and euchromatic regions of reduced recombination, as well as genes regulating transposable element expression. This study provides new insights regarding the role of natural genetic variation in regulating gene expression and generates testable hypotheses for future functional analyses.
Affiliation(s)
- Logan J Everett
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
- Wen Huang
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
- Shanshan Zhou
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
- Mary Anna Carbone
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
- Richard F Lyman
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
- Gunjan H Arya
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
- Matthew S Geisz
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
  - University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, North Carolina 27516, USA
- Junwu Ma
  - Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, JiangXi Agricultural University, JiangXi, China
- Fabio Morgante
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
- Genevieve St Armour
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
- Lavanya Turlapati
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
- Robert R H Anholt
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
- Trudy F C Mackay
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695-7614, USA
7. Li Y, Liu D, Li T, Zhu Y. Bayesian differential analysis of gene regulatory networks exploiting genetic perturbations. BMC Bioinformatics 2020; 21:12. [PMID: 31918656 PMCID: PMC6953167 DOI: 10.1186/s12859-019-3314-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/07/2019] [Accepted: 12/12/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Gene regulatory networks (GRNs) can be inferred from both gene expression data and genetic perturbations. Under different conditions, the gene data of the same gene set may be different from each other, which results in different GRNs. Detecting structural differences between GRNs under different conditions is of great significance for understanding gene functions and biological mechanisms. RESULTS: In this paper, we propose a Bayesian Fused algorithm to jointly infer differential structures of GRNs under two different conditions. The algorithm is developed for GRNs modeled with structural equation models (SEMs), which makes it possible to incorporate genetic perturbations into models to improve the inference accuracy, so we name it BFDSEM. Different from the naive approaches that separately infer pair-wise GRNs and identify the difference from the inferred GRNs, we first re-parameterize the two SEMs to form an integrated model that takes full advantage of the two groups of gene data, and then solve the re-parameterized model by developing a novel Bayesian fused prior following the criterion that the separate GRNs and the differential GRN are both sparse. CONCLUSIONS: Computer simulations are run on synthetic data to compare BFDSEM to two state-of-the-art joint inference algorithms: FSSEM and ReDNet. The results demonstrate that the performance of BFDSEM is comparable to that of FSSEM, and is generally better than that of ReDNet. The BFDSEM algorithm is also applied to a real data set of lung cancer and adjacent normal tissues; the resulting normal GRN and differential GRN are consistent with results reported in the previous literature. An open-source program implementing BFDSEM is freely available in Additional file 1.
Affiliation(s)
- Yan Li
  - College of Computer Science and Technology, Jilin University, Changchun, 130012 China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
- Dayou Liu
  - College of Computer Science and Technology, Jilin University, Changchun, 130012 China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
- Tengfei Li
  - College of Computer Science and Technology, Jilin University, Changchun, 130012 China
- Yungang Zhu
  - College of Computer Science and Technology, Jilin University, Changchun, 130012 China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
8. Zhou X, Cai X. Inference of differential gene regulatory networks based on gene expression and genetic perturbation data. Bioinformatics 2020; 36:197-204. [PMID: 31263873 PMCID: PMC6956787 DOI: 10.1093/bioinformatics/btz529] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/03/2018] [Revised: 06/09/2019] [Accepted: 06/28/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Gene regulatory networks (GRNs) of the same organism can be different under different conditions, although the overall network structure may be similar. Understanding the difference in GRNs under different conditions is important to understand condition-specific gene regulation. When gene expression and other relevant data under two different conditions are available, they can be used by an existing network inference algorithm to estimate two GRNs separately, and then to identify the difference between the two GRNs. However, such an approach does not exploit the similarity in two GRNs, and may sacrifice inference accuracy. RESULTS: In this paper, we model GRNs with the structural equation model (SEM) that can integrate gene expression and genetic perturbation data, and develop an algorithm named fused sparse SEM (FSSEM), to jointly infer GRNs under two conditions, and then to identify the difference between the two GRNs. Computer simulations demonstrate that the FSSEM algorithm outperforms the approaches that estimate two GRNs separately. Analysis of a dataset of lung cancer and another dataset of gastric cancer with FSSEM inferred differential GRNs in cancer versus normal tissues, in which the genes with the largest network degrees have been reported to be implicated in tumorigenesis. The FSSEM algorithm provides a valuable tool for joint inference of two GRNs and identification of the differential GRN under two conditions. AVAILABILITY AND IMPLEMENTATION: The R package fssemR implementing the FSSEM algorithm is available at https://github.com/Ivis4ml/fssemR.git. It is also available on CRAN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xin Zhou
  - Department of Electrical and Computer Engineering, University of Miami, FL 33146, USA
- Xiaodong Cai
  - Department of Electrical and Computer Engineering, University of Miami, FL 33146, USA
9. Momen M, Campbell MT, Walia H, Morota G. Utilizing trait networks and structural equation models as tools to interpret multi-trait genome-wide association studies. Plant Methods 2019; 15:107. [PMID: 31548847 PMCID: PMC6749677 DOI: 10.1186/s13007-019-0493-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/11/2019] [Accepted: 09/06/2019] [Indexed: 05/13/2023]
Abstract
BACKGROUND: Plant breeders seek to develop cultivars with maximal agronomic value, which is often assessed using numerous, often genetically correlated traits. As intervention on one trait will affect the value of another, breeding decisions should consider the relationships among traits in the context of putative causal structures (i.e., trait networks). While multi-trait genome-wide association studies (MTM-GWAS) can infer putative genetic signals at the multivariate scale, standard MTM-GWAS does not accommodate the network structure of phenotypes, and therefore does not address how the traits are interrelated. We extended the scope of MTM-GWAS by incorporating trait network structures into GWAS using structural equation models (SEM-GWAS). Here, we illustrate the utility of SEM-GWAS using a digital metric for shoot biomass, root biomass, water use, and water use efficiency in rice. RESULTS: A salient feature of SEM-GWAS is that it can partition the total single nucleotide polymorphism (SNP) effects acting on a trait into direct and indirect effects. Using this novel approach, we show that for most QTL associated with water use, total SNP effects were driven by genetic effects acting directly on water use rather than genetic effects originating from upstream traits. Conversely, total SNP effects for water use efficiency were largely due to indirect effects originating from the upstream trait, projected shoot area. CONCLUSIONS: We describe a robust framework that can be applied to multivariate phenotypes to understand the interrelationships between complex traits. This framework provides novel insights into how QTL act within a phenotypic network that would otherwise not be possible with conventional multi-trait GWAS approaches. Collectively, these results suggest that the use of SEM may enhance our understanding of complex relationships among agronomic traits.
Affiliation(s)
- Mehdi Momen
  - Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, 175 West Campus Drive, Blacksburg, VA 24061 USA
- Malachy T. Campbell
  - Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, 175 West Campus Drive, Blacksburg, VA 24061 USA
- Harkamal Walia
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583 USA
- Gota Morota
  - Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, 175 West Campus Drive, Blacksburg, VA 24061 USA
10. Yu H, Blair RH. Integration of probabilistic regulatory networks into constraint-based models of metabolism with applications to Alzheimer's disease. BMC Bioinformatics 2019; 20:386. [PMID: 31291905 PMCID: PMC6617954 DOI: 10.1186/s12859-019-2872-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/17/2019] [Accepted: 05/02/2019] [Indexed: 01/08/2023]
Abstract
Background: Mathematical models of biological networks can provide important predictions and insights into complex disease. Constraint-based models of cellular metabolism and probabilistic models of gene regulatory networks are two distinct areas that have progressed rapidly in parallel over the past decade. In principle, gene regulatory networks and metabolic networks underlie the same complex phenotypes and diseases. However, systematic integration of these two model systems remains a fundamental challenge. Results: In this work, we address this challenge by fusing probabilistic models of gene regulatory networks into constraint-based models of metabolism. The novel approach utilizes probabilistic reasoning in Bayesian network (BN) models of regulatory networks, which serves as the “glue” that enables a natural interface between the two systems. Probabilistic reasoning is used to predict and quantify system-wide effects of perturbation to the regulatory network in the form of constraints for flux variability analysis. In this setting, both regulatory and metabolic networks inherently account for uncertainty. Applications leverage constraint-based metabolic models of brain metabolism and gene regulatory networks parameterized by gene expression data from the hippocampus to investigate the role of the HIF-1 pathway in Alzheimer’s disease. Integrated models support HIF-1A as an effective target to reduce the effects of hypoxia in Alzheimer’s disease. However, HIF-1A activation is far less effective in shifting metabolism when compared to brain metabolism in healthy controls. Conclusions: The direct integration of probabilistic regulatory networks into constraint-based models of metabolism provides novel insights into how perturbations in the regulatory network may influence metabolic states. Predictive modeling of enzymatic activity can be facilitated using probabilistic reasoning, thereby extending the predictive capacity of the network. This framework for model integration is generalizable to other systems. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2872-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Han Yu
  - State University of New York at Buffalo, 3435 Main Street, Buffalo, 14214, US
11. Inferring Gene Regulatory Networks from a Population of Yeast Segregants. Sci Rep 2019; 9:1197. [PMID: 30718595 PMCID: PMC6361976 DOI: 10.1038/s41598-018-37667-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/31/2018] [Accepted: 11/30/2018] [Indexed: 12/14/2022]
Abstract
Constructing gene regulatory networks is crucial to unraveling the genetic architecture of complex traits and to understanding the mechanisms of diseases. On the basis of gene expression and single nucleotide polymorphism data in the yeast, Saccharomyces cerevisiae, we constructed gene regulatory networks using a two-stage penalized least squares method. A large system of structural equations via optimal prediction of a set of surrogate variables was established at the first stage, followed by consistent selection of regulatory effects at the second stage. Using this approach, we identified subnetworks that were enriched in gene ontology categories, revealing directional regulatory mechanisms controlling these biological pathways. Our mapping and analysis of expression-based quantitative trait loci uncovered a known alteration of gene expression within a biological pathway that results in regulatory effects on companion pathway genes in the phosphocholine network. In addition, we identify nodes in these gene ontology-enriched subnetworks that are coordinately controlled by transcription factors driven by trans-acting expression quantitative trait loci. Altogether, the integration of documented transcription factor regulatory associations with subnetworks defined by a system of structural equations using quantitative trait loci data is an effective means to delineate the transcriptional control of biological pathways.
12. Causal Queries from Observational Data in Biological Systems via Bayesian Networks: An Empirical Study in Small Networks. Methods Mol Biol 2018. [PMID: 30547398 DOI: 10.1007/978-1-4939-8882-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Biological networks are a very convenient modeling and visualization tool to discover knowledge from modern high-throughput genomics and post-genomics data sets. Indeed, biological entities are not isolated but are components of complex multilevel systems. We go one step further and advocate for the consideration of causal representations of the interactions in living systems. We present the causal formalism and bring it out in the context of biological networks, when the data is observational. We also discuss its ability to decipher the causal information flow as observed in gene expression. We also illustrate our exploration by experiments on small simulated networks as well as on a real biological data set.
13. Romero-Ibarguengoitia ME, Vadillo-Ortega F, Caballero AE, Ibarra-González I, Herrera-Rosas A, Serratos-Canales MF, León-Hernández M, González-Chávez A, Mummidi S, Duggirala R, López-Alvarenga JC. Family history and obesity in youth, their effect on acylcarnitine/aminoacids metabolomics and non-alcoholic fatty liver disease (NAFLD). Structural equation modeling approach. PLoS One 2018; 13:e0193138. [PMID: 29466466 PMCID: PMC5821462 DOI: 10.1371/journal.pone.0193138] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 06/23/2017] [Accepted: 02/05/2018] [Indexed: 01/21/2023]
Abstract
BACKGROUND: Structural equation modeling (SEM) can help in understanding complex functional relationships among obesity, non-alcoholic fatty liver disease (NAFLD), family history of obesity, targeted metabolomics and pro-inflammatory markers. We tested two hypotheses: 1) If obesity precedes an excess of free fatty acids that increase oxidative stress and mitochondrial dysfunction, there would be an increase of serum acylcarnitines, amino acids and cytokines in obese subjects. Acylcarnitines would be related to non-alcoholic fatty liver disease that will induce insulin resistance. 2) If a positive family history of obesity and type 2 diabetes are the major determinants of the metabolomic profile, there would be higher concentration of amino acids and acylcarnitines in patients with this background that will induce obesity and NAFLD which in turn will induce insulin resistance. METHODS/RESULTS: 137 normoglycemic subjects, mean age (SD) 30.61 (8.6) years, were divided into three groups: BMI<25 with absence of NAFLD (G1), n = 82; BMI>30 with absence of NAFLD (G2), n = 24; and BMI>30 with NAFLD (G3), n = 31. Family history of obesity (any) was present in 53%. Both models were adjusted in SEM. Family history of obesity predicted the obesity phenotype but could not predict acylcarnitine and amino acid concentrations (effect size <0.2). CONCLUSION: Family history of obesity is the major predictor of obesity, and the metabolic abnormalities on amino acids, acylcarnitines, inflammation, insulin resistance, and NAFLD.
Affiliation(s)
- Felipe Vadillo-Ortega
  - Vinculation Unit Faculty of Medicine UNAM, Instituto Nacional de Medicina Genomica (INMEGEN), Mexico City, Mexico
- Srinivas Mummidi
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Edinburg, TX, United States of America
- Ravindranath Duggirala
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Edinburg, TX, United States of America
- Juan Carlos López-Alvarenga
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Edinburg, TX, United States of America
  - Research department, Universidad Mexico Americana del Norte, Reynosa, Tamaulipas, Mexico
14
Abstract
Studies have pointed out that the expression of genes is highly regulated, which results in a cascade of distinct patterns of coexpression forming a network. Identifying and understanding such patterns is crucial in deciphering molecular mechanisms that underlie the pathophysiology of diseases. With the advance of high-throughput assays of messenger RNA (mRNA) and high-performance computing, reconstructing such networks from molecular data such as gene expression is now possible. This chapter presents an overview of methods for constructing such networks, practical considerations, and an example.
Affiliation(s)
- Roby Joehanes
  - Hebrew SeniorLife, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.
15
Lepik K, Annilo T, Kukuškina V, eQTLGen Consortium, Kisand K, Kutalik Z, Peterson P, Peterson H. C-reactive protein upregulates the whole blood expression of CD59 - an integrative analysis. PLoS Comput Biol 2017; 13:e1005766. [PMID: 28922377 PMCID: PMC5609773 DOI: 10.1371/journal.pcbi.1005766] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 04/03/2017] [Revised: 09/22/2017] [Accepted: 09/01/2017] [Indexed: 12/21/2022]
Abstract
Elevated C-reactive protein (CRP) concentrations in the blood are associated with acute and chronic infections and inflammation. Nevertheless, the functional role of increased CRP in multiple bacterial and viral infections as well as in chronic inflammatory diseases remains unclear. Here, we studied the relationship between CRP and gene expression levels in the blood in 491 individuals from the Estonian Biobank cohort, to elucidate the role of CRP in these inflammatory mechanisms. As a result, we identified a set of 1,614 genes associated with changes in CRP levels with a high proportion of interferon-stimulated genes. Further, we performed likelihood-based causality model selection and Mendelian randomization analysis to discover causal links between CRP and the expression of CRP-associated genes. Strikingly, our computational analysis and cell culture stimulation assays revealed increased CRP levels to drive the expression of complement regulatory protein CD59, suggesting CRP to have a critical role in protecting blood cells from the adverse effects of the immune defence system. Our results show the benefit of integrative analysis approaches in hypothesis-free uncovering of causal relationships between traits.
Chronic inflammation is associated with chronic diseases, morbidity and mortality while lower base inflammation levels are thought to be predictive of healthy aging. Thus, to pursue a long and healthy lifespan, it is essential to understand the inflammatory regulatory mechanisms. To that end, we studied the functional role of C-reactive protein (CRP), an inflammatory biomarker that is used to measure cardiovascular risk in clinical practice. There is evidence for a strong genetic component of elevated CRP levels but it is still unclear if it has a direct impact on the processes that lead to inflammatory diseases. In order to elucidate the function of CRP in the blood, we used statistical methods for causal inference to infer causal relationships between changes in CRP and gene expression levels. Our statistical analysis and cell culture experiments suggest that CRP drives the expression of complement regulatory protein CD59. Thus, CRP can have a functional role in protecting human blood cells from the adverse effects of the immune defence system.
Affiliation(s)
- Kaido Lepik
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
  - Institute of Social and Preventive Medicine, Lausanne University Hospital, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Tarmo Annilo
  - Estonian Genome Center, University of Tartu, Tartu, Estonia
- Kai Kisand
  - Molecular Pathology, Institute of Biomedical and Translational Medicine, University of Tartu, Tartu, Estonia
- Zoltán Kutalik
  - Institute of Social and Preventive Medicine, Lausanne University Hospital, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pärt Peterson
  - Molecular Pathology, Institute of Biomedical and Translational Medicine, University of Tartu, Tartu, Estonia
- Hedi Peterson
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
  - Quretec Ltd, Tartu, Estonia
16
De Souza Jacomini R, Martins DC, Da Silva FL, Costa AHR. GeNICE: A Novel Framework for Gene Network Inference by Clustering, Exhaustive Search, and Multivariate Analysis. J Comput Biol 2017. [PMID: 28636461 DOI: 10.1089/cmb.2017.0022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/25/2023]
Abstract
Gene network (GN) inference from temporal gene expression data is a crucial and challenging problem in systems biology. Expression data sets usually consist of dozens of temporal samples, while networks consist of thousands of genes, thus rendering many inference methods unfeasible in practice. To improve the scalability of GN inference methods, we propose a novel framework called GeNICE, based on probabilistic GNs; the main novelty is the introduction of a clustering procedure to group genes with related expression profiles and to provide an approximate solution with reduced computational complexity. We use the defined clusters to perform an exhaustive search to retrieve the best predictor gene subsets for each target gene, according to multivariate criterion functions. GeNICE greatly reduces the search space because predictor candidates are restricted to one gene per cluster. Finally, a multivariate analysis is performed for each defined predictor subset to retrieve minimal subsets and to simplify the network. In our experiments with in silico generated data sets, GeNICE achieved substantial computational time reduction when compared to solutions without the clustering step, while preserving the gene expression prediction accuracy even when the number of clusters is small (about 50) relative to the number of genes (order of thousands). For a Plasmodium falciparum microarray data set, the prediction accuracy achieved by GeNICE was roughly 97%, while the respective topologies involving glycolytic and apicoplast seed genes had a very large intramodularity, very small interconnection between modules, and some module hub genes, reflecting small-world and scale-free topological properties, as expected.
17
Tarka P. An overview of structural equation modeling: its beginnings, historical development, usefulness and controversies in the social sciences. Qual Quant 2017; 52:313-354. [PMID: 29416184 PMCID: PMC5794813 DOI: 10.1007/s11135-017-0469-8] [Citation(s) in RCA: 143] [Impact Index Per Article: 17.9] [Indexed: 11/27/2022]
Abstract
This paper is a tribute to researchers who have significantly contributed to improving and advancing structural equation modeling (SEM). It is, therefore, a brief overview of SEM and presents its beginnings, historical development, its usefulness in the social sciences and the statistical and philosophical (theoretical) controversies which have often appeared in the literature pertaining to SEM. Having described the essence of SEM in the context of causal analysis, the author discusses the years of the development of structural modeling as the consequence of many researchers' systematically growing needs (in particular in the social sciences) who strove to effectively understand the structure and interactions of latent phenomena. The early beginnings of SEM models were related to the work of Spearman and Wright, and to that of other prominent researchers who contributed to SEM development. The importance and predominance of theoretical assumptions over technical issues for the successful construction of SEM models are also described. Then, controversies regarding the use of SEM in the social sciences are presented. Finally, the opportunities and threats of this type of analytical strategy as well as selected areas of SEM applications in the social sciences are discussed.
Affiliation(s)
- Piotr Tarka
  - Department of Market Research, Poznan University of Economics, al. Niepodleglosci 10, 61-875 Poznan, Poland
18
Wang X, Alshawaqfeh M, Dang X, Wajid B, Noor A, Qaraqe M, Serpedin E. An Overview of NCA-Based Algorithms for Transcriptional Regulatory Network Inference. Microarrays (Basel) 2015; 4:596-617. [PMID: 27600242 PMCID: PMC4996402 DOI: 10.3390/microarrays4040596] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/01/2015] [Revised: 10/07/2015] [Accepted: 11/11/2015] [Indexed: 01/08/2023]
Abstract
In systems biology, the regulation of gene expressions involves a complex network of regulators. Transcription factors (TFs) represent an important component of this network: they are proteins that control which genes are turned on or off in the genome by binding to specific DNA sequences. Transcription regulatory networks (TRNs) describe gene expressions as a function of regulatory inputs specified by interactions between proteins and DNA. A complete understanding of TRNs helps to predict a variety of biological processes and to diagnose, characterize and eventually develop more efficient therapies. Recent advances in biological high-throughput technologies, such as DNA microarray data and next-generation sequence (NGS) data, have made the inference of transcription factor activities (TFAs) and TF-gene regulations possible. Network component analysis (NCA) represents an efficient computational framework for TRN inference from the information provided by microarrays, ChIP-on-chip and the prior information about TF-gene regulation. However, NCA suffers from several shortcomings. Recently, several algorithms based on the NCA framework have been proposed to overcome these shortcomings. This paper first overviews the computational principles behind NCA, and then, it surveys the state-of-the-art NCA-based algorithms proposed in the literature for TRN reconstruction.
Affiliation(s)
- Xu Wang
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Mustafa Alshawaqfeh
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Xuan Dang
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Bilal Wajid
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Amina Noor
  - Institute of Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA.
- Marwa Qaraqe
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Erchin Serpedin
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
19
Peñagaricano F, Valente BD, Steibel JP, Bates RO, Ernst CW, Khatib H, Rosa GJM. Exploring causal networks underlying fat deposition and muscularity in pigs through the integration of phenotypic, genotypic and transcriptomic data. BMC Syst Biol 2015; 9:58. [PMID: 26376630 PMCID: PMC4574162 DOI: 10.1186/s12918-015-0207-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 05/04/2015] [Accepted: 09/04/2015] [Indexed: 12/23/2022]
Abstract
BACKGROUND Joint modeling and analysis of phenotypic, genotypic and transcriptomic data have the potential to uncover the genetic control of gene activity and phenotypic variation, as well as shed light on the manner and extent of connectedness among these variables. Current studies mainly report associations, i.e. undirected connections among variables without causal interpretation. Knowledge regarding causal relationships among genes and phenotypes can be used to predict the behavior of complex systems, as well as to optimize management practices and selection strategies. Here, we performed a multistep procedure for inferring causal networks underlying carcass fat deposition and muscularity in pigs using multi-omics data obtained from an F2 Duroc x Pietrain resource pig population. RESULTS We initially explored marginal associations between genotypes and phenotypic and expression traits through whole-genome scans, and then, in genomic regions with multiple significant hits, we assessed gene-phenotype network reconstruction using causal structural learning algorithms. One genomic region on SSC6 showed significant associations with three relevant phenotypes, off-midline 10th-rib backfat thickness, loin muscle weight, and average intramuscular fat percentage, and also with the expression of seven genes, including ZNF24, SSX2IP, and AKR7A2. The inferred network indicated that the genotype affects the three phenotypes mainly through the expression of several genes. Among the phenotypes, fat deposition traits negatively affected loin muscle weight. CONCLUSIONS Our findings shed light on the antagonistic relationship between carcass fat deposition and lean meat content in pigs. In addition, the procedure described in this study has the potential to unravel gene-phenotype networks underlying complex phenotypes.
Affiliation(s)
- Francisco Peñagaricano
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
  - Present Address: Department of Animal Sciences, and University of Florida Genetics Institute, University of Florida, Gainesville, FL, 326111, USA.
- Bruno D Valente
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
  - Dairy Science, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Juan P Steibel
  - Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA.
- Ronald O Bates
  - Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA.
- Catherine W Ernst
  - Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA.
- Hasan Khatib
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Guilherme J M Rosa
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
  - Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA.
20
Fear JM, Arbeitman MN, Salomon MP, Dalton JE, Tower J, Nuzhdin SV, McIntyre LM. The Wright stuff: reimagining path analysis reveals novel components of the sex determination hierarchy in Drosophila melanogaster. BMC Syst Biol 2015; 9:53. [PMID: 26335107 PMCID: PMC4558766 DOI: 10.1186/s12918-015-0200-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 03/19/2015] [Accepted: 08/20/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Drosophila sex determination hierarchy is a classic example of a transcriptional regulatory hierarchy, with sex-specific isoforms regulating morphology and behavior. We use a structural equation modeling approach, leveraging natural genetic variation from two studies on Drosophila female head tissues--DSPR collection (596 F1-hybrids from crosses between DSPR sub-populations) and CEGS population (75 F1-hybrids from crosses between DGRP/Winters lines to a reference strain w1118)--to expand understanding of the sex hierarchy gene regulatory network (GRN). This approach is completely generalizable to any natural population, including humans. RESULTS We expanded the sex hierarchy GRN adding novel links among genes, including a link from fruitless (fru) to Sex-lethal (Sxl) identified in both populations. This link is further supported by the presence of fru binding sites in the Sxl locus. 754 candidate genes were added to the pathway, including the splicing factors male-specific lethal 2 and Rm62 as downstream targets of Sxl which are well-supported links in males. Independent studies of doublesex and transformer mutants support many additions, including evidence for a link between the sex hierarchy and metabolism, via Insulin-like receptor. CONCLUSIONS The genes added in the CEGS population were enriched for genes with sex-biased splicing and components of the spliceosome. A common goal of molecular biologists is to expand understanding about regulatory interactions among genes. Using natural alleles we can not only identify novel relationships, but using supervised approaches can order genes into a regulatory hierarchy. Combining these results with independent large effect mutation studies, allows clear candidates for detailed molecular follow-up to emerge.
Affiliation(s)
- Justin M Fear
  - Department of Molecular Genetics and Microbiology, University of Florida, CGRC Room 116, PO Box 100266, FL 32610-0266, Gainesville, FL, USA.
- Matthew P Salomon
  - Molecular and Computational Biology, University of California, Los Angeles, CA, USA.
- Justin E Dalton
  - Biomedical Science, Florida State University, Tallahassee, FL, USA.
- John Tower
  - Molecular and Computational Biology, University of California, Los Angeles, CA, USA.
- Sergey V Nuzhdin
  - Molecular and Computational Biology, University of California, Los Angeles, CA, USA.
- Lauren M McIntyre
  - Department of Molecular Genetics and Microbiology, University of Florida, CGRC Room 116, PO Box 100266, FL 32610-0266, Gainesville, FL, USA.
21
Zhang W, Zhou T. A Sparse Reconstruction Approach for Identifying Gene Regulatory Networks Using Steady-State Experiment Data. PLoS One 2015. [PMID: 26207991 PMCID: PMC4514654 DOI: 10.1371/journal.pone.0130979] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
Motivation Identifying gene regulatory networks (GRNs) which consist of a large number of interacting units has become a problem of paramount importance in systems biology. Situations exist extensively in which causal interacting relationships among these units are required to be reconstructed from measured expression data and other a priori information. Though numerous classical methods have been developed to unravel the interactions of GRNs, these methods either have higher computing complexities or have lower estimation accuracies. Note that great similarities exist between identification of genes that directly regulate a specific gene and a sparse vector reconstruction, which often relates to the determination of the number, location and magnitude of nonzero entries of an unknown vector by solving an underdetermined system of linear equations y = Φx. Based on these similarities, we propose a novel framework of sparse reconstruction to identify the structure of a GRN, so as to increase accuracy of causal regulation estimations, as well as to reduce their computational complexity.
Results In this paper, a sparse reconstruction framework is proposed on the basis of steady-state experiment data to identify GRN structure. Unlike traditional methods, the adopted approach is well suited to large-scale underdetermined problems of inferring a sparse vector. We investigate how to combine the noisy steady-state experiment data and a sparse reconstruction algorithm to identify causal relationships. Efficiency of this method is tested by an artificial linear network, a mitogen-activated protein kinase (MAPK) pathway network and the in silico networks of the DREAM challenges. The performance of the suggested approach is compared with two state-of-the-art algorithms, the widely adopted total least-squares (TLS) method and those available results on the DREAM project. Actual results show that, with a lower computational cost, the proposed method can significantly enhance estimation accuracy and greatly reduce false positive and negative errors. Furthermore, numerical calculations demonstrate that the proposed algorithm may have faster convergence speed and smaller fluctuation than other methods when either estimate error or estimate bias is considered.
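The sparse-vector view in this abstract, recovering the few nonzero entries of x from an underdetermined system y = Φx, can be illustrated with a small sketch. The code below uses orthogonal matching pursuit (OMP), a standard greedy sparse-recovery method chosen here only for illustration; it is not the authors' algorithm. The matrix `Phi`, the problem sizes, the sparsity level `k`, and the function name `omp` are all assumptions, and the data are synthetic and noise-free.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedy k-sparse solution of y ≈ Phi @ x."""
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        # Score each column by its correlation with the current residual.
        scores = np.abs(Phi.T @ residual)
        if support:
            scores[support] = 0.0  # do not select the same column twice
        support.append(int(np.argmax(scores)))
        # Re-fit all selected columns jointly by least squares.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x

# Synthetic, illustrative setup: 80 equations, 120 unknowns, 3 true "regulators".
rng = np.random.default_rng(0)
m, n = 80, 120
Phi = rng.standard_normal((m, n))
Phi /= np.linalg.norm(Phi, axis=0)  # unit-norm columns
x_true = np.zeros(n)
x_true[[5, 40, 97]] = [2.0, -1.5, 1.8]
y = Phi @ x_true

x_hat = omp(Phi, y, k=3)
recovered = sorted(np.flatnonzero(x_hat).tolist())
print("recovered support:", recovered)
```

In a GRN reading of this toy setup, y would hold the expression of one target gene across experiments and the nonzero entries of x would mark its direct regulators. Real data are noisy and the sparsity level is unknown, so this is a sketch of the problem shape and not a usable inference pipeline.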
Affiliation(s)
- Wanhong Zhang
  - School of Chemical Machinery, Qinghai University, Qinghai, China
  - Department of Automation, Tsinghua University, Beijing, China
- Tong Zhou
  - School of Chemical Machinery, Qinghai University, Qinghai, China
  - Tsinghua National Laboratory for Information Science and Technology(TNList), Tsinghua University, Beijing, China
22
Park WB, Singh SP, Kim M, Sohn KS. Phosphor informatics based on confirmatory factor analysis. ACS Comb Sci 2015; 17:317-25. [PMID: 25853926 DOI: 10.1021/acscombsci.5b00017] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Abstract
The theoretical understanding of phosphor luminescence is far from complete. To accomplish a full understanding of phosphor luminescence, the data mining of existing experimental data should receive equal consideration along with theoretical approaches. We mined the crystallographic and luminescence data of 75 reported Eu(2+)-doped phosphors with a single Wyckoff site for Eu(2+) activator accommodation, and 32 descriptors were extracted. A confirmatory factor analysis (CFA) based on a structural equation model (SEM) was employed since it has been helpful in understanding complex problems in social sciences and in bioinformatics. This first attempt at applying CFA to the data mining of engineering materials provided a better understanding of the structural and luminescent-property relationships for LED phosphors than what we have learnt so far from the conventional theoretical approaches.
Affiliation(s)
- Woon Bae Park
  - Faculty of Nanotechnology and Advanced Materials Engineering, Sejong University, Seoul 143-747, Korea
- Satendra Pal Singh
  - Faculty of Nanotechnology and Advanced Materials Engineering, Sejong University, Seoul 143-747, Korea
- Minseuk Kim
  - Faculty of Nanotechnology and Advanced Materials Engineering, Sejong University, Seoul 143-747, Korea
- Kee-Sun Sohn
  - Faculty of Nanotechnology and Advanced Materials Engineering, Sejong University, Seoul 143-747, Korea
23
Bayesian network reconstruction using systems genetics data: comparison of MCMC methods. Genetics 2015; 199:973-89. [PMID: 25631319 PMCID: PMC4391572 DOI: 10.1534/genetics.114.172619] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 12/14/2014] [Accepted: 01/26/2015] [Indexed: 12/23/2022]
Abstract
Reconstructing biological networks using high-throughput technologies has the potential to produce condition-specific interactomes. But are these reconstructed networks a reliable source of biological interactions? Do some network inference methods offer dramatically improved performance on certain types of networks? To facilitate the use of network inference methods in systems biology, we report a large-scale simulation study comparing the ability of Markov chain Monte Carlo (MCMC) samplers to reverse engineer Bayesian networks. The MCMC samplers we investigated included foundational and state-of-the-art Metropolis-Hastings and Gibbs sampling approaches, as well as novel samplers we have designed. To enable a comprehensive comparison, we simulated gene expression and genetics data from known network structures under a range of biologically plausible scenarios. We examine the overall quality of network inference via different methods, as well as how their performance is affected by network characteristics. Our simulations reveal that network size, edge density, and strength of gene-to-gene signaling are major parameters that differentiate the performance of various samplers. Specifically, more recent samplers including our novel methods outperform traditional samplers for highly interconnected large networks with strong gene-to-gene signaling. Our newly developed samplers show comparable or superior performance to the top existing methods. Moreover, this performance gain is strongest in networks with biologically oriented topology, which indicates that our novel samplers are suitable for inferring biological networks. The performance of MCMC samplers in this simulation framework can guide the choice of methods for network reconstruction using systems genetics data.
24
Abstract
Expression quantitative trait loci (eQTL) mapping constitutes a challenging problem due to, among other reasons, the high-dimensional multivariate nature of gene-expression traits. Next to the expression heterogeneity produced by confounding factors and other sources of unwanted variation, indirect effects spread throughout genes as a result of genetic, molecular, and environmental perturbations. From a multivariate perspective one would like to adjust for the effect of all of these factors to end up with a network of direct associations connecting the path from genotype to phenotype. In this article we approach this challenge with mixed graphical Markov models, higher-order conditional independences, and q-order correlation graphs. These models show that additive genetic effects propagate through the network as a function of gene-gene correlations. Our estimation of the eQTL network underlying a well-studied yeast data set leads to a sparse structure with more direct genetic and regulatory associations that enable a straightforward comparison of the genetic control of gene expression across chromosomes. Interestingly, it also reveals that eQTLs explain most of the expression variability of network hub genes.
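The contrast this abstract draws between indirect effects and direct associations can be illustrated with partial correlations. The sketch below simulates a hypothetical three-gene chain g1 → g2 → g3 and compares marginal correlations with partial correlations obtained from the inverse covariance matrix. It conditions on all remaining variables at once, which for three variables coincides with first-order conditioning; it is not the mixed graphical Markov model or q-order procedure of the paper. Gene names, effect sizes, sample size, and the helper `partial_correlations` are all assumptions.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation of each pair of columns given all other columns."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    # Standard identity: rho_ij|rest = -K_ij / sqrt(K_ii * K_jj).
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical chain g1 -> g2 -> g3: g1 influences g3 only through g2.
rng = np.random.default_rng(1)
n = 20000
g1 = rng.standard_normal(n)
g2 = 0.8 * g1 + rng.standard_normal(n)
g3 = 0.8 * g2 + rng.standard_normal(n)
data = np.column_stack([g1, g2, g3])

marginal = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)
print("marginal corr(g1, g3):", round(float(marginal[0, 2]), 3))
print("partial  corr(g1, g3 | g2):", round(float(partial[0, 2]), 3))
```

In this toy chain the marginal correlation between g1 and g3 is clearly nonzero, while the partial correlation given g2 is close to zero, so a graph built from partial correlations would keep only the two direct edges.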
25
Kim DC, Wang J, Liu C, Gao J. Inference of SNP-gene regulatory networks by integrating gene expressions and genetic perturbations. Biomed Res Int 2014; 2014:629697. [PMID: 25136606 PMCID: PMC4127230 DOI: 10.1155/2014/629697] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/28/2014] [Accepted: 05/09/2014] [Indexed: 11/18/2022]
Abstract
In order to elucidate the overall relationships between gene expressions and genetic perturbations, we propose a network inference method to infer a gene regulatory network in which a single nucleotide polymorphism (SNP) is involved as a regulator of genes. In most network inference methods of this kind, termed SNP-gene regulatory network (SGRN) inference, SNP-gene pairs are given by separately performing expression quantitative trait loci (eQTL) mappings. In this paper, we propose an SGRN inference method without predefined eQTL information, assuming a gene is regulated by at most a single SNP. To evaluate the performance, the proposed method was applied to random data generated from synthetic networks and parameters. There are three main contributions. First, the proposed method provides both the gene regulatory inference and the eQTL identification. Second, the experimental results demonstrated that integration of multiple methods can produce competitive performances. Lastly, the proposed method was also applied to psychiatric disorder data in order to explore how the method works with real data.
Affiliation(s)
- Dong-Chul Kim
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
- Jiao Wang
  - Beijing Genomics Institution at Wuhan, Wuhan 430075, China
- Chunyu Liu
  - Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 66012, USA
- Jean Gao
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
26
Tétard-Jones C, Gatehouse AMR, Cooper J, Leifert C, Rushton S. Modelling pathways to Rubisco degradation: a structural equation network modelling approach. PLoS One 2014; 9:e87597. [PMID: 24498339 PMCID: PMC3911993 DOI: 10.1371/journal.pone.0087597] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/21/2013] [Accepted: 12/23/2013] [Indexed: 11/19/2022]
Abstract
'Omics analysis (transcriptomics, proteomics) quantifies changes in gene/protein expression, providing a snapshot of changes in biochemical pathways over time. Although tools such as modelling that are needed to investigate the relationships between genes/proteins already exist, they are rarely utilised. We consider the potential for using Structural Equation Modelling to investigate protein-protein interactions in a proposed Rubisco protein degradation pathway using previously published data from 2D electrophoresis and mass spectrometry proteome analysis. These informed the development of a prior model that hypothesised a pathway of Rubisco Large Subunit and Small Subunit degradation, producing both primary and secondary degradation products. While some of the putative pathways were confirmed by the modelling approach, the model also demonstrated features that had not been originally hypothesised. We used Bayesian analysis based on Markov Chain Monte Carlo simulation to generate output statistics suggesting that the model had replicated the variation in the observed data due to protein-protein interactions. This study represents an early step in the development of approaches that seek to enable the full utilisation of information regarding the dynamics of biochemical pathways contained within proteomics data. As these approaches gain attention, they will guide the design and conduct of experiments that enable 'Omics modelling to become a commonplace practice within molecular biology.
Collapse
Affiliation(s)
- Catherine Tétard-Jones
- Molecular Agriculture Group, Nafferton Ecological Farming Group, School of Agriculture, Food and Rural Development, Newcastle University, Newcastle-Upon-Tyne, United Kingdom
- * E-mail:
| | - Angharad M. R. Gatehouse
- Molecular Agriculture Group, School of Biology, Newcastle University, Newcastle-Upon-Tyne, United Kingdom
| | - Julia Cooper
- Molecular Agriculture Group, Nafferton Ecological Farming Group, School of Agriculture, Food and Rural Development, Newcastle University, Newcastle-Upon-Tyne, United Kingdom
| | - Carlo Leifert
- Molecular Agriculture Group, Nafferton Ecological Farming Group, School of Agriculture, Food and Rural Development, Newcastle University, Newcastle-Upon-Tyne, United Kingdom
| | - Steven Rushton
- School of Biology, Newcastle University, Newcastle-Upon-Tyne, United Kingdom
|
27
|
Exploring causal networks of bovine milk fatty acids in a multivariate mixed model context. Genet Sel Evol 2014; 46:2. [PMID: 24438068 PMCID: PMC3922748 DOI: 10.1186/1297-9686-46-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 02/18/2013] [Accepted: 12/06/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Knowledge regarding causal relationships among traits is important to understand complex biological systems. Structural equation models (SEM) can be used to quantify the causal relations between traits, which allow prediction of outcomes to interventions applied to such a network. Such models are fitted conditionally on a causal structure among traits, represented by a directed acyclic graph and an Inductive Causation (IC) algorithm can be used to search for causal structures. The aim of this study was to explore the space of causal structures involving bovine milk fatty acids and to select a network supported by data as the structure of a SEM. RESULTS The IC algorithm adapted to mixed models settings was applied to study 14 correlated bovine milk fatty acids, resulting in an undirected network. The undirected pathway from C4:0 to C12:0 resembled the de novo synthesis pathway of short and medium chain saturated fatty acids. By using prior knowledge, directions were assigned to that part of the network and the resulting structure was used to fit a SEM that led to structural coefficients ranging from 0.85 to 1.05. The deviance information criterion indicated that the SEM was more plausible than the multi-trait model. CONCLUSIONS The IC algorithm output pointed towards causal relations between the studied traits. This changed the focus from marginal associations between traits to direct relationships, thus towards relationships that may result in changes when external interventions are applied. The causal structure can give more insight into underlying mechanisms and the SEM can predict conditional changes due to such interventions.
|
28
|
Hitzemann R, Bottomly D, Iancu O, Buck K, Wilmot B, Mooney M, Searles R, Zheng C, Belknap J, Crabbe J, McWeeney S. The genetics of gene expression in complex mouse crosses as a tool to study the molecular underpinnings of behavior traits. Mamm Genome 2013; 25:12-22. [PMID: 24374554 PMCID: PMC3916704 DOI: 10.1007/s00335-013-9495-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/01/2013] [Accepted: 11/25/2013] [Indexed: 02/06/2023]
Abstract
Complex Mus musculus crosses provide increased resolution to examine the relationships between gene expression and behavior. While the advantages are clear, there are numerous analytical and technological concerns that arise from the increased genetic complexity that must be considered. Each of these issues is discussed, providing an initial framework for complex cross study design and planning.
Affiliation(s)
- Robert Hitzemann
- Portland Alcohol Research Center, Veterans Affairs Medical Center, Portland, 97239, OR, USA
|
29
|
Dong Z, Song T, Yuan C. Inference of gene regulatory networks from genetic perturbations with linear regression model. PLoS One 2013; 8:e83263. [PMID: 24376676 PMCID: PMC3871530 DOI: 10.1371/journal.pone.0083263] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 04/30/2013] [Accepted: 11/01/2013] [Indexed: 11/19/2022]
Abstract
Using both genetic perturbation data and gene expression data is an effective strategy for inferring regulatory networks, with the aim of improving the detection accuracy of the regulatory relationships among genes. Based on both types of data, genetic regulatory networks can be accurately modeled by Structural Equation Modeling (SEM). In this paper, a linear regression (LR) model is formulated based on the SEM, and a novel iterative scheme using Bayesian inference is proposed to estimate the parameters of the LR model (LRBI). Comparative evaluations of LRBI with two other algorithms, the Adaptive Lasso (AL-Based) and the Sparsity-aware Maximum Likelihood (SML), are also presented. Simulations show that LRBI performs significantly better than AL-Based, and outperforms SML in terms of power of detection. Applying the LRBI algorithm to experimental data, we inferred the interactions in a network of 35 yeast genes. An open-source program of the LRBI algorithm is freely available upon request.
Affiliation(s)
- Zijian Dong
- School of Electronic Engineering, Huaihai Institute of Technology, Lianyungang, Jiangsu, China ; School of Information Science and Engineering, Southeast University, Nanjing, Jiangsu, China
| | - Tiecheng Song
- School of Information Science and Engineering, Southeast University, Nanjing, Jiangsu, China
| | - Chuang Yuan
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
|
30
|
Peng CH, Jiang YZ, Tai AS, Liu CB, Peng SC, Liao CT, Yen TC, Hsieh WP. Causal inference of gene regulation with subnetwork assembly from genetical genomics data. Nucleic Acids Res 2013; 42:2803-19. [PMID: 24322297 PMCID: PMC3950678 DOI: 10.1093/nar/gkt1277] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
Deciphering the causal networks of gene interactions is critical for identifying disease pathways and disease-causing genes. We introduce a method to reconstruct causal networks based on exploring phenotype-specific modules in the human interactome and including the expression quantitative trait loci (eQTLs) that underlie the joint expression variation of each module. Closely associated eQTLs help anchor the orientation of the network. To overcome the inherent computational complexity of causal network reconstruction, we first deduce the local causality of individual subnetworks using the selected eQTLs and module transcripts. These subnetworks are then integrated to infer a global causal network using a random-field ranking method, which was motivated by animal sociology. We demonstrate how effectively the inferred causality restores the regulatory structure of the networks that mediate lymph node metastasis in oral cancer. Network rewiring clearly characterizes the dynamic regulatory systems of distinct disease states. This study is the first to associate an RXRB-causal network with increased risks of nodal metastasis, tumor relapse, distant metastases and poor survival for oral cancer. Thus, identifying crucial upstream drivers of a signal cascade can facilitate the discovery of potential biomarkers and effective therapeutic targets.
Affiliation(s)
- Chien-Hua Peng
- Departments of Resource Center for Clinical Research, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China, Institute of Statistics, National Tsing Hua University, Hsinchu 30013, Taiwan, Republic of China, Nuclear Medicine and Molecular Imaging Center, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China and Department of Otorhinolaryngology, Head and Neck Surgery, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China
|
31
|
Josset L, Tisoncik-Go J, Katze MG. Moving H5N1 studies into the era of systems biology. Virus Res 2013; 178:151-67. [PMID: 23499671 PMCID: PMC3834220 DOI: 10.1016/j.virusres.2013.02.011] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 08/31/2012] [Accepted: 02/24/2013] [Indexed: 12/20/2022]
Abstract
The dynamics of H5N1 influenza virus pathogenesis are multifaceted and can be seen as an emergent property that cannot be comprehended without looking at the system as a whole. In past years, most of the high-throughput studies on H5N1-host interactions have focused on the host transcriptomic response, at the cellular or the lung tissue level. These studies pointed out that the dynamics and magnitude of the innate immune response and immune cell infiltration are critical to H5N1 pathogenesis. However, viral-host interactions are multidimensional and advances in technologies are creating new possibilities to systematically measure additional levels of 'omic data (e.g. proteomic, metabolomic, and RNA profiling) at each temporal and spatial scale (from the single cell to the organism) of the host response. Natural host genetic variation represents another dimension of the host response that determines pathogenesis. Systems biology models of H5N1 disease aim at understanding and predicting pathogenesis through integration of these different dimensions by using intensive computational modeling. In this review, we describe the importance of 'omic studies for providing a more comprehensive view of infection and mathematical models that are being developed to integrate these data. This review provides a roadmap for what needs to be done in the future and what computational strategies should be used to build a global model of H5N1 pathogenesis. It is time for systems biology of H5N1 pathogenesis to take center stage as the field moves toward a more comprehensive view of virus-host interactions.
Affiliation(s)
- Laurence Josset
- Department of Microbiology, School of Medicine, University of Washington, Seattle, WA 98195, United States
|
32
|
Cai X, Bazerque JA, Giannakis GB. Inference of gene regulatory networks with sparse structural equation models exploiting genetic perturbations. PLoS Comput Biol 2013; 9:e1003068. [PMID: 23717196 PMCID: PMC3662697 DOI: 10.1371/journal.pcbi.1003068] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 04/17/2012] [Accepted: 03/28/2013] [Indexed: 12/22/2022]
Abstract
Integrating genetic perturbations with gene expression data not only improves accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, the need for efficient and powerful algorithms remains. In this paper, sparse structural equation models (SEMs) are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL), for modeling gene regulatory networks in accordance with biological evidence about genes regulating or being regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML) is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL) based scheme, and the QTL-directed dependency graph (QDG) method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested that include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remainder may indicate the emergence of new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.
Affiliation(s)
- Xiaodong Cai
- Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, USA.
|
33
|
Abstract
Current efforts in systems genetics have focused on the development of statistical approaches that aim to disentangle causal relationships among molecular phenotypes in segregating populations. Reverse engineering of transcriptional networks plays a key role in the understanding of gene regulation. However, transcriptional regulation is only one possible mechanism, as methylation, phosphorylation, direct protein-protein interaction, transcription factor binding, etc., can also contribute to gene regulation. These additional modes of regulation can be interpreted as unobserved variables in the transcriptional gene network and can potentially affect its reconstruction accuracy. We develop tests of causal direction for a pair of phenotypes that may be embedded in a more complicated but unobserved network by extending Vuong's selection tests for misspecified models. Our tests provide a significance level, which is unavailable for the widely used AIC and BIC criteria. We evaluate the performance of our tests against the AIC, BIC, and a recently published causality inference test in simulation studies. We compare the precision of causal calls using biologically validated causal relationships extracted from a database of 247 knockout experiments in yeast. Our model selection tests are more precise, showing greatly reduced false-positive rates compared to the alternative approaches. In practice, this is a useful feature since follow-up studies tend to be time consuming and expensive and, hence, it is important for the experimentalist to have causal predictions with low false-positive rates.
|
34
|
Flassig RJ, Heise S, Sundmacher K, Klamt S. An effective framework for reconstructing gene regulatory networks from genetical genomics data. Bioinformatics 2012; 29:246-54. [PMID: 23175757 DOI: 10.1093/bioinformatics/bts679] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION Systems Genetics approaches, in particular those relying on genetical genomics data, put forward a new paradigm of large-scale genome and network analysis. These methods use naturally occurring multi-factorial perturbations (e.g. polymorphisms) in properly controlled and screened genetic crosses to elucidate causal relationships in biological networks. However, although genetical genomics data contain rich information, a clear dissection of causes and effects as required for reconstructing gene regulatory networks is not easily possible. RESULTS We present a framework for reconstructing gene regulatory networks from genetical genomics data where genotype and phenotype correlation measures are used to derive an initial graph which is subsequently reduced by pruning strategies to minimize false positive predictions. Applied to realistic simulated genetic data from a recent DREAM challenge, we demonstrate that our approach is simple yet effective and outperforms more complex methods (including the best performer) with respect to (i) reconstruction quality (especially for small sample sizes) and (ii) applicability to large data sets due to relatively low computational costs. We also present reconstruction results from real genetical genomics data of yeast. AVAILABILITY A MATLAB implementation (script) of the reconstruction framework is available at www.mpi-magdeburg.mpg.de/projects/cna/etcdownloads.html CONTACT klamt@mpi-magdeburg.mpg.de.
Affiliation(s)
- R J Flassig
- Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg, Germany
|
35
|
Nuzhdin SV, Friesen ML, McIntyre LM. Genotype-phenotype mapping in a post-GWAS world. Trends Genet 2012; 28:421-6. [PMID: 22818580 DOI: 10.1016/j.tig.2012.06.003] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 04/23/2012] [Revised: 05/22/2012] [Accepted: 06/18/2012] [Indexed: 01/18/2023]
Abstract
Understanding how metabolic reactions, cell signaling, and developmental pathways translate the genome of an organism into its phenotype is a grand challenge in biology. Genome-wide association studies (GWAS) statistically connect genotypes to phenotypes, without any recourse to known molecular interactions, whereas a molecular biology approach directly ties gene function to phenotype through gene regulatory networks (GRNs). Using natural variation in allele-specific expression, GWAS and GRN approaches can be merged into a single framework via structural equation modeling (SEM). This approach leverages the myriad of polymorphisms in natural populations to elucidate and quantitate the molecular pathways that underlie phenotypic variation. The SEM framework can be used to quantitate a GRN, evaluate its consistency across environments or sexes, identify the differences in GRNs between species, and annotate GRNs de novo in non-model organisms.
Affiliation(s)
- Sergey V Nuzhdin
- University of Southern California, Program in Molecular and Computational Biology, Department of Biology, Los Angeles, CA 90089, USA.
|
36
|
Yoo C. Bayesian Method for Causal Discovery of Latent-Variable Models from a Mixture of Experimental and Observational Data. Comput Stat Data Anal 2012; 56:2183-2205. [PMID: 32831439 DOI: 10.1016/j.csda.2012.01.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
Abstract
This paper describes a Bayesian method for learning causal Bayesian networks through networks that contain latent variables from an arbitrary mixture of observational and experimental data. The paper presents Bayesian methods (including a new method) for learning the causal structure and parameters of the underlying causal process that is generating the data, given that the data contain a mixture of observational and experimental cases. These learning methods were applied using as input various mixtures of experimental and observational data that were generated from the ALARM causal Bayesian network. The paper reports how these structure predictions and parameter estimates compare with the true causal structures and parameters as given by the ALARM network. The paper shows that (1) the new method for learning Bayesian network structure from a mixture of data that this paper introduces, the Gibbs Volume method, best estimates the probability of the data given the latent variable model and (2) using large data (>10,000 cases), another model, the implicit latent variable method, is asymptotically correct and efficient.
Affiliation(s)
- Changwon Yoo
- Department of Biostatistics, Florida International University, 11200 SW 8 St., AHC2 580, Miami, FL 33199. Tel: 305-348-4906
|
37
|
Mutshinda CM, Noykova N, Sillanpää MJ. A hierarchical bayesian approach to multi-trait clinical quantitative trait locus modeling. Front Genet 2012; 3:97. [PMID: 22685451 PMCID: PMC3368303 DOI: 10.3389/fgene.2012.00097] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/30/2011] [Accepted: 05/12/2012] [Indexed: 02/04/2023]
Abstract
Recent advances in high-throughput genotyping and transcript profiling technologies have enabled the inexpensive production of genome-wide dense marker maps in tandem with huge amounts of expression profiles. These large-scale data encompass valuable information about the genetic architecture of important phenotypic traits. Comprehensive models that combine molecular markers and gene transcript levels are increasingly advocated as an effective approach to dissecting the genetic architecture of complex phenotypic traits. The simultaneous utilization of marker and gene expression data to explain the variation in clinical quantitative trait, known as clinical quantitative trait locus (cQTL) mapping, poses challenges that are both conceptual and computational. Nonetheless, the hierarchical Bayesian (HB) modeling approach, in combination with modern computational tools such as Markov chain Monte Carlo (MCMC) simulation techniques, provides much versatility for cQTL analysis. Sillanpää and Noykova (2008) developed a HB model for single-trait cQTL analysis in inbred line cross-data using molecular markers, gene expressions, and marker-gene expression pairs. However, clinical traits generally relate to one another through environmental correlations and/or pleiotropy. A multi-trait approach can improve on the power to detect genetic effects and on their estimation precision. A multi-trait model also provides a framework for examining a number of biologically interesting hypotheses. In this paper we extend the HB cQTL model for inbred line crosses proposed by Sillanpää and Noykova to a multi-trait setting. We illustrate the implementation of our new model with simulated data, and evaluate the multi-trait model performance with regard to its single-trait counterpart. 
The data simulation process was based on the multi-trait cQTL model, assuming three traits with uncorrelated and correlated cQTL residuals, with the simulated data under uncorrelated cQTL residuals serving as our test set for comparing the performances of the multi-trait and single-trait models. The simulated data under correlated cQTL residuals were essentially used to assess how well our new model can estimate the cQTL residual covariance structure. The model fitting to the data was carried out by MCMC simulation through OpenBUGS. The multi-trait model outperformed its single-trait counterpart in identifying cQTLs, with a consistently lower false discovery rate. Moreover, the covariance matrix of cQTL residuals was typically estimated to an appreciable degree of precision under the multi-trait cQTL model, making our new model a promising approach to addressing a wide range of issues facing the analysis of correlated clinical traits.
Affiliation(s)
- Crispin M Mutshinda
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
38
|
Vignes M, Vandel J, Allouche D, Ramadan-Alban N, Cierco-Ayrolles C, Schiex T, Mangin B, de Givry S. Gene regulatory network reconstruction using Bayesian networks, the Dantzig Selector, the Lasso and their meta-analysis. PLoS One 2011; 6:e29165. [PMID: 22216195 PMCID: PMC3246469 DOI: 10.1371/journal.pone.0029165] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Received: 07/01/2011] [Accepted: 11/22/2011] [Indexed: 11/18/2022]
Abstract
Modern technologies, and especially next generation sequencing facilities, are giving cheaper access to genotype and genomic data measured on the same sample at once. This creates an ideal situation for multifactorial experiments designed to infer gene regulatory networks. The fifth "Dialogue for Reverse Engineering Assessments and Methods" (DREAM5) challenges are aimed at assessing methods and associated algorithms devoted to the inference of biological networks. Challenge 3 on "Systems Genetics" proposed to infer causal gene regulatory networks from different genetical genomics data sets. We investigated a wide panel of methods ranging from Bayesian networks to penalised linear regressions to analyse such data, and proposed a simple yet very powerful meta-analysis, which combines these inference methods. We present results of the Challenge as well as more in-depth analysis of predicted networks in terms of structure and reliability. The developed meta-analysis was ranked first among the 16 teams participating in Challenge 3A. It paves the way for future extensions of our inference method and more accurate gene network estimates in the context of genetical genomics.
Affiliation(s)
- Matthieu Vignes
- SaAB Team/BIA Unit, INRA Toulouse, Castanet-Tolosan, France.
|
39
|
Ahmad FK, Deris S, Othman NH. The inference of breast cancer metastasis through gene regulatory networks. J Biomed Inform 2011; 45:350-62. [PMID: 22179053 DOI: 10.1016/j.jbi.2011.11.015] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 03/31/2010] [Revised: 11/26/2011] [Accepted: 11/28/2011] [Indexed: 11/30/2022]
Abstract
Understanding the mechanisms of gene regulation during breast cancer is one of the most difficult problems for oncologists because this regulation is likely comprised of complex genetic interactions. Given this complexity, a computational study using the Bayesian network technique has been employed to construct a gene regulatory network from microarray data. Although the Bayesian network has been recognized as a prominent method to infer gene regulatory processes, learning the Bayesian network structure is NP-hard and computationally intricate. Therefore, we propose a novel inference method based on low-order conditional independence that extends to the case of the Bayesian network to deal with a large number of genes and an insufficient sample size. This method has been evaluated and compared with full-order conditional independence and different prognostic indices on a publicly available breast cancer data set. Our results suggest that the low-order conditional independence method will be able to handle a large number of genes in a small sample size with the least mean square error. In addition, this proposed method performs significantly better than other methods, including the full-order conditional independence and the St. Gallen consensus criteria. The proposed method achieved an area under the ROC curve of 0.79203, whereas the full-order conditional independence and the St. Gallen consensus criteria obtained 0.76438 and 0.73810, respectively. Furthermore, our empirical evaluation using the low-order conditional independence method has demonstrated a promising relationship between six gene regulators and two regulated genes and will be further investigated as potential breast cancer metastasis prognostic markers.
Affiliation(s)
- F K Ahmad
- Graduate Department of Computer Science, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia.
|
40
|
Yin J, Li H. A sparse conditional Gaussian graphical model for analysis of genetical genomics data. Ann Appl Stat 2011; 5:2630-2650. [PMID: 22905077 DOI: 10.1214/11-aoas494] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.4] [Indexed: 01/01/2023]
Abstract
Genetical genomics experiments are now routinely conducted to measure both the genetic markers and gene expression data on the same subjects. The gene expression levels are often treated as quantitative traits and are subject to standard genetic analysis in order to identify the expression quantitative trait loci (eQTL). However, the genetic architecture for many gene expressions may be complex, and poorly estimated genetic architecture may compromise the inferences of the dependency structures of the genes at the transcriptional level. In this paper, we introduce a sparse conditional Gaussian graphical model for studying the conditional independent relationships among a set of gene expressions adjusting for possible genetic effects where the gene expressions are modeled with seemingly unrelated regressions. We present an efficient coordinate descent algorithm to obtain the penalized estimation of both the regression coefficients and sparse concentration matrix. The corresponding graph can be used to determine the conditional independence among a group of genes while adjusting for shared genetic effects. Simulation experiments and asymptotic convergence rates and sparsistency are used to justify our proposed methods. By sparsistency, we mean the property that all parameters that are zero are actually estimated as zero with probability tending to one. We apply our methods to the analysis of a yeast eQTL data set and demonstrate that the conditional Gaussian graphical model leads to a more interpretable gene network than the standard Gaussian graphical model based on gene expression data alone.
Affiliation(s)
- Jianxin Yin
- University of Pennsylvania School of Medicine
|
41
|
Yuan A, Chen G, Rotimi C. Genetic network analysis by quasi-Bayesian method. J Bioinform Comput Biol 2011; 7:175-92. [DOI: 10.1142/s0219720009004059] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/10/2008] [Revised: 10/23/2008] [Accepted: 11/14/2008] [Indexed: 11/18/2022]
Abstract
Genetic network analysis provides an important statistical strategy for the study of gene–gene interactions. Although existing methods work well in practice, several opportunities for improvement remain. For example, the regulation coefficients of some of the existing methods are not easy to solve, nor are the solutions they provide unique. Also, as genetic network analyses are typically applied to small datasets with a large number of parameters, having prior knowledge about the parameters is valuable and should be incorporated into the analysis. The uniqueness of the parameter estimate and computational simplicity are also desirable in practice. To address these problems, we considered a quasi-Bayesian method for the analysis of gene regulatory networks by a multivariate linear model in which the data distribution is a quasi-likelihood, and the inference is Bayesian. This method incorporates prior information on the regulatory relationships; the set of regulation coefficients has a unique closed-form solution, and is very simple to compute. The model is evaluated by simulation and illustrated using a real dataset. This method is simple to use, permits information updating, is flexible enough to incorporate desired features, and has a closed-form solution. Simulation studies show that the model fits the data quite well.
Affiliation(s)
- Ao Yuan
- National Human Genome Center, Howard University, 2216 Sixth Street, N.W., Suite 206, Washington, DC 20059, USA
| | - GUANJIE CHEN
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, 12 South Drive, Bethesda, MD 20892, USA
| | - CHARLES ROTIMI
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, 12 South Drive, Bethesda, MD 20892, USA
|
42
|
Aburatani S. Application of structure equation modeling for inferring a serial transcriptional regulation in yeast. Gene Regulation and Systems Biology 2011; 5:75-88. [PMID: 22272062 PMCID: PMC3236004 DOI: 10.4137/grsb.s7569] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/26/2023]
Abstract
Revealing the gene regulatory systems among DNA and proteins in living cells is one of the central aims of systems biology. In this study, I used Structural Equation Modeling (SEM) in combination with stepwise factor analysis to infer the protein-DNA interactions for gene expression control from only gene expression profiles, in the absence of protein information. I applied my approach to infer the causalities within the well-studied serial transcriptional regulation composed of GAL-related genes in yeast. This allowed me to reveal the hierarchy of serial transcriptional regulation, including previously unclear protein-DNA interactions. The validity of the constructed model was demonstrated by comparing the results with previous reports describing the regulation of the transcription factors. Furthermore, the model revealed combinatory regulation by Gal4p and Gal80p. In this study, the target genes were divided into three types: those regulated by one factor and those controlled by a combination of two factors.
Affiliation(s)
- Sachiyo Aburatani
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan
|
43
|
Effects of causal networks on the structure and stability of resource allocation trait correlations. J Theor Biol 2011; 293:1-14. [PMID: 22004994 DOI: 10.1016/j.jtbi.2011.09.034] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/30/2010] [Revised: 09/16/2011] [Accepted: 09/30/2011] [Indexed: 11/23/2022]
Abstract
Discovering the mechanisms by which genetic variation influences phenotypes is integral to understanding life-history evolution. Models describing causal relationships among traits in a developmental hierarchy provide a functional basis for understanding the correlations often observed among life-history traits. In this paper, we evaluate a developmental network model of life-history traits based on the perennial herb Arabidopsis lyrata, evaluate phenotypic, genetic, and environmental covariance matrices obtained under different scenarios of quantitative trait locus (QTL) effects in simulated crosses, test the efficacy of structural equation modeling to identify the correct basis for multiple-trait QTL effects, and compare model predictions with field data. We found that the trait network constrained the phenotypic covariance patterns to varying degrees, depending on which traits were directly affected by QTLs. Genetic and environmental covariance matrices were strongly correlated only when direct QTL effects were spread over many traits. Structural equation models that included all simulated traits correctly identified traits directly affected by QTLs, but heuristic search algorithms found several network structures other than the correct one that also fit the data closely. Estimated correlations among a subset of traits in F(2) data from field studies corresponded closely to model predictions when simulated QTLs affected traits known to differ between the parental populations. Our results show that causal trait network models can unify several aspects of quantitative genetic theory with empirical observations on genetic and phenotypic covariance patterns, and that incorporating trait networks into genetic analysis offers promise for elucidating mechanisms of life history evolution.
|
44
|
Casellas J, Ibáñez-Escriche N. Bayesian recursive mixed linear model for gene expression analyses with continuous covariates. J Anim Sci 2011; 90:67-75. [PMID: 21908645 DOI: 10.2527/jas.2010-3750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
The analysis of microarray gene expression data has experienced a remarkable growth in scientific research over the last few years and is helping to decipher the genetic background of several productive traits. Nevertheless, most analytical approaches have relied on the comparison of 2 (or a few) well-defined groups of biological conditions where continuous covariates make no sense (e.g., healthy vs. cancerous cells). Continuous effects could be of special interest when analyzing gene expression in animal production-oriented studies (e.g., birth weight), although very few studies address this peculiarity in the animal science framework. Within this context, we have developed a recursive linear mixed model where not only are linear covariates accounted for during gene expression analyses but also hierarchized and the effects of their genetic, environmental, and residual components on differential gene expression inferred independently. This parameterization allows a step forward in the inference of differential gene expression linked to a given quantitative trait such as birth weight. The statistical performance of this recursive model was exemplified under simulation by accounting for different sample sizes (n), heritabilities for the quantitative trait (h²), and magnitudes of differential gene expression (λ). It is important to highlight that statistical power increased with n, h², and λ, and the recursive model exceeded the standard linear mixed model with linear (nonrecursive) covariates in the majority of scenarios. This new parameterization would provide new insights about gene expression in the animal science framework, opening a new research scenario where within-covariate sources of differential gene expression could be individualized and estimated. The source code of the program accommodating these analytical developments and additional information about practical aspects of running the program are freely available by request to the corresponding author of this article.
Affiliation(s)
- J Casellas
- Grup de Recerca en Remugants, Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain.
|
45
|
Pinna A, Soranzo N, Hoeschele I, de la Fuente A. Simulating systems genetics data with SysGenSIM. Bioinformatics 2011; 27:2459-62. [PMID: 21737438 DOI: 10.1093/bioinformatics/btr407] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022] Open
Abstract
SUMMARY: SysGenSIM is a software package to simulate Systems Genetics (SG) experiments in model organisms, for the purpose of evaluating and comparing statistical and computational methods and their implementations for analyses of SG data [e.g. methods for expression quantitative trait loci (eQTL) mapping and network inference]. SysGenSIM allows the user to select a variety of network topologies, genetic and kinetic parameters to simulate SG data (genotyping, gene expression and phenotyping) with large gene networks with thousands of nodes. The software is encoded in MATLAB, and a user-friendly graphical user interface is provided. AVAILABILITY: The open-source software code and user manual can be downloaded at: http://sysgensim.sourceforge.net/ CONTACT: alf@crs4.it.
|
46
|
Yang B, Navarro N, Noguera J, Muñoz M, Guo T, Yang K, Ma J, Folch J, Huang L, Pérez-Enciso M. Building phenotype networks to improve QTL detection: a comparative analysis of fatty acid and fat traits in pigs. J Anim Breed Genet 2011; 128:329-43. [DOI: 10.1111/j.1439-0388.2011.00928.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/27/2022]
|
47
|
Wang H, Lu HHS, Chueh TH. Constructing biological pathways by a two-step counting approach. PLoS One 2011; 6:e20074. [PMID: 21673799 PMCID: PMC3105984 DOI: 10.1371/journal.pone.0020074] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/17/2010] [Accepted: 04/25/2011] [Indexed: 12/04/2022] Open
Abstract
Networks are widely used in biology to represent the relationships between genes and gene functions. In Boolean biological models, it is mainly assumed that there are two states to represent a gene: on-state and off-state. It is typically assumed that the relationship between two genes can be characterized by two kinds of pairwise relationships: similarity and prerequisite. Many approaches have been proposed in the literature to reconstruct biological relationships. In this article, we propose a two-step method to reconstruct the biological pathway when the binary array data have measurement error. For a pair of genes in a sample, the first step of this approach is to assign counting numbers for every relationship and select the relationship with counting number greater than a threshold. The second step is to calculate the asymptotic p-values for hypotheses of possible relationships and select relationships with a large p-value. This new method has the advantages of easy calculation for the counting numbers and simple closed forms for the p-value. The simulation study and real data example show that the two-step counting method can accurately reconstruct the biological pathway and outperform the existing methods. Compared with the other existing methods, this two-step method can provide a more accurate and efficient alternative approach for reconstructing the biological network.
Affiliation(s)
- Hsiuying Wang
- Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan.
|
48
|
Rosa GJM, Valente BD, de los Campos G, Wu XL, Gianola D, Silva MA. Inferring causal phenotype networks using structural equation models. Genet Sel Evol 2011; 43:6. [PMID: 21310061 PMCID: PMC3056759 DOI: 10.1186/1297-9686-43-6] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Received: 10/18/2010] [Accepted: 02/10/2011] [Indexed: 01/14/2023] Open
Abstract
Phenotypic traits may exert causal effects between them. For example, on the one hand, high yield in dairy cows may increase the liability to certain diseases and, on the other hand, the incidence of a disease may affect yield negatively. Likewise, the transcriptome may be a function of the reproductive status in mammals and the latter may depend on other physiological variables. Knowledge of phenotype networks describing such interrelationships can be used to predict the behavior of complex systems, e.g. biological pathways underlying complex traits such as diseases, growth and reproduction. Structural Equation Models (SEM) can be used to study recursive and simultaneous relationships among phenotypes in multivariate systems such as genetical genomics, system biology, and multiple trait models in quantitative genetics. Hence, SEM can produce an interpretation of relationships among traits which differs from that obtained with traditional multiple trait models, in which all relationships are represented by symmetric linear associations among random variables, such as covariances and correlations. In this review, we discuss the application of SEM and related techniques for the study of multiple phenotypes. Two basic scenarios are considered, one pertaining to genetical genomics studies, in which QTL or molecular marker information is used to facilitate causal inference, and another related to quantitative genetic analysis in livestock, in which only phenotypic and pedigree information is available. Advantages and limitations of SEM compared to traditional approaches commonly used for the analysis of multiple traits, as well as some indication of future research in this area are presented in a concluding section.
Affiliation(s)
- Guilherme J M Rosa
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA.
|
49
|
Kompass KS, Witte JS. Co-regulatory expression quantitative trait loci mapping: method and application to endometrial cancer. BMC Med Genomics 2011; 4:6. [PMID: 21226949 PMCID: PMC3032645 DOI: 10.1186/1755-8794-4-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/23/2010] [Accepted: 01/12/2011] [Indexed: 01/16/2023] Open
Abstract
Background: Expression quantitative trait loci (eQTL) studies have helped identify the genetic determinants of gene expression. Understanding the potential interacting mechanisms underlying such findings, however, is challenging. Methods: We describe a method to identify the trans-acting drivers of multiple gene co-expression, which reflects the action of regulatory molecules. This method, termed co-regulatory expression quantitative trait locus (creQTL) mapping, allows for evaluation of a more focused set of phenotypes within a clear biological context than conventional eQTL mapping. Results: Applying this method to a study of endometrial cancer revealed regulatory mechanisms supported by the literature: a creQTL between a locus upstream of STARD13/DLC2 and a group of seven IFNβ-induced genes. This suggests that the Rho-GTPase encoded by STARD13 regulates IFNβ-induced genes and the DNA damage response. Conclusions: Because of the importance of IFNβ in cancer, our results suggest that creQTL may provide a finer picture of gene regulation and may reveal additional molecular targets for intervention. An open source R implementation of the method is available at http://sites.google.com/site/kenkompass/.
Affiliation(s)
- Kenneth S Kompass
- Department of Epidemiology and Biostatistics, Institute for Human Genetics, University of California, San Francisco, USA
|
50
|
High-confidence discovery of genetic network regulators in expression quantitative trait loci data. Genetics 2011; 187:955-64. [PMID: 21212238 DOI: 10.1534/genetics.110.124685] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022] Open
Abstract
Expression QTL (eQTL) studies involve the collection of microarray gene expression data and genetic marker data from segregating individuals in a population to search for genetic determinants of differential gene expression. Previous studies have found large numbers of trans-regulated genes (regulated by unlinked genetic loci) that link to a single locus or eQTL "hotspot," and it would be desirable to find the mechanism of coregulation for these gene groups. However, many difficulties exist with current network reconstruction algorithms such as low power and high computational cost. A common observation for biological networks is that they have a scale-free or power-law architecture. In such an architecture, highly influential nodes exist that have many connections to other nodes. If we assume that this type of architecture applies to genetic networks, then we can simplify the problem of genetic network reconstruction by focusing on discovery of the key regulatory genes at the top of the network. We introduce the concept of "shielding" in which a specific gene expression variable (the shielder) renders a set of other gene expression variables (the shielded genes) independent of the eQTL. We iteratively build networks from the eQTL to the shielder down using tests of conditional independence. We have proposed a novel test for controlling the shielder false-positive rate at a predetermined level by requiring a threshold number of shielded genes per shielder. Using simulation, we have demonstrated that we can control the shielder false-positive rate as well as obtain high shielder and edge specificity. In addition, we have shown our method to be robust to violation of the latent variable assumption, an important feature in the practical application of our method. We have applied our method to a yeast expression QTL data set in which microarray and marker data were collected from the progeny of a backcross of two strains of Saccharomyces cerevisiae (Brem et al. 2002). Seven genetic networks have been discovered, and bioinformatic analysis of the discovered regulators and corresponding regulated genes has generated plausible hypotheses for mechanisms of regulation that can be tested in future experiments.
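The "shielding" idea can be illustrated with a generic conditional-independence check: if a shielder gene sits between the eQTL and a target gene, the target should be correlated with the eQTL marginally but not after conditioning on the shielder. The sketch below uses a partial correlation on simulated data. It is a minimal illustration under assumed linear Gaussian relationships, not the authors' test (which additionally controls the shielder false-positive rate via a threshold number of shielded genes); the helper names `residualize` and `partial_correlation` and the simulated chain are hypothetical.

```python
import numpy as np

def residualize(y, z):
    """Residuals of y after least-squares regression on z plus an intercept."""
    design = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

def partial_correlation(gene, eqtl, shielder):
    """Correlation between gene and eQTL after removing the shielder's linear effect."""
    r_gene = residualize(gene, shielder)
    r_eqtl = residualize(eqtl, shielder)
    return float(np.corrcoef(r_gene, r_eqtl)[0, 1])

# Simulated chain (an assumption for illustration): eQTL -> shielder -> gene.
rng = np.random.default_rng(0)
n = 2000
eqtl = rng.integers(0, 2, size=n).astype(float)  # binary marker genotype
shielder = eqtl + rng.normal(size=n)             # regulator driven by the eQTL
gene = shielder + rng.normal(size=n)             # target driven only by the regulator

marginal = float(np.corrcoef(gene, eqtl)[0, 1])
partial = partial_correlation(gene, eqtl, shielder)
```

In this simulated chain, `marginal` is clearly non-zero while `partial` is close to zero, which is the pattern a shielder is expected to produce for its shielded genes.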
|