1
Abstract
BACKGROUND Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors, including the genotypes of genetic variants. With the aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. RESULTS To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models, including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely from genotypes align well with true gene expression patterns. CONCLUSION We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.
Affiliation(s)
- Rui Xie
  - Department of Computer Science, University of Missouri at Columbia, Columbia, MO USA
- Jia Wen
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC USA
- Andrew Quitadamo
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC USA
- Jianlin Cheng
  - Department of Computer Science, University of Missouri at Columbia, Columbia, MO USA
- Xinghua Shi
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC USA
2
Guo Y, Fudali S, Gimeno J, DiGennaro P, Chang S, Williamson VM, Bird DM, Nielsen DM. Networks Underpinning Symbiosis Revealed Through Cross-Species eQTL Mapping. Genetics 2017; 206:2175-2184. [PMID: 28642272 PMCID: PMC5560814 DOI: 10.1534/genetics.117.202531] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/02/2017] [Accepted: 06/09/2017] [Indexed: 12/13/2022]
Abstract
Organisms engage in extensive cross-species molecular dialog, yet the underlying molecular actors are known for only a few interactions. Many techniques have been designed to uncover genes involved in signaling between organisms. Typically, these focus on only one of the partners. We developed an expression quantitative trait locus (eQTL) mapping-based approach to identify cause-and-effect relationships between genes from two partners engaged in an interspecific interaction. We demonstrated the approach by assaying expression of 98 isogenic plants (Medicago truncatula), each inoculated with a genetically distinct line of the diploid parasitic nematode Meloidogyne hapla. With this design, systematic differences in gene expression across host plants could be mapped to genetic polymorphisms of their infecting parasites. The effects of parasite genotypes on plant gene expression were often substantial, with up to 90-fold (P = 3.2 × 10⁻⁵²) changes in expression levels caused by individual parasite loci. Mapped loci included a number of pleiotropic sites, including one 87-kb parasite locus that modulated expression of >60 host genes. The 213 host genes identified were substantially enriched for transcription factors. We distilled higher-order connections between polymorphisms and genes from both species via network inference. To replicate our results and test whether effects were conserved across a broader host range, we performed a confirmatory experiment using M. hapla-infected tomato. This revealed that homologous genes were similarly affected. Finally, to validate the broader utility of cross-species eQTL mapping, we applied the strategy to data from a Salmonella infection study, successfully identifying polymorphisms in the human genome affecting bacterial expression.
Affiliation(s)
- Yuelong Guo
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695
- Sylwia Fudali
  - Department of Plant Pathology, University of California, Davis, California 95616
- Jacinta Gimeno
  - Department of Plant Pathology, University of California, Davis, California 95616
- Peter DiGennaro
  - Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina 27695
- Stella Chang
  - Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina 27695
- Valerie M Williamson
  - Department of Plant Pathology, University of California, Davis, California 95616
- David McK Bird
  - Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina 27695
- Dahlia M Nielsen
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695
3
Wanichthanarak K, Fan S, Grapov D, Barupal DK, Fiehn O. Metabox: A Toolbox for Metabolomic Data Analysis, Interpretation and Integrative Exploration. PLoS One 2017; 12:e0171046. [PMID: 28141874 PMCID: PMC5283729 DOI: 10.1371/journal.pone.0171046] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 10/14/2016] [Accepted: 01/13/2017] [Indexed: 01/22/2023]
Abstract
Similar to genomic and proteomic platforms, metabolomic data acquisition and analysis is becoming a routine approach for investigating biological systems. However, computational approaches for metabolomic data analysis and integration are still maturing. Metabox is a bioinformatics toolbox for deep phenotyping analytics that combines data processing, statistical analysis, functional analysis and integrative exploration of metabolomic data within proteomic and transcriptomic contexts. With the number of options provided in each analysis module, it also supports data analysis of other 'omic' families. The toolbox is an R-based web application, and it is freely available at http://kwanjeeraw.github.io/metabox/ under the GPL-3 license.
Affiliation(s)
- Kwanjeera Wanichthanarak
  - West Coast Metabolomics Center, Genome Center, University of California Davis, Davis, California, United States of America
- Sili Fan
  - West Coast Metabolomics Center, Genome Center, University of California Davis, Davis, California, United States of America
- Dmitry Grapov
  - West Coast Metabolomics Center, Genome Center, University of California Davis, Davis, California, United States of America
- Dinesh Kumar Barupal
  - West Coast Metabolomics Center, Genome Center, University of California Davis, Davis, California, United States of America
- Oliver Fiehn
  - West Coast Metabolomics Center, Genome Center, University of California Davis, Davis, California, United States of America
  - Biochemistry Department, King Abdulaziz University, Jeddah, Saudi Arabia
4
Roverato A, Castelo R. The networked partial correlation and its application to the analysis of genetic interactions. J R Stat Soc Ser C Appl Stat 2016. [DOI: 10.1111/rssc.12166] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/19/2022]
5
Rakitsch B, Stegle O. Modelling local gene networks increases power to detect trans-acting genetic effects on gene expression. Genome Biol 2016; 17:33. [PMID: 26911988 PMCID: PMC4765046 DOI: 10.1186/s13059-016-0895-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 09/19/2015] [Accepted: 02/09/2016] [Indexed: 01/05/2023]
Abstract
Expression quantitative trait loci (eQTL) mapping is a widely used tool to study the genetics of gene expression. Confounding factors and the burden of multiple testing limit the ability to map distal trans eQTLs, which is important to understand downstream genetic effects on genes and pathways. We propose a two-stage linear mixed model that first learns local directed gene-regulatory networks to then condition on the expression levels of selected genes. We show that this covariate selection approach controls for confounding factors and regulatory context, thereby increasing eQTL detection power and improving the consistency between studies. GNet-LMM is available at: https://github.com/PMBio/GNetLMM.
Affiliation(s)
- Barbara Rakitsch
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
- Oliver Stegle
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
6
Baumstark R, Hänzelmann S, Tsuru S, Schaerli Y, Francesconi M, Mancuso FM, Castelo R, Isalan M. The propagation of perturbations in rewired bacterial gene networks. Nat Commun 2015; 6:10105. [PMID: 26670742 DOI: 10.1038/ncomms10105] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 08/05/2015] [Accepted: 11/04/2015] [Indexed: 11/09/2022]
Abstract
What happens to gene expression when you add new links to a gene regulatory network? To answer this question, we profile 85 network rewirings in E. coli. Here we report that concerted patterns of differential expression propagate from reconnected hub genes. The rewirings link promoter regions to different transcription factor and σ-factor genes, resulting in perturbations that span four orders of magnitude, changing up to ~70% of the transcriptome. Importantly, factor connectivity and promoter activity both associate with perturbation size. Perturbations from related rewirings have more similar transcription profiles, and a statistical analysis reveals ~20 underlying states of the system, associating particular gene groups with rewiring constructs. We examine two large clusters (ribosomal and flagellar genes) in detail. These represent alternative global outcomes from different rewirings because of antagonism between these major cell states. This data set of systematically related perturbations enables reverse engineering and discovery of underlying network interactions.
Affiliation(s)
- Rebecca Baumstark
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Sonja Hänzelmann
  - Research Program on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Dr Aiguader 88, 08003 Barcelona, Spain
  - Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Dr Aiguader 88, 08003 Barcelona, Spain
- Saburo Tsuru
  - Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan
- Yolanda Schaerli
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Mirko Francesconi
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Francesco M Mancuso
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain
  - Genomics Cancer Group, Vall d'Hebron Institute of Oncology (VHIO), Carrer Natzaret 15-17, 08035 Barcelona, Spain
- Robert Castelo
  - Research Program on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Dr Aiguader 88, 08003 Barcelona, Spain
  - Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Dr Aiguader 88, 08003 Barcelona, Spain
- Mark Isalan
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
  - Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
7
Peñagaricano F, Valente BD, Steibel JP, Bates RO, Ernst CW, Khatib H, Rosa GJM. Exploring causal networks underlying fat deposition and muscularity in pigs through the integration of phenotypic, genotypic and transcriptomic data. BMC Systems Biology 2015; 9:58. [PMID: 26376630 PMCID: PMC4574162 DOI: 10.1186/s12918-015-0207-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 05/04/2015] [Accepted: 09/04/2015] [Indexed: 12/23/2022]
Abstract
BACKGROUND Joint modeling and analysis of phenotypic, genotypic and transcriptomic data have the potential to uncover the genetic control of gene activity and phenotypic variation, as well as shed light on the manner and extent of connectedness among these variables. Current studies mainly report associations, i.e. undirected connections among variables without causal interpretation. Knowledge regarding causal relationships among genes and phenotypes can be used to predict the behavior of complex systems, as well as to optimize management practices and selection strategies. Here, we performed a multistep procedure for inferring causal networks underlying carcass fat deposition and muscularity in pigs using multi-omics data obtained from an F2 Duroc × Pietrain resource pig population. RESULTS We initially explored marginal associations between genotypes and phenotypic and expression traits through whole-genome scans, and then, in genomic regions with multiple significant hits, we assessed gene-phenotype network reconstruction using causal structural learning algorithms. One genomic region on SSC6 showed significant associations with three relevant phenotypes (off-midline 10th-rib backfat thickness, loin muscle weight, and average intramuscular fat percentage) and also with the expression of seven genes, including ZNF24, SSX2IP, and AKR7A2. The inferred network indicated that the genotype affects the three phenotypes mainly through the expression of several genes. Among the phenotypes, fat deposition traits negatively affected loin muscle weight. CONCLUSIONS Our findings shed light on the antagonistic relationship between carcass fat deposition and lean meat content in pigs. In addition, the procedure described in this study has the potential to unravel gene-phenotype networks underlying complex phenotypes.
Affiliation(s)
- Francisco Peñagaricano
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
  - Present Address: Department of Animal Sciences, and University of Florida Genetics Institute, University of Florida, Gainesville, FL, 32611, USA.
- Bruno D Valente
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
  - Dairy Science, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Juan P Steibel
  - Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA.
- Ronald O Bates
  - Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA.
- Catherine W Ernst
  - Department of Animal Science, Michigan State University, East Lansing, MI, 48824, USA.
- Hasan Khatib
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Guilherme J M Rosa
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
  - Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA.