1
Li JR, Tang M, Li Y, Amos CI, Cheng C. Genetic variants associated mRNA stability in lung. BMC Genomics 2022; 23:196. [PMID: 35272635 PMCID: PMC8915503 DOI: 10.1186/s12864-022-08405-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2021] [Accepted: 02/21/2022] [Indexed: 12/04/2022]
Abstract
Background: Expression quantitative trait locus (eQTL) analyses have been widely used to identify genetic variants associated with gene expression levels and thereby to understand the molecular mechanisms that underlie genetic traits. The resultant eQTLs might affect the expression of associated genes through transcriptional or post-transcriptional regulation. In this study, we attempt to distinguish these two types of regulation by identifying genetic variants associated with mRNA stability of genes (stQTLs). Results: We present a computational framework that takes advantage of recently developed methods to infer the mRNA stability of genes from RNA-seq data and then performs association analysis to identify stQTLs. Using the Genotype-Tissue Expression (GTEx) lung RNA-seq data, we tested 15,122,700 genetic variants for 13,476 genes on the autosomes and identified a total of 142,801 stQTLs for 3942 genes and 186,132 eQTLs for 4751 genes. Interestingly, our results indicate that stQTLs are enriched in the CDS and 3'UTR regions, while eQTLs are enriched in the CDS, 3'UTR, 5'UTR, and upstream regions. We also found that stQTLs are more likely than eQTLs to overlap with RNA binding protein (RBP) and microRNA (miRNA) binding sites. Our analyses demonstrate that simultaneous identification of stQTLs and eQTLs can provide more mechanistic insight into the association between genetic variants and gene expression levels. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08405-y.
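The association step mentioned in this abstract can be illustrated with a minimal sketch. It assumes, which the abstract does not state, a simple additive model in which a per-gene phenotype (expression level or inferred mRNA stability) is regressed on genotype dosage (0, 1 or 2 copies of the alternate allele). The data are simulated, the function name `association_t_stat` is hypothetical, and this is not the authors' pipeline.

```python
import math
import random


def association_t_stat(dosage, phenotype):
    """Return (slope, t-statistic) from simple linear regression of phenotype on dosage."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares; the residual variance uses n - 2 degrees of freedom.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, phenotype))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


# Simulated data (assumption): one variant and two phenotypes on the same 200 samples.
rng = random.Random(0)
dosage = [rng.choice([0, 1, 2]) for _ in range(200)]
stability = [0.5 * g + rng.gauss(0, 1) for g in dosage]  # variant shifts stability
expression = [rng.gauss(0, 1) for _ in dosage]           # variant has no effect here

slope_st, t_st = association_t_stat(dosage, stability)
slope_ex, t_ex = association_t_stat(dosage, expression)
print(f"stability:  slope={slope_st:.2f}, t={t_st:.2f}")
print(f"expression: slope={slope_ex:.2f}, t={t_ex:.2f}")
```

In this toy setting the variant would be called an stQTL candidate but not an eQTL candidate; a real analysis would also need covariates and multiple-testing correction, which the sketch omits.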
Affiliation(s)
- Jian-Rong Li
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA
  - Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Mabel Tang
  - Department of BioSciences, Biochemistry and Cell Biology, Rice University, Houston, TX, USA
- Yafang Li
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA
  - Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
  - Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Christopher I Amos
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA
  - Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
  - Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Chao Cheng
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA
  - Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
  - Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
2
Yuan K, Zeng T, Chen L. Interpreting Functional Impact of Genetic Variations by Network QTL for Genotype–Phenotype Association Study. Front Cell Dev Biol 2022; 9:720321. [PMID: 35155440 PMCID: PMC8826544 DOI: 10.3389/fcell.2021.720321] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/04/2021] [Accepted: 12/13/2021] [Indexed: 12/18/2022]
Abstract
An enormous challenge in the post-genome era is to annotate and resolve the consequences of genetic variation on diverse phenotypes. The genome-wide association study (GWAS) is a well-known method for identifying potential genetic loci for complex traits among a very large number of genetic variants, after which it is crucial to identify expression quantitative trait loci (eQTLs). However, conventional eQTL methods usually disregard the systemic role of single-nucleotide polymorphisms (SNPs) or genes, thereby overlooking many network-associated phenotypic determinants. This problem motivates us to define network-based quantitative trait loci (QTL), i.e., network QTL (nQTL), which detect the cascade association genotype → network → phenotype rather than the conventional genotype → expression → phenotype of eQTL. Specifically, we develop the nQTL framework on the theory and approach of single-sample networks, which can identify not only network traits (e.g., the gene subnetwork associated with genotype) for analyzing complex biological processes but also network signatures (e.g., the interactive gene biomarker candidates screened from network traits) for characterizing a targeted phenotype and its subtypes. Our results show that the nQTL framework can efficiently capture associations between SNPs and network traits (i.e., edge traits) in various simulated data scenarios, compared with traditional eQTL methods. Furthermore, we have carried out nQTL analysis on diverse biological and biomedical datasets. Our analysis is effective in detecting network traits for various biological problems and can discover many network signatures for discriminating phenotypes, which can help interpret the influence of nQTL on disease subtyping, disease prognosis, drug response, and pathogen factor association.
Particularly, in contrast to the conventional approaches, the nQTL framework could also identify many network traits from human bulk expression data, validated by matched single-cell RNA-seq data in an independent or unsupervised manner. All these results strongly support that nQTL and its detection framework can simultaneously explore the global genotype–network–phenotype associations and the underlying network traits or network signatures with functional impact and importance.
Affiliation(s)
- Kai Yuan
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
- Tao Zeng
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
  - Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - Guangzhou Laboratory, Guangzhou, China
  - *Correspondence: Tao Zeng; Luonan Chen
- Luonan Chen
  - Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
  - *Correspondence: Tao Zeng; Luonan Chen
3
Ha MJ, Sun W. Estimation of high-dimensional directed acyclic graphs with surrogate intervention. Biostatistics 2020; 21:659-675. [PMID: 30596892 DOI: 10.1093/biostatistics/kxy080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2017] [Revised: 11/18/2018] [Accepted: 11/25/2018] [Indexed: 11/15/2022]
Abstract
Directed acyclic graphs (DAGs) have been used to describe causal relationships between variables. The standard method for determining such relations uses interventional data. For complex systems with high-dimensional data, however, such interventional data are often not available. Therefore, it is desirable to estimate causal structure from observational data without subjecting variables to interventions. Observational data can be used to estimate the skeleton of a DAG and the directions of a limited number of edges. We develop a Bayesian framework to estimate a DAG using surrogate interventional data, where the interventions are applied to a set of external variables, and thus such interventions are considered to be surrogate interventions on the variables of interest. Our work is motivated by expression quantitative trait locus (eQTL) studies, where the variables of interest are the expression of genes, the external variables are DNA variations, and interventions are applied to DNA variants during the process of a randomly selected DNA allele being passed to a child from either parent. Our method, surrogate intervention recovery of a DAG ($\texttt{sirDAG}$), first constructs a DAG skeleton using penalized regressions and the subsequent partial correlation tests, and then estimates the posterior probabilities of all the edge directions after incorporating DNA variant data. We demonstrate the utilities of $\texttt{sirDAG}$ by simulation and an application to an eQTL study for 550 breast cancer patients.
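The role of partial correlation tests in building a graph skeleton can be illustrated with a small sketch. It is not the sirDAG implementation: it assumes a three-variable chain with simulated Gaussian data, uses the standard first-order partial correlation formula, and the helper names `pearson` and `partial_corr` are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulated chain x -> z -> y (assumption): x and y are related only through z.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

marginal = pearson(x, y)             # clearly non-zero
conditional = partial_corr(x, y, z)  # near zero once z is conditioned on
print(f"corr(x, y) = {marginal:.3f}, corr(x, y | z) = {conditional:.3f}")
```

A near-zero partial correlation suggests that no direct x–y edge is needed in the skeleton. Orienting the remaining edges is a separate step, which the abstract says relies on DNA variant data.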
Affiliation(s)
- Min Jin Ha
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX, USA
- Wei Sun
  - Program in Biostatistics and Bioinformatics, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, USA
4
Sun W, Bunn P, Jin C, Little P, Zhabotynsky V, Perou CM, Hayes DN, Chen M, Lin DY. The association between copy number aberration, DNA methylation and gene expression in tumor samples. Nucleic Acids Res 2019. [PMID: 29529299 PMCID: PMC5887505 DOI: 10.1093/nar/gky131] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 12/28/2022]
Abstract
We systematically studied the association between somatic copy number aberration (SCNA), DNA methylation and gene expression using -omic data from The Cancer Genome Atlas (TCGA) on six cancer types: breast cancer, colon cancer, glioblastoma, leukemia, lower-grade glioma and prostate cancer. A major challenge for such an integrated study is that the association between DNA methylation and gene expression is severely confounded by tumor purity and cell type composition, which are often unobserved and difficult to estimate. To overcome this challenge, we developed a method to remove confounding effects by calculating the principal components that span the space of the latent factors. Another intriguing finding of our study is that there can be both positive and negative associations between SCNA and DNA methylation, with the CpGs that show negative/positive associations with SCNA often located around CpG islands/oceans, respectively. A joint study of SCNA, DNA methylation, and gene expression suggests that SCNA often affects DNA methylation and gene expression independently.
Affiliation(s)
- Wei Sun
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, USA
- Paul Bunn
  - Department of Biostatistics, University of North Carolina, Chapel Hill, USA
- Chong Jin
  - Department of Biostatistics, University of North Carolina, Chapel Hill, USA
- Paul Little
  - Department of Biostatistics, University of North Carolina, Chapel Hill, USA
- Vasyl Zhabotynsky
  - Department of Biostatistics, University of North Carolina, Chapel Hill, USA
- Charles M Perou
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
  - Department of Genetics, University of North Carolina, Chapel Hill, USA
- David Neil Hayes
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
  - Department of Medicine, Division of Hematology/Oncology, University of North Carolina, Chapel Hill, USA
- Mengjie Chen
  - Department of Medicine, University of Chicago, USA
  - Department of Human Genetics, University of Chicago, USA
- Dan-Yu Lin
  - Department of Biostatistics, University of North Carolina, Chapel Hill, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
5
Xie B, Becker E, Stuparevic I, Wery M, Szachnowski U, Morillon A, Primig M. The anti-cancer drug 5-fluorouracil affects cell cycle regulators and potential regulatory long non-coding RNAs in yeast. RNA Biol 2019; 16:727-741. [PMID: 30760080 PMCID: PMC6546400 DOI: 10.1080/15476286.2019.1581596] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/22/2018] [Revised: 01/16/2019] [Accepted: 02/06/2019] [Indexed: 10/27/2022]
Abstract
5-fluorouracil (5-FU) was isolated as an inhibitor of thymidylate synthase, which is important for DNA synthesis. The drug was later found to also affect the conserved 3'-5' exoribonuclease EXOSC10/Rrp6, a catalytic subunit of the RNA exosome that degrades and processes protein-coding and non-coding transcripts. Work on 5-FU's cytotoxicity has been focused on mRNAs and non-coding transcripts such as rRNAs, tRNAs and snoRNAs. However, the effect of 5-FU on long non-coding RNAs (lncRNAs), which include regulatory transcripts important for cell growth and differentiation, is poorly understood. RNA profiling of synchronized 5-FU treated yeast cells and protein assays reveal that the drug specifically inhibits a set of cell cycle regulated genes involved in mitotic division, by decreasing levels of the paralogous Swi5 and Ace2 transcriptional activators. We also observe widespread accumulation of different lncRNA types in treated cells, which are typically present at high levels in a strain lacking EXOSC10/Rrp6. 5-FU responsive lncRNAs include potential regulatory antisense transcripts that form double-stranded RNAs (dsRNAs) with overlapping sense mRNAs. Some of these transcripts encode proteins important for cell growth and division, such as the transcription factor Ace2, and the RNA exosome subunit EXOSC6/Mtr3. In addition to revealing a transcriptional effect of 5-FU action via DNA binding regulators involved in cell cycle progression, our results have implications for the function of putative regulatory lncRNAs in 5-FU mediated cytotoxicity. The data raise the intriguing possibility that the drug deregulates lncRNAs/dsRNAs involved in controlling eukaryotic cell division, thereby highlighting a new class of promising therapeutical targets.
Affiliation(s)
- Bingning Xie
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, Rennes, France
- Emmanuelle Becker
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, Rennes, France
  - Univ Rennes, Inria, CNRS, IRISA F-35000, Rennes, France
- Igor Stuparevic
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, Rennes, France
- Maxime Wery
  - ncRNA, Epigenetic and Genome Fluidity, Institut Curie, PSL University, CNRS UMR 3244, Université Pierre et Marie Curie, Paris, France
- Ugo Szachnowski
  - ncRNA, Epigenetic and Genome Fluidity, Institut Curie, PSL University, CNRS UMR 3244, Université Pierre et Marie Curie, Paris, France
- Antonin Morillon
  - ncRNA, Epigenetic and Genome Fluidity, Institut Curie, PSL University, CNRS UMR 3244, Université Pierre et Marie Curie, Paris, France
- Michael Primig
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, Rennes, France
6
Abstract
BACKGROUND Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors, including the genotypes of genetic variants. With the aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. RESULTS To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models, including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely from genotypes align well with true gene expression patterns. CONCLUSION We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.
Affiliation(s)
- Rui Xie
  - Department of Computer Science, University of Missouri at Columbia, Columbia, MO, USA
- Jia Wen
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC, USA
- Andrew Quitadamo
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC, USA
- Jianlin Cheng
  - Department of Computer Science, University of Missouri at Columbia, Columbia, MO, USA
- Xinghua Shi
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC, USA
7
Sun W, Kechris K, Jacobson S, Drummond MB, Hawkins GA, Yang J, Chen TH, Quibrera PM, Anderson W, Barr RG, Basta PV, Bleecker ER, Beaty T, Casaburi R, Castaldi P, Cho MH, Comellas A, Crapo JD, Criner G, Demeo D, Christenson SA, Couper DJ, Curtis JL, Doerschuk CM, Freeman CM, Gouskova NA, Han MK, Hanania NA, Hansel NN, Hersh CP, Hoffman EA, Kaner RJ, Kanner RE, Kleerup EC, Lutz S, Martinez FJ, Meyers DA, Peters SP, Regan EA, Rennard SI, Scholand MB, Silverman EK, Woodruff PG, O’Neal WK, Bowler RP. Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD. PLoS Genet 2016; 12:e1006011. [PMID: 27532455 PMCID: PMC4988780 DOI: 10.1371/journal.pgen.1006011] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 08/14/2015] [Accepted: 04/05/2016] [Indexed: 12/20/2022]
Abstract
Implementing precision medicine for complex diseases such as chronic obstructive lung disease (COPD) will require extensive use of biomarkers and an in-depth understanding of how genetic, epigenetic, and environmental variations contribute to phenotypic diversity and disease progression. A meta-analysis from two large cohorts of current and former smokers with and without COPD [SPIROMICS (N = 750); COPDGene (N = 590)] was used to identify single nucleotide polymorphisms (SNPs) associated with measurement of 88 blood proteins (protein quantitative trait loci; pQTLs). pQTLs consistently replicated between the two cohorts. Features of pQTLs were compared to previously reported expression QTLs (eQTLs). Inference of causal relations of pQTL genotypes, biomarker measurements, and four clinical COPD phenotypes (airflow obstruction, emphysema, exacerbation history, and chronic bronchitis) was explored using conditional independence tests. We identified 527 highly significant (p < 8 × 10⁻¹⁰) pQTLs in 38 (43%) of the blood proteins tested. Most pQTL SNPs were novel, with low overlap with eQTL SNPs. The pQTL SNPs explained >10% of measured variation in 13 protein biomarkers, with a single SNP (rs7041; p = 10⁻³⁹²) explaining 71%-75% of the measured variation in vitamin D binding protein (gene = GC). Some of these pQTLs [e.g., pQTLs for VDBP, sRAGE (gene = AGER), surfactant protein D (gene = SFTPD), and TNFRSF10C] have been previously associated with COPD phenotypes. Most pQTLs were local (cis), but distant (trans) pQTL SNPs in the ABO blood group locus were the top pQTL SNPs for five proteins. The inclusion of pQTL SNPs improved the clinical predictive value for the established association of sRAGE and emphysema, and the explanation of variance (R²) for emphysema improved from 0.3 to 0.4 when the pQTL SNP was included in the model along with clinical covariates. Causal modeling provided insight into specific pQTL-disease relationships for airflow obstruction and emphysema.
In conclusion, given the frequency of highly significant local pQTLs, the large amount of variance potentially explained by pQTL, and the differences observed between pQTLs and eQTLs SNPs, we recommend that protein biomarker-disease association studies take into account the potential effect of common local SNPs and that pQTLs be integrated along with eQTLs to uncover disease mechanisms. Large-scale blood biomarker studies would also benefit from close attention to the ABO blood group.
Affiliation(s)
- Wei Sun
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Katerina Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Sean Jacobson
  - National Jewish Health, Denver, Colorado, United States of America
- M. Bradley Drummond
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Gregory A. Hawkins
  - Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Jenny Yang
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Ting-huei Chen
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Pedro Miguel Quibrera
  - Collaborative Studies Coordinating Center, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Wayne Anderson
  - Marsico Lung Institute/Cystic Fibrosis Research Center, Department of Medicine, Division of Pulmonary and Critical Care Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- R. Graham Barr
  - Department of Medicine, Columbia University Medical Center, New York, New York; Department of Epidemiology, Mailman School of Public Health at Columbia University, New York, New York, United States of America
- Patricia V. Basta
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Eugene R. Bleecker
  - Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Terri Beaty
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- Richard Casaburi
  - Division of Respiratory and Critical Care Physiology and Medicine, Harbor-University of California at Los Angeles Medical Center, Torrance, California, United States of America
- Peter Castaldi
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Michael H. Cho
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Alejandro Comellas
  - Division of Pulmonary and Critical Care Medicine, University of Iowa, Iowa City, Iowa, United States of America
- James D. Crapo
  - Department of Medicine, Division of Pulmonary, Critical Care and Sleep Medicine, National Jewish Health, Denver, Colorado, United States of America
- Gerard Criner
  - Department of Thoracic Medicine and Surgery, Lewis Katz School of Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Dawn Demeo
  - Division of Pulmonary and Critical Care Medicine, Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Stephanie A. Christenson
  - Division of Pulmonary, Critical Care, Allergy, and Sleep Medicine, Department of Medicine, University of San Francisco Medical Center, University of California San Francisco, San Francisco, California, United States of America
- David J. Couper
  - Collaborative Studies Coordinating Center, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Jeffrey L. Curtis
  - Division of Pulmonary and Critical Care Medicine, University of Michigan Health System, Ann Arbor, Michigan; VA Ann Arbor Healthcare System, Ann Arbor, Michigan, United States of America
- Claire M. Doerschuk
  - Marsico Lung Institute/Cystic Fibrosis Research Center, Department of Medicine, Division of Pulmonary and Critical Care Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Christine M. Freeman
  - Division of Pulmonary and Critical Care Medicine, University of Michigan Health System, Ann Arbor, Michigan; VA Ann Arbor Healthcare System, Ann Arbor, Michigan, United States of America
- Natalia A. Gouskova
  - Collaborative Studies Coordinating Center, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- MeiLan K. Han
  - Division of Pulmonary and Critical Care Medicine, University of Michigan Health System, Ann Arbor, Michigan, United States of America
- Nicola A. Hanania
  - Section of Pulmonary and Critical Care Medicine, Baylor College of Medicine, Houston, Texas, United States of America
- Nadia N. Hansel
  - Division of Pulmonary and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Craig P. Hersh
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Eric A. Hoffman
  - Department of Radiology, Division of Physiologic Imaging, University of Iowa Hospitals and Clinics, Iowa City, Iowa, United States of America
- Robert J. Kaner
  - Department of Genetic Medicine, Weill Cornell Medical College, New York, New York, Department of Medicine, Division of Pulmonary and Critical Care Medicine, Weill Cornell Medical College, New York, New York, United States of America
- Richard E. Kanner
  - Department of Internal Medicine, Division of Pulmonary and Critical Care Medicine, University of Utah, Salt Lake City, Utah, United States of America
- Eric C. Kleerup
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Sharon Lutz
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Fernando J. Martinez
  - Department of Medicine, Weill Cornell Medical College, New York-Presbyterian Hospital/Weill Cornell Medical Center, New York, New York, United States of America
- Deborah A. Meyers
  - Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Stephen P. Peters
  - Department of Medicine, Division of Pulmonary, Critical Care, Allergy and Immunologic Medicine, Wake Forest University School of Medicine, Winston-Salem, North Carolina, United States of America
- Elizabeth A. Regan
  - Department of Medicine, National Jewish Health, Denver, Colorado, United States of America
- Stephen I. Rennard
  - Division of Pulmonary and Critical Care Medicine, University of Nebraska, Omaha, Nebraska, United States of America
- Mary Beth Scholand
  - Department of Internal Medicine, Division of Pulmonary and Critical Care Medicine, University of Utah, Salt Lake City, Utah, United States of America
- Edwin K. Silverman
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Prescott G. Woodruff
  - Division of Pulmonary, Critical Care, Sleep and Allergy, Department of Medicine and Cardiovascular Research Institute, University of California San Francisco School of Medicine, San Francisco, California, United States of America
- Wanda K. O’Neal
  - Marsico Lung Institute/Cystic Fibrosis Research Center, Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Russell P. Bowler
  - Department of Medicine, Division of Pulmonary Medicine, National Jewish Health, Denver, Colorado, United States of America
8
Richardson S, Tseng GC, Sun W. Statistical Methods in Integrative Genomics. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION 2016; 3:181-209. [PMID: 27482531 PMCID: PMC4963036 DOI: 10.1146/annurev-statistics-041715-033506] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Indexed: 05/28/2023]
Abstract
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.
Affiliation(s)
- Sylvia Richardson
  - MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, CB2 0SR, United Kingdom
- George C. Tseng
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261
- Wei Sun
  - Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 27516
9
Rashid NU, Sun W, Ibrahim JG. A STATISTICAL MODEL TO ASSESS (ALLELE-SPECIFIC) ASSOCIATIONS BETWEEN GENE EXPRESSION AND EPIGENETIC FEATURES USING SEQUENCING DATA. Ann Appl Stat 2016; 10:2254-2273. [PMID: 29034055 DOI: 10.1214/16-aoas973] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]
Abstract
Sequencing techniques have been widely used to assess gene expression (i.e., RNA-seq) or the presence of epigenetic features (e.g., DNase-seq to identify open chromatin regions). In contrast to traditional microarray platforms, sequencing data are typically summarized in the form of discrete counts, and they are able to delineate allele-specific signals, which are not available from microarrays. The presence of epigenetic features is often associated with gene expression, both of which have been shown to be affected by DNA polymorphisms. However, joint models with the flexibility to assess interactions between gene expression, epigenetic features and DNA polymorphisms are currently lacking. In this paper, we develop a statistical model to assess the associations between gene expression and epigenetic features using sequencing data, while explicitly modeling the effects of DNA polymorphisms in either an allele-specific or nonallele-specific manner. We show that in doing so we provide the flexibility to detect associations between gene expression and epigenetic features, as well as conditional associations given DNA polymorphisms. We evaluate the performance of our method using simulations and apply our method to study the association between gene expression and the presence of DNase I Hypersensitive sites (DHSs) in HapMap individuals. Our model can be generalized to exploring the relationships between DNA polymorphisms and any two types of sequencing experiments, a useful feature as the variety of sequencing experiments continues to expand.
Affiliation(s)
- Wei Sun
- Fred Hutchinson Cancer Research Center
|
10
|
Kavšček M, Stražar M, Curk T, Natter K, Petrovič U. Yeast as a cell factory: current state and perspectives. Microb Cell Fact 2015; 14:94. [PMID: 26122609 PMCID: PMC4486425 DOI: 10.1186/s12934-015-0281-x] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2015] [Accepted: 06/11/2015] [Indexed: 02/06/2023] Open
Abstract
The yeast Saccharomyces cerevisiae is one of the oldest and most frequently used microorganisms in biotechnology with successful applications in the production of both bulk and fine chemicals. Yet, yeast researchers are faced with the challenge to further its transition from the old workhorse to a modern cell factory, fulfilling the requirements for next generation bioprocesses. Many of the principles and tools that are applied for this development originate from the field of synthetic biology and the engineered strains will indeed be synthetic organisms. We provide an overview of the most important aspects of this transition and highlight achievements in recent years as well as trends in which yeast currently lags behind. These aspects include: the enhancement of the substrate spectrum of yeast, with the focus on the efficient utilization of renewable feedstocks, the enhancement of the product spectrum through generation of independent circuits for the maintenance of redox balances and biosynthesis of common carbon building blocks, the requirement for accurate pathway control with improved genome editing and through orthogonal promoters, and improvement of the tolerance of yeast for specific stress conditions. The causative genetic elements for the required traits of the future yeast cell factories will be assembled into genetic modules for fast transfer between strains. These developments will benefit from progress in bio-computational methods, which allow for the integration of different kinds of data sets and algorithms, and from rapid advancement in genome editing, which will enable multiplexed targeted integration of whole heterologous pathways. The overall goal will be to provide a collection of modules and circuits that work independently and can be combined at will, depending on the individual conditions, and will result in an optimal synthetic host for a given production process.
Affiliation(s)
- Martin Kavšček
- Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50/II, 8010, Graz, Austria.
- Martin Stražar
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia.
- Tomaž Curk
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia.
- Klaus Natter
- Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50/II, 8010, Graz, Austria.
- Uroš Petrovič
- Department of Molecular and Biomedical Sciences, Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia.
|
11
|
Ballini E, Lauter N, Wise R. Prospects for advancing defense to cereal rusts through genetical genomics. FRONTIERS IN PLANT SCIENCE 2013; 4:117. [PMID: 23641250 PMCID: PMC3640194 DOI: 10.3389/fpls.2013.00117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/19/2013] [Accepted: 04/15/2013] [Indexed: 05/03/2023]
Abstract
Rusts are one of the most severe threats to cereal crops because new pathogen races emerge regularly, resulting in infestations that lead to large yield losses. In 1999, a new race of stem rust, Puccinia graminis f. sp. tritici (Pgt TTKSK or Ug99), was discovered in Uganda. Most of the wheat and barley cultivars grown currently worldwide are susceptible to this new race. Pgt TTKSK has already spread northward into Iran and will likely spread eastward throughout the Indian subcontinent in the near future. This scenario is not unique to stem rust; new races of leaf rust (Puccinia triticina) and stripe rust (Puccinia striiformis) have also emerged recently. One strategy for countering the persistent adaptability of these pathogens is to stack complete- and partial-resistance genes, which requires significant breeding efforts in order to reduce deleterious effects of linkage drag. These varied resistance combinations are typically more difficult for the pathogen to defeat, since they would be predicted to apply lower selection pressure. Genetical genomics or expression Quantitative Trait Locus (eQTL) analysis enables the identification of regulatory loci that control the expression of many to hundreds of genes. Integrated deployment of these technologies coupled with efficient phenotyping offers significant potential to elucidate the regulatory nodes in genetic networks that orchestrate host defense responses. The focus of this review will be to present advances in genetical genomic experimental designs and analysis, particularly as they apply to the prospects for discovering partial disease resistance alleles in cereals.
Affiliation(s)
- Roger Wise
- Corn Insects and Crop Genetics Research, Department of Plant Pathology and Microbiology, US Department of Agriculture - Agricultural Research Service, Center for Plant Responses to Environmental Stresses, Iowa State University, Ames, IA, USA
|
12
|
Abstract
Current efforts in systems genetics have focused on the development of statistical approaches that aim to disentangle causal relationships among molecular phenotypes in segregating populations. Reverse engineering of transcriptional networks plays a key role in the understanding of gene regulation. However, transcriptional regulation is only one possible mechanism, as methylation, phosphorylation, direct protein-protein interaction, transcription factor binding, etc., can also contribute to gene regulation. These additional modes of regulation can be interpreted as unobserved variables in the transcriptional gene network and can potentially affect its reconstruction accuracy. We develop tests of causal direction for a pair of phenotypes that may be embedded in a more complicated but unobserved network by extending Vuong's selection tests for misspecified models. Our tests provide a significance level, which is unavailable for the widely used AIC and BIC criteria. We evaluate the performance of our tests against the AIC, BIC, and a recently published causality inference test in simulation studies. We compare the precision of causal calls using biologically validated causal relationships extracted from a database of 247 knockout experiments in yeast. Our model selection tests are more precise, showing greatly reduced false-positive rates compared to the alternative approaches. In practice, this is a useful feature since follow-up studies tend to be time consuming and expensive and, hence, it is important for the experimentalist to have causal predictions with low false-positive rates.
|
13
|
Wright FA, Shabalin AA, Rusyn I. Computational tools for discovery and interpretation of expression quantitative trait loci. Pharmacogenomics 2012; 13:343-52. [PMID: 22304583 DOI: 10.2217/pgs.11.185] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
Expression quantitative trait locus (eQTL) analysis is rapidly moving from a cutting-edge concept in genomics to a mature area of investigation, with important connections to genome-wide association studies for human disease, pharmacogenomics and toxicogenomics. Despite the importance of the topic, many investigators must develop their own code or use tools not specifically suited for eQTL analysis. Convenient computational tools are becoming available, but they are not widely publicized, and investigators who are interested in discovery of eQTL, or in using them to interpret genome-wide association study results, may have difficulty navigating the available resources. The purpose of this review is to help investigators find appropriate programs for eQTL analysis and interpretation.
Affiliation(s)
- Fred A Wright
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
|
14
|
Montgomery SB, Dermitzakis ET. From expression QTLs to personalized transcriptomics. Nat Rev Genet 2011; 12:277-82. [PMID: 21386863 DOI: 10.1038/nrg2969] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Approaches that combine expression quantitative trait loci (eQTLs) and genome-wide association (GWA) studies are offering new functional information about the aetiology of complex human traits and diseases. Improved study designs--which take into account technological advances in resolving the transcriptome, cell history and state, population of origin and diverse endophenotypes--are providing insights into the architecture of disease and the landscape of gene regulation in humans. Furthermore, these advances are helping to establish links between cellular effects and organismal traits.
Affiliation(s)
- Stephen B Montgomery
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1 rue Michel-Servet, Geneva 1211, Switzerland.
|
15
|
Parts L, Stegle O, Winn J, Durbin R. Joint genetic analysis of gene expression data with inferred cellular phenotypes. PLoS Genet 2011; 7:e1001276. [PMID: 21283789 PMCID: PMC3024309 DOI: 10.1371/journal.pgen.1001276] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2010] [Accepted: 12/14/2010] [Indexed: 12/01/2022] Open
Abstract
Even within a defined cell type, the expression level of a gene differs in individual samples. The effects of genotype, measured factors such as environmental conditions, and their interactions have been explored in recent studies. Methods have also been developed to identify unmeasured intermediate factors that coherently influence transcript levels of multiple genes. Here, we show how to bring these two approaches together and analyse genetic effects in the context of inferred determinants of gene expression. We use a sparse factor analysis model to infer hidden factors, which we treat as intermediate cellular phenotypes that in turn affect gene expression in a yeast dataset. We find that the inferred phenotypes are associated with locus genotypes and environmental conditions and can explain genetic associations to genes in trans. For the first time, we consider and find interactions between genotype and intermediate phenotypes inferred from gene expression levels, complementing and extending established results. The first step in transmitting heritable information, expressing RNA molecules, is highly regulated and depends on activations of specific pathways and regulatory factors. The state of the cell is hard to measure, making it difficult to understand what drives the changes in the gene expression. To close this gap, we apply a statistical model to infer the state of the cell, such as activations of transcription factors and molecular pathways, from gene expression data. We demonstrate how the inferred state helps to explain the effects of variation in the DNA and environment on the expression trait via both direct regulatory effects and interactions with the genetic state. Such analysis, exploiting inferred intermediate phenotypes, will aid understanding effects of genetic variability on global traits and will help to interpret the data from existing and forthcoming large scale studies.
Affiliation(s)
- Leopold Parts
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- * E-mail: (LP); (RD)
- John Winn
- Microsoft Research, Cambridge, United Kingdom
- Richard Durbin
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- * E-mail: (LP); (RD)
|
16
|
Identifying the genetic determinants of transcription factor activity. Mol Syst Biol 2011; 6:412. [PMID: 20865005 PMCID: PMC2964119 DOI: 10.1038/msb.2010.64] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2010] [Accepted: 06/20/2010] [Indexed: 01/03/2023] Open
Abstract
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood. The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity. Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci ('aQTLs') whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF. Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1).
Our method identifies genomic loci ('aQTLs') whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse. In our approach, the inferred TF activity is explicitly treated as a quantitative trait, and genetically mapped. The decrease of 'phenotype space' from that of all genes (in the eQTL approach) to that of all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, as each inferred TF activity is derived from a large number of genes, it is far less noisy than mRNA levels of individual genes. Second, the number of trait/marker combinations that needs to be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs.
We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). The total number of distinct genomic loci identified as an aQTL equals 31, which includes 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008). To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four 'local' aQTLs, which encompass the gene encoding the TF itself. This includes the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level.
We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs.
Application to mouse data identified an aQTL modulating the activity of a specific TF in liver cells. We identified an aQTL on mouse chromosome 7 for Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain. Even though we could not detect a candidate causal gene for Zscan4p because of lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes.
In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available.
Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci ('aQTLs') whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome VII modulating the activity of Zscan4 in liver cells. Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
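The two-step procedure described in this abstract (regress each segregant's differential expression on promoter affinity to estimate TF activity, then map that activity against marker genotypes) can be illustrated with a minimal Python sketch. The data are simulated, and the function names (`ols_slope`, `tf_activity`, `map_activity`, `simulate`), the single-TF setup, and the use of an absolute Pearson correlation as the linkage score are assumptions made for illustration; they are not the authors' implementation.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def tf_activity(affinity, expression):
    """Step 1: one activity estimate per segregant.

    affinity   -- predicted promoter affinity of the TF, one value per gene
    expression -- expression[s][g], differential expression of gene g in segregant s
    """
    return [ols_slope(affinity, row) for row in expression]


def map_activity(activity, genotypes):
    """Step 2: score each marker by |correlation| between activity and genotype.

    genotypes -- genotypes[m][s], allele (0 or 1) at marker m in segregant s
    """
    return [abs(pearson(activity, marker)) for marker in genotypes]


def simulate(n_genes=200, n_segregants=32, n_markers=5, causal=2, seed=0):
    """Toy data (hypothetical) in which the allele at marker `causal` flips TF activity."""
    rng = random.Random(seed)
    affinity = [rng.random() for _ in range(n_genes)]
    # Balanced, mutually uncorrelated genotypes: bit m of the segregant index.
    genotypes = [[(s >> m) & 1 for s in range(n_segregants)] for m in range(n_markers)]
    expression = []
    for s in range(n_segregants):
        true_activity = 1.0 if genotypes[causal][s] else -1.0
        expression.append([true_activity * a + rng.gauss(0, 0.5) for a in affinity])
    return affinity, genotypes, expression


affinity, genotypes, expression = simulate()
activity = tf_activity(affinity, expression)
scores = map_activity(activity, genotypes)
print("linkage score per marker:", [round(s, 2) for s in scores])
```

Because each activity estimate pools information across all genes, it is less noisy than the expression of any single gene, which is the first source of the power gain the abstract describes.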
|
17
|
High-confidence discovery of genetic network regulators in expression quantitative trait loci data. Genetics 2011; 187:955-64. [PMID: 21212238 DOI: 10.1534/genetics.110.124685] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Expression QTL (eQTL) studies involve the collection of microarray gene expression data and genetic marker data from segregating individuals in a population to search for genetic determinants of differential gene expression. Previous studies have found large numbers of trans-regulated genes (regulated by unlinked genetic loci) that link to a single locus or eQTL "hotspot," and it would be desirable to find the mechanism of coregulation for these gene groups. However, many difficulties exist with current network reconstruction algorithms such as low power and high computational cost. A common observation for biological networks is that they have a scale-free or power-law architecture. In such an architecture, highly influential nodes exist that have many connections to other nodes. If we assume that this type of architecture applies to genetic networks, then we can simplify the problem of genetic network reconstruction by focusing on discovery of the key regulatory genes at the top of the network. We introduce the concept of "shielding" in which a specific gene expression variable (the shielder) renders a set of other gene expression variables (the shielded genes) independent of the eQTL. We iteratively build networks from the eQTL to the shielder down using tests of conditional independence. We have proposed a novel test for controlling the shielder false-positive rate at a predetermined level by requiring a threshold number of shielded genes per shielder. Using simulation, we have demonstrated that we can control the shielder false-positive rate as well as obtain high shielder and edge specificity. In addition, we have shown our method to be robust to violation of the latent variable assumption, an important feature in the practical application of our method. We have applied our method to a yeast expression QTL data set in which microarray and marker data were collected from the progeny of a backcross of two species of Saccharomyces cerevisiae (Brem et al. 2002). Seven genetic networks have been discovered, and bioinformatic analysis of the discovered regulators and corresponding regulated genes has generated plausible hypotheses for mechanisms of regulation that can be tested in future experiments.
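The "shielding" idea in this abstract, in which conditioning on a regulator's expression renders downstream genes independent of the eQTL, can be illustrated with a partial correlation. The sketch below is only a stand-in under stated assumptions: the abstract does not specify the conditional-independence test used, so a first-order partial correlation on simulated data is substituted here, and the names `partial_corr` and `simulate_chain` are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(x, y, z):
    """Correlation between x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


def simulate_chain(n=500, seed=1):
    """Simulated chain (assumed): eQTL genotype -> shielder expression -> target expression."""
    rng = random.Random(seed)
    genotype = [rng.randint(0, 1) for _ in range(n)]
    shielder = [g + rng.gauss(0, 0.5) for g in genotype]
    target = [s + rng.gauss(0, 0.5) for s in shielder]
    return genotype, shielder, target


genotype, shielder, target = simulate_chain()
marginal = pearson(genotype, target)
conditional = partial_corr(genotype, target, shielder)
print(f"corr(eQTL, target)            = {marginal:.2f}")
print(f"corr(eQTL, target | shielder) = {conditional:.2f}")
```

In the simulated chain the marginal correlation is clearly non-zero while the partial correlation is close to zero, which is the pattern a shielder is expected to produce. A real analysis would need a formal test and control of the false-positive rate, as the abstract notes.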
|
18
|
Rusyn I, Gatti DM, Wiltshire T, Kleeberger SR, Threadgill DW. Toxicogenetics: population-based testing of drug and chemical safety in mouse models. Pharmacogenomics 2010; 11:1127-36. [PMID: 20704464 DOI: 10.2217/pgs.10.100] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open
Abstract
The rapid decline in the cost of dense genotyping is paving the way for new DNA sequence-based laboratory tests to move quickly into clinical practice, and to ultimately help realize the promise of 'personalized' therapies. These advances are based on the growing appreciation of genetics as an important dimension in science and the practice of investigative pharmacology and toxicology. On the clinical side, both the regulators and the pharmaceutical industry hope that the early identification of individuals prone to adverse drug effects will keep advantageous medicines on the market for the benefit of the vast majority of prospective patients. On the environmental health protection side, there is a clear need for better science to define the range and causes of susceptibility to adverse effects of chemicals in the population, so that the appropriate regulatory limits are established. In both cases, most of the research effort is focused on genome-wide association studies in humans where de novo genotyping of each subject is required. At the same time, the power of population-based preclinical safety testing in rodent models (e.g., mouse) remains to be fully exploited. Here, we highlight the approaches available to utilize the knowledge of DNA sequence and genetic diversity of the mouse as a species in mechanistic toxicology research. We posit that appropriate genetically defined mouse models may be combined with the limited data from human studies to not only discover the genetic determinants of susceptibility, but to also understand the molecular underpinnings of toxicity.
Affiliation(s)
- Ivan Rusyn
- Department of Environmental Sciences & Engineering, 0031 Michael Hooker Research Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
19
|
Chen K, van Nimwegen E, Rajewsky N, Siegal ML. Correlating gene expression variation with cis-regulatory polymorphism in Saccharomyces cerevisiae. Genome Biol Evol 2010; 2:697-707. [PMID: 20829281 PMCID: PMC2953268 DOI: 10.1093/gbe/evq054] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Identifying the nucleotides that cause gene expression variation is a critical step in dissecting the genetic basis of complex traits. Here, we focus on polymorphisms that are predicted to alter transcription factor binding sites (TFBSs) in the yeast, Saccharomyces cerevisiae. We assembled a confident set of transcription factor motifs using recent protein binding microarray and ChIP-chip data and used our collection of motifs to predict a comprehensive set of TFBSs across the S. cerevisiae genome. We used a population genomics analysis to show that our predictions are accurate and significantly improve on our previous annotation. Although predicting gene expression from sequence is thought to be difficult in general, we identified a subset of genes for which changes in predicted TFBSs correlate well with expression divergence between yeast strains. Our analysis thus demonstrates both the accuracy of our new TFBS predictions and the feasibility of using simple models of gene regulation to causally link differences in gene expression to variation at individual nucleotides.
Affiliation(s)
- Kevin Chen
- Center for Genomics and Systems Biology, Department of Biology, New York University
- Max-Delbrück-Centrum für Molekulare Medizin, Berlin-Buch, Germany
- Department of Genetics and BioMaPS Institute, Rutgers University
- Corresponding author
- Erik van Nimwegen
- Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
- Mark L. Siegal
- Center for Genomics and Systems Biology, Department of Biology, New York University
- Corresponding author
|
20
|
Systems genetics, bioinformatics and eQTL mapping. Genetica 2010; 138:915-24. [DOI: 10.1007/s10709-010-9480-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2010] [Accepted: 07/30/2010] [Indexed: 12/15/2022]
|
21
|
Dietary fat-dependent transcriptional architecture and copy number alterations associated with modifiers of mammary cancer metastasis. Clin Exp Metastasis 2010; 27:279-93. [PMID: 20354763 DOI: 10.1007/s10585-010-9326-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2010] [Accepted: 03/17/2010] [Indexed: 01/01/2023]
Abstract
Breast cancer is a complex disease resulting from a combination of genetic and environmental factors. Among environmental factors, body composition and intake of specific dietary components like total fat are associated with increased incidence of breast cancer and metastasis. We previously showed that mice fed a high-fat diet have shorter mammary cancer latency, increased tumor growth and more pulmonary metastases than mice fed a standard diet. Subsequent genetic analysis identified several modifiers of metastatic mammary cancer along with widespread interactions between cancer modifiers and dietary fat. To elucidate diet-dependent genetic modifiers of mammary cancer and metastasis risk, global gene expression profiles and copy number alterations from mammary cancers were measured and expression quantitative trait loci (eQTL) identified. Functional candidate genes that colocalized with previously detected metastasis modifiers were identified. Additional analyses, such as eQTL by dietary fat interaction analysis, causality and database evaluations, helped to further refine the candidate loci to produce an enriched list of genes potentially involved in the pathogenesis of metastatic mammary cancer.
|
22
|
Gat-Viks I, Meller R, Kupiec M, Shamir R. Understanding gene sequence variation in the context of transcription regulation in yeast. PLoS Genet 2010; 6:e1000800. [PMID: 20066030 PMCID: PMC2794365 DOI: 10.1371/journal.pgen.1000800] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2009] [Accepted: 12/07/2009] [Indexed: 11/18/2022] Open
Abstract
DNA sequence polymorphism in a regulatory protein can have a widespread transcriptional effect. Here we present a computational approach for analyzing modules of genes with a common regulation that are affected by specific DNA polymorphisms. We identify such regulatory-linkage modules by integrating genotypic and expression data for individuals in a segregating population with complementary expression data of strains mutated in a variety of regulatory proteins. Our procedure searches simultaneously for groups of co-expressed genes, for their common underlying linkage interval, and for their shared regulatory proteins. We applied the method to a cross between laboratory and wild strains of S. cerevisiae, demonstrating its ability to correctly suggest modules and to outperform extant approaches. Our results suggest that middle sporulation genes are under the control of polymorphism in the sporulation-specific tertiary complex Sum1p/Rfm1p/Hst1p. In another example, our analysis reveals novel inter-relations between Swi3 and two mitochondrial inner membrane proteins underlying variation in a module of aerobic cellular respiration genes. Overall, our findings demonstrate that this approach provides a useful framework for the systematic mapping of quantitative trait loci and their role in gene expression variation.
High-throughput genotypic and expression data for individuals in a segregating population can provide important information regarding causal regulatory events. However, it has proven difficult to predict these regulatory relations, largely because of statistical power limitations. The use of additional available resources may increase the accuracy of predictions and suggest possible mechanisms through which the target genes are regulated. In this study, we combine genotypic and expression data across the segregating population with complementary regulatory information to identify modules of genes that are jointly affected by changes in activity of regulatory proteins, as well as by genotypic changes. We develop a novel approach called ReL analysis, which automatically learns such modules. A unique feature of our approach is that all three components of the module—the genes, the underlying polymorphism, and the regulatory proteins—are predicted simultaneously. The integrated analysis makes it possible to capture weaker linkage signals and suggests possible mechanisms underlying expression changes. We demonstrate the power of the method on data from yeast segregants, by identifying the roles of new as well as known polymorphisms.
Affiliation(s)
- Irit Gat-Viks
- Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.
|
23
|
Przytycka TM, Singh M, Slonim DK. Toward the dynamic interactome: it's about time. Brief Bioinform 2010; 11:15-29. [PMID: 20061351 PMCID: PMC2810115 DOI: 10.1093/bib/bbp057] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.0] [Received: 07/20/2009] [Revised: 11/01/2009] [Indexed: 11/14/2022] Open
Abstract
Dynamic molecular interactions play a central role in regulating the functioning of cells and organisms. The availability of experimentally determined large-scale cellular networks, along with other high-throughput experimental data sets that provide snapshots of biological systems at different times and conditions, is increasingly helpful in elucidating interaction dynamics. Here we review the beginnings of a new subfield within computational biology, one focused on the global inference and analysis of the dynamic interactome. This burgeoning research area, which entails a shift from static to dynamic network analysis, promises to be a major step forward in our ability to model and reason about cellular function and behavior.
Affiliation(s)
- Teresa M Przytycka
- National Center for Biotechnology Information, NLM, NIH, 8000 Rockville Pike, Bethesda MD 20814, USA.
|
24
|
Sun W, Wright FA, Tang Z, Nordgard SH, Van Loo P, Yu T, Kristensen VN, Perou CM. Integrated study of copy number states and genotype calls using high-density SNP arrays. Nucleic Acids Res 2009; 37:5365-77. [PMID: 19581427 PMCID: PMC2935461 DOI: 10.1093/nar/gkp493] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.8] [Indexed: 12/15/2022] Open
Abstract
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.
Affiliation(s)
- Wei Sun
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA.
|
25
|
Ye C, Galbraith SJ, Liao JC, Eskin E. Using network component analysis to dissect regulatory networks mediated by transcription factors in yeast. PLoS Comput Biol 2009; 5:e1000311. [PMID: 19300475 PMCID: PMC2649002 DOI: 10.1371/journal.pcbi.1000311] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 09/12/2008] [Accepted: 01/28/2009] [Indexed: 01/12/2023] Open
Abstract
Understanding the relationship between genetic variation and gene expression is a central question in genetics. With the availability of data from high-throughput technologies such as ChIP-Chip, expression, and genotyping arrays, we can begin to not only identify associations but to understand how genetic variations perturb the underlying transcription regulatory networks to induce differential gene expression. In this study, we describe a simple model of transcription regulation where the expression of a gene is completely characterized by two properties: the concentrations and promoter affinities of active transcription factors. We devise a method that extends Network Component Analysis (NCA) to determine how genetic variations in the form of single nucleotide polymorphisms (SNPs) perturb these two properties. Applying our method to a segregating population of Saccharomyces cerevisiae, we found statistically significant examples of trans-acting SNPs located in regulatory hotspots that perturb transcription factor concentrations and affinities for target promoters to cause global differential expression and cis-acting genetic variations that perturb the promoter affinities of transcription factors on a single gene to cause local differential expression. Although many genetic variations linked to gene expressions have been identified, it is not clear how they perturb the underlying regulatory networks that govern gene expression. Our work begins to fill this void by showing that many genetic variations affect the concentrations of active transcription factors in a cell and their affinities for target promoters. Understanding the effects of these perturbations can help us to paint a more complete picture of the complex landscape of transcription regulation. The software package implementing the algorithms discussed in this work is available as a MATLAB package upon request. 
One of the fundamental challenges in biology in the post-genomics era is understanding the complex regulatory mechanisms that govern how genes are turned on and off. In a single organism where the functions of individual genes in a population do not differ much, many of the differences between individuals including physical phenotypes, susceptibility to disease, and response to drugs can be attributed to how genes are regulated. Previous studies have largely focused on identifying regulator and target genes whose expressions are linked to genetic variations in a population. We present work that focuses on considering a specific set of regulators called transcription factors whose targets can be verified from experiments and whose interactions with those targets have been well studied and modeled. In this setting, we can begin to understand how genetic variations perturb the concentrations and promoter affinities of active transcription factors to induce differential expression of the targets. Understanding the effects of these perturbations is important to understanding the fundamental biology of gene regulation and can help us to design and assess therapeutics and treatments for complex diseases.
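The two-property model in this abstract can be illustrated with a minimal sketch. It assumes the log-linear form commonly used in Network Component Analysis, in which the log expression of gene i is the sum over transcription factors j of an affinity term A_ij times the log activity of factor j. The abstract does not spell out this equation, and the network, values, and function names below are hypothetical.

```python
def log_expression(affinity, log_tf_activity):
    """Assumed NCA-style model: log E_i = sum_j A_ij * log P_j."""
    return [sum(a * p for a, p in zip(row, log_tf_activity)) for row in affinity]

def changed(before, after, tol=1e-12):
    """Indices of genes whose modeled log expression differs."""
    return [i for i, (x, y) in enumerate(zip(before, after)) if abs(x - y) > tol]

# Hypothetical network: 3 genes (rows) by 2 transcription factors (columns).
affinity = [[1.0, 0.0],
            [0.5, 1.0],
            [0.0, 2.0]]
log_activity = [1.0, 0.5]
baseline = log_expression(affinity, log_activity)

# Trans-like perturbation: a variant lowers the activity of factor 0,
# shifting every gene that factor 0 regulates.
trans = log_expression(affinity, [0.5, 0.5])

# Cis-like perturbation: a variant changes the affinity of factor 1
# for the promoter of gene 2 only.
cis_affinity = [row[:] for row in affinity]
cis_affinity[2][1] = 1.0
cis = log_expression(cis_affinity, log_activity)

print(changed(baseline, trans))  # genes affected by the trans-like change
print(changed(baseline, cis))    # genes affected by the cis-like change
```

Under these assumptions, a change in one factor's activity propagates to all of its targets, whereas a change in a single affinity entry is confined to one gene, which mirrors the global versus local distinction the abstract draws.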
Affiliation(s)
- Chun Ye
- Bioinformatics Program, University of California San Diego, La Jolla, California, United States of America.
|
26
|
Kliebenstein D. Quantitative genomics: analyzing intraspecific variation using global gene expression polymorphisms or eQTLs. Annual Review of Plant Biology 2009; 60:93-114. [PMID: 19012536 DOI: 10.1146/annurev.arplant.043008.092114] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Indexed: 05/05/2023]
Abstract
Scientific inquiries in fields ranging from ecology to plant breeding assess phenotypic variation within a plant species either to explain its presence or utilize its consequences. Frequently this natural genetic variation is studied via mapping quantitative trait loci (QTLs); however, elucidation of the underlying molecular mechanisms is a continuing bottleneck. The genomic analysis of transcripts as individual phenotypes has led to the emerging field of expression QTL analysis. This field has begun both to delve into the ecological/evolutionary significance of this transcript variation as well as to use specific eQTLs to speed up our analysis of the molecular basis of quantitative traits. This review introduces eQTL analysis and begins to illustrate how these data can be applied to multiple research fields.
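Treating a transcript level as a quantitative phenotype can be sketched as a single-marker regression of expression on allele dosage. This is an illustrative assumption rather than a method described in the review. The function name `eqtl_fit` and the toy data are hypothetical, and a real analysis would add covariates, significance testing, and multiple-testing correction.

```python
from statistics import mean

def eqtl_fit(dosages, expression):
    """Least-squares slope and intercept of expression on allele dosage (0, 1, 2)."""
    g_bar, e_bar = mean(dosages), mean(expression)
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("marker is monomorphic; slope is undefined")
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    return slope, e_bar - slope * g_bar

# Hypothetical toy data: six individuals, one marker, one transcript.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

slope, intercept = eqtl_fit(dosages, expression)
print(f"effect per allele: {slope:.3f}, intercept: {intercept:.3f}")
```

The slope estimates the change in expression per copy of the counted allele. In an eQTL scan this fit would be repeated for each marker-transcript pair.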
Affiliation(s)
- Dan Kliebenstein
- Plant Sciences, University of California, Davis, California 95616, USA.
|
27
|
Hansen BG, Halkier BA, Kliebenstein DJ. Identifying the molecular basis of QTLs: eQTLs add a new dimension. Trends in Plant Science 2008; 13:72-7. [PMID: 18262820 DOI: 10.1016/j.tplants.2007.11.008] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Received: 09/17/2007] [Revised: 11/16/2007] [Accepted: 11/26/2007] [Indexed: 05/20/2023]
Abstract
Natural genetic variation within plant species is at the core of plant science ranging from agriculture to evolution. Whereas much progress has been made in mapping quantitative trait loci (QTLs) controlling this natural variation, the elucidation of the underlying molecular mechanisms has remained a bottleneck. Recent systems biology tools have significantly shortened the time required to proceed from a mapped locus to testing of candidate genes. These tools enable research on natural variation to move from simple reductionistic studies focused on individual genes to integrative studies connecting molecular variation at multiple loci with physiological consequences. This review focuses on recent examples that demonstrate how expression QTL data can be used for gene discovery and exploited to untangle complex regulatory networks.
Affiliation(s)
- Bjarne G Hansen
- Department of Plant Biology, University of Copenhagen, Copenhagen, Denmark.
|