101
He Y, Huang L, Tang Y, Yang Z, Han Z. Genome-wide Identification and Analysis of Splicing QTLs in Multiple Sclerosis by RNA-Seq Data. Front Genet 2021; 12:769804. [PMID: 34868258 PMCID: PMC8633104 DOI: 10.3389/fgene.2021.769804] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2021] [Accepted: 10/18/2021] [Indexed: 12/21/2022]
Abstract
Multiple sclerosis (MS) is an autoimmune disease characterized by inflammatory demyelinating lesions in the central nervous system. Recently, dysregulation of alternative splicing (AS) in the brain has been found to significantly influence the progression of MS. Moreover, previous studies demonstrate that many MS-related genomic variants act as important regulators of AS events and contribute to the pathogenesis of MS. However, no genome-wide study of the effect of genomic variants on AS events in MS has been reported so far. Here, we first implemented a strategy to obtain variant genotypes and average percent spliced-in (PSI) values of AS isoforms from RNA-seq data of 142 individuals (51 MS patients and 91 controls). Then, combining the two data sets, we performed a cis-splicing quantitative trait loci (sQTL) analysis to identify the cis-acting loci and the affected differential AS events in MS, and further explored the characteristics of these cis-sQTLs. Finally, weighted gene co-expression network analysis and gene set enrichment analysis were used to investigate the gene interaction patterns and functions of the affected AS events in MS. In total, we identified 5835 variants affecting 672 differential AS events. The cis-sQTLs tend to lie near the gene transcription start site, and intronic cis-sQTLs are more capable of regulating AS events. Retained-intron AS events are more susceptible to the influence of genomic variants, and their functions involve protein kinases and phosphorylation. In summary, these findings provide insight into the mechanism of MS.
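To make the PSI and cis-sQTL steps concrete, here is a minimal sketch, not the authors' pipeline. It assumes PSI is simply inclusion reads / (inclusion + exclusion reads) with no length normalization, and that the sQTL association is an ordinary least-squares regression of PSI on alternative-allele dosage (0, 1, 2) without covariates. The function names and toy read counts are hypothetical.

```python
from math import sqrt
from typing import List, Optional, Tuple


def psi(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Percent spliced-in as a fraction in [0, 1]; None when the event has no reads."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def sqtl_regression(dosages: List[int], psis: List[float]) -> Tuple[float, float]:
    """Regress PSI on alternative-allele dosage (0, 1, 2) by ordinary least squares.

    Returns (slope, Pearson r). A real sQTL scan would add covariates and compute
    p-values with multiple-testing correction; this only shows the core association.
    """
    n = len(dosages)
    if n != len(psis) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosages) / n
    mean_p = sum(psis) / n
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, psis))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((p - mean_p) ** 2 for p in psis)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or PSI has zero variance")
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Toy usage: six individuals, (inclusion, exclusion) junction reads for one AS event.
counts = [(90, 10), (80, 20), (50, 50), (45, 55), (10, 90), (20, 80)]
values = [psi(inc, exc) for inc, exc in counts]
slope, r = sqtl_regression([0, 0, 1, 1, 2, 2], values)
print(f"slope={slope:.3f}, r={r:.3f}")
```

A negative slope here means each copy of the alternative allele lowers inclusion of the event, which is the kind of genotype-dependent splicing an sQTL captures.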
Affiliation(s)
- Zhijie Han
- Department of Bioinformatics, School of Basic Medicine, Chongqing Medical University, Chongqing, China
102
Wang H, Huang B, Wang J. Predict long-range enhancer regulation based on protein-protein interactions between transcription factors. Nucleic Acids Res 2021; 49:10347-10368. [PMID: 34570239 PMCID: PMC8501976 DOI: 10.1093/nar/gkab841] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/20/2021] [Revised: 08/10/2021] [Accepted: 09/10/2021] [Indexed: 12/18/2022]
Abstract
Long-range regulation by distal enhancers plays critical roles in cell-type-specific transcriptional programs. Computational prediction of genome-wide enhancer-promoter interactions remains challenging because of limited accuracy and incomplete knowledge of the underlying molecular mechanisms. Recent biological studies have found that protein-protein interactions (PPIs) between transcription factors (TFs) participate in the regulation of chromatin loops. We therefore developed a novel predictive model for cell-type-specific enhancer-promoter interactions that leverages TF PPI signatures. In a series of rigorous performance comparisons, the new model outperforms other methods. The model also identifies specific TF PPIs that may mediate long-range regulatory interactions, yielding new mechanistic insight into enhancer regulation. The prioritized TF PPIs are associated with genes in distinct biological pathways, and the predicted enhancer-promoter interactions are strongly enriched with cis-eQTLs. Most interestingly, the model discovers enhancer-mediated trans-regulatory links between TFs and genes, which are significantly enriched with trans-eQTLs. The new predictive model, along with the genome-wide analyses, provides a platform to systematically delineate the complex interplay among TFs, enhancers and genes in long-range regulation. The predictions also offer mechanistic interpretations of eQTLs, helping to decode genetic associations with gene expression.
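As a rough illustration of what a "TF PPI signature" feature for one enhancer-promoter pair might look like, the sketch below counts TF pairs that could bridge the two elements: one TF bound at the enhancer, one at the promoter, and a known interaction between them. This is a hypothetical feature construction, not the published model; it assumes TF occupancy at each element and an undirected PPI list are already available, and the TF names are placeholders rather than real binding or interaction data.

```python
from itertools import product
from typing import Iterable, List, Tuple


def bridging_ppis(
    enhancer_tfs: Iterable[str],
    promoter_tfs: Iterable[str],
    ppis: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return sorted (enhancer TF, promoter TF) pairs whose proteins interact.

    PPIs are treated as undirected, so ("A", "B") also matches ("B", "A").
    A homodimer is written as ("A", "A").
    """
    undirected = {frozenset(pair) for pair in ppis}
    hits = {
        (a, b)
        for a, b in product(set(enhancer_tfs), set(promoter_tfs))
        if frozenset((a, b)) in undirected
    }
    return sorted(hits)


# Toy inputs with placeholder TF names; not real binding or interaction data.
enhancer = ["TF_A", "TF_B", "TF_C"]
promoter = ["TF_B", "TF_D"]
known_ppis = [("TF_B", "TF_B"), ("TF_D", "TF_C"), ("TF_A", "TF_E")]
pairs = bridging_ppis(enhancer, promoter, known_ppis)
feature = len(pairs)  # one scalar feature a downstream classifier could use
print(pairs, feature)
```

A predictor built along these lines would compute many such features per candidate pair and train a classifier on known loops; the abstract does not specify the model, so that part is left out.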
Affiliation(s)
- Hao Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, 428 S. Shaw Ln., East Lansing, MI 48824, USA
- Binbin Huang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, 428 S. Shaw Ln., East Lansing, MI 48824, USA
- Jianrong Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, 428 S. Shaw Ln., East Lansing, MI 48824, USA
103
Van Dyke K, Lutz S, Mekonnen G, Myers CL, Albert FW. Trans-acting genetic variation affects the expression of adjacent genes. Genetics 2021; 217:6126816. [PMID: 33789351 DOI: 10.1093/genetics/iyaa051] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/13/2020] [Accepted: 12/16/2020] [Indexed: 11/13/2022]
Abstract
Gene expression differences among individuals are shaped by trans-acting expression quantitative trait loci (eQTLs). Most trans-eQTLs map to hotspot locations that influence many genes. The molecular mechanisms perturbed by hotspots are often assumed to involve "vertical" cascades of effects in pathways that can ultimately affect the expression of thousands of genes. Here, we report that trans-eQTLs can affect the expression of adjacent genes via "horizontal" mechanisms that extend along a chromosome. Genes affected by trans-eQTL hotspots in the yeast Saccharomyces cerevisiae were more likely to be located next to each other than expected by chance. These paired hotspot effects tended to occur at adjacent genes that also show coexpression in response to genetic and environmental perturbations, suggesting shared mechanisms. Physical proximity and shared chromatin state, in addition to regulation of adjacent genes by similar transcription factors, were independently associated with paired hotspot effects among adjacent genes. Paired effects of trans-eQTLs can occur at neighboring genes even when these genes do not share a common function. This phenomenon could result in unexpected connections between regulatory genetic variation and phenotypes.
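The claim that hotspot-affected genes sit next to each other "more than expected by chance" is the kind of statement a permutation test can check. The sketch below is illustrative only, not the authors' exact procedure: it assumes a single chromosome with genes in positional order, and it builds the null by placing the same number of "affected" labels at random positions. Gene names are hypothetical; a genome-wide version would count pairs within each chromosome separately.

```python
import random
from typing import Sequence, Set, Tuple


def adjacent_affected_pairs(ordered_genes: Sequence[str], affected: Set[str]) -> int:
    """Count neighbouring gene pairs in which both genes are affected."""
    return sum(
        1
        for left, right in zip(ordered_genes, ordered_genes[1:])
        if left in affected and right in affected
    )


def adjacency_permutation_test(
    ordered_genes: Sequence[str], affected: Set[str], n_perm: int = 10_000, seed: int = 0
) -> Tuple[int, float]:
    """Return (observed pair count, one-sided empirical p-value).

    The null keeps the number of affected genes fixed and places them at random
    positions along the same ordered gene list.
    """
    rng = random.Random(seed)
    observed = adjacent_affected_pairs(ordered_genes, affected)
    k = len(affected & set(ordered_genes))
    at_least = 0
    for _ in range(n_perm):
        null_set = set(rng.sample(list(ordered_genes), k))
        if adjacent_affected_pairs(ordered_genes, null_set) >= observed:
            at_least += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy chromosome of 20 genes where four consecutive genes respond to one hotspot.
genes = [f"YGENE{i:02d}" for i in range(20)]
hits = {"YGENE03", "YGENE04", "YGENE05", "YGENE06"}
observed, p = adjacency_permutation_test(genes, hits, n_perm=2000)
print(observed, round(p, 4))
```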
Affiliation(s)
- Krisna Van Dyke
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Sheila Lutz
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Gemechu Mekonnen
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Frank W Albert
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
104
Miculan M, Nelissen H, Ben Hassen M, Marroni F, Inzé D, Pè ME, Dell’Acqua M. A forward genetics approach integrating genome-wide association study and expression quantitative trait locus mapping to dissect leaf development in maize (Zea mays). Plant J 2021; 107:1056-1071. [PMID: 34087008 PMCID: PMC8519057 DOI: 10.1111/tpj.15364] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/06/2020] [Accepted: 05/31/2021] [Indexed: 05/13/2023]
Abstract
Characterizing the genetic basis of maize (Zea mays) leaf development may support breeding efforts to obtain plants with higher vigor and productivity. In this study, a mapping panel of 197 biparental and multiparental maize recombinant inbred lines (RILs) was analyzed for multiple leaf traits at the seedling stage. RNA sequencing was used to estimate the transcription levels of 29,573 gene models in the RILs and to derive 373,769 single nucleotide polymorphisms (SNPs), and a forward genetics approach combining these data was used to pinpoint candidate genes involved in leaf development. First, leaf traits were correlated with gene expression levels to identify transcript-trait correlations. Then, leaf traits were associated with SNPs in a genome-wide association (GWA) study. Next, expression quantitative trait locus mapping was used to associate SNPs with gene expression levels, prioritizing candidate genes identified from the transcript-trait correlations and the GWA study. Finally, a network analysis was conducted to cluster all transcripts into 38 co-expression modules. By integrating these forward genetics approaches, we identified 25 candidate genes highly enriched for specific functional categories, providing evidence for the role of vacuolar proton pumps, cell wall effectors, and vesicular traffic controllers in leaf growth. These results address the complexity of leaf trait determination and may support precision breeding in maize.
Affiliation(s)
- Mara Miculan
- Institute of Life Sciences, Scuola Superiore Sant’Anna, Pisa 56127, Italy
- Hilde Nelissen
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- Center for Plant Systems Biology, VIB, Ghent 9052, Belgium
- Manel Ben Hassen
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- Center for Plant Systems Biology, VIB, Ghent 9052, Belgium
- Fabio Marroni
- IGA Technology Services, Udine 33100, Italy
- Department of Agricultural, Food, Environmental and Animal Sciences (DI4A), University of Udine, Udine 33100, Italy
- Dirk Inzé
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- Center for Plant Systems Biology, VIB, Ghent 9052, Belgium
- Mario Enrico Pè
- Institute of Life Sciences, Scuola Superiore Sant’Anna, Pisa 56127, Italy
105
Brandt M, Kim-Hellmuth S, Ziosi M, Gokden A, Wolman A, Lam N, Recinos Y, Daniloski Z, Morris JA, Hornung V, Schumacher J, Lappalainen T. An autoimmune disease risk variant: A trans master regulatory effect mediated by IRF1 under immune stimulation? PLoS Genet 2021; 17:e1009684. [PMID: 34314424 PMCID: PMC8345867 DOI: 10.1371/journal.pgen.1009684] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/01/2020] [Revised: 08/06/2021] [Accepted: 06/25/2021] [Indexed: 12/12/2022]
Abstract
Functional mechanisms remain unknown for most genetic loci associated with complex human traits and diseases. In this study, we first mapped trans-eQTLs in a data set of primary monocytes stimulated with LPS, and discovered that a risk variant for autoimmune disease, rs17622517 in an intron of C5ORF56, affects the expression of the transcription factor IRF1 20 kb away. This IRF1-specific cis-regulatory effect is active under early immune stimulus, while a large number of trans-eQTL effects across the genome appear in the late LPS response. Using CRISPRi silencing, we showed that perturbation of the SNP locus downregulates IRF1 and causes widespread transcriptional effects. CRISPR genome editing suggestively recapitulated the LPS-specific trans-eQTL signal and supported the rs17622517 site being functional. Our results suggest that this common genetic variant affects inter-individual response to immune stimuli via regulation of IRF1. For this autoimmune GWAS locus, our work provides evidence of the functional variant, demonstrates a condition-specific enhancer effect, identifies IRF1 as the likely causal gene in cis, and indicates that overactivation of the downstream immune-related pathway may be the cellular mechanism increasing disease risk. This work not only provides rare experimental validation of a master-regulatory trans-eQTL, but also demonstrates the power of eQTL mapping to build mechanistic hypotheses amenable to experimental follow-up using the CRISPR toolkit.
Although many genetic loci have been associated with disease, understanding how these variants affect molecular and cellular functions to influence disease risk has been challenging. Here, we first used blood cells from a large number of individuals and stimulated them in the laboratory with a proxy for bacterial infection. We found that a genetic variant associated with autoimmune diseases also affects the expression of the nearby transcription factor gene IRF1 in the early immune response, followed by expression changes in other genes in the late immune response. We then studied this effect in cell lines, using CRISPR to silence the activity of the genomic element containing this variant and to introduce mutations at that position. We found evidence that this autoimmune disease-associated variant lies in a genomic regulatory element that responds to immune stimulus and affects the expression of IRF1 and a complex gene regulatory network. Thus, our characterization of genetic regulatory variation in the human population, combined with experimental follow-up, suggests a plausible, previously uncharacterized molecular mechanism that may underlie this variant’s effect on immune disease risk.
Affiliation(s)
- Margot Brandt
- New York Genome Center, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, New York, United States of America
- Sarah Kim-Hellmuth
- New York Genome Center, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Dr. von Hauner Children’s Hospital, Department of Pediatrics, University Hospital LMU Munich, Munich, Germany
- Marcello Ziosi
- New York Genome Center, New York, New York, United States of America
- Alper Gokden
- New York Genome Center, New York, New York, United States of America
- Aaron Wolman
- New York Genome Center, New York, New York, United States of America
- Nora Lam
- New York Genome Center, New York, New York, United States of America
- Program of Pathobiology and Mechanisms of Disease, Columbia University, New York, New York, United States of America
- Yocelyn Recinos
- New York Genome Center, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, New York, United States of America
- Zharko Daniloski
- New York Genome Center, New York, New York, United States of America
- New York University, Department of Biology, New York, New York, United States of America
- John A. Morris
- New York Genome Center, New York, New York, United States of America
- New York University, Department of Biology, New York, New York, United States of America
- Veit Hornung
- Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, Munich, Germany
- Tuuli Lappalainen
- New York Genome Center, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
106
Ago Y, Asano S, Hashimoto H, Waschek JA. Probing the VIPR2 Microduplication Linkage to Schizophrenia in Animal and Cellular Models. Front Neurosci 2021; 15:717490. [PMID: 34366784 PMCID: PMC8339898 DOI: 10.3389/fnins.2021.717490] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/31/2021] [Accepted: 07/05/2021] [Indexed: 01/30/2023]
Abstract
Pituitary adenylate cyclase-activating polypeptide (PACAP, gene name ADCYAP1) is a multifunctional neuropeptide involved in brain development and synaptic plasticity. Most attention to PACAP function has focused on signaling through its specific receptor PAC1 (ADCYAP1R1). However, PACAP also binds tightly to the high-affinity receptors for vasoactive intestinal peptide (VIP, gene name VIP), called VPAC1 and VPAC2 (VIPR1 and VIPR2, respectively). Depending on innervation patterns, PACAP can thus interact physiologically with any of these receptors. VPAC2 receptors, the focus of this review, are known to play a pivotal role in regulating circadian rhythms and to affect multiple other processes in the brain, including those involved in fear cognition. Accumulating evidence from human genetics indicates that microduplications at 7q36.3 containing the VIPR2 gene are linked to schizophrenia and possibly to autism spectrum disorder. Although the detailed molecular mechanisms have not been fully elucidated, recent studies in animal models suggest that overactivation of the VPAC2 receptor disrupts cortical circuit maturation. The VIPR2 linkage could thus be explained by inappropriate control of receptor signaling at a time when neural circuits involved in cognition and social behavior are being established. Alternatively, or in addition, VPAC2 receptor overactivity may disrupt ongoing synaptic plasticity during learning and memory. Finally, in vitro data indicate that PACAP and VIP act differently on neuronal maturation via their distinct signaling pathways. Thus, perturbations in the balance of VPAC2, VPAC1, and PAC1 receptors and their ligands may have important consequences for brain development and plasticity.
Affiliation(s)
- Yukio Ago
- Department of Cellular and Molecular Pharmacology, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Satoshi Asano
- Department of Cellular and Molecular Pharmacology, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Hitoshi Hashimoto
- Laboratory of Molecular Neuropharmacology, Graduate School of Pharmaceutical Sciences, Osaka University, Suita, Japan
- Molecular Research Center for Children's Mental Development, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University and University of Fukui, Suita, Japan
- Division of Bioscience, Institute for Datability Science, Osaka University, Suita, Japan
- Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
- James A Waschek
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
107
Wen J, Xie M, Rowland B, Rosen JD, Sun Q, Chen J, Tapia AL, Qian H, Kowalski MH, Shan Y, Young KL, Graff M, Argos M, Avery CL, Bien SA, Buyske S, Yin J, Choquet H, Fornage M, Hodonsky CJ, Jorgenson E, Kooperberg C, Loos RJF, Liu Y, Moon JY, North KE, Rich SS, Rotter JI, Smith JA, Zhao W, Shang L, Wang T, Zhou X, Reiner AP, Raffield LM, Li Y. Transcriptome-Wide Association Study of Blood Cell Traits in African Ancestry and Hispanic/Latino Populations. Genes (Basel) 2021; 12:1049. [PMID: 34356065 PMCID: PMC8307403 DOI: 10.3390/genes12071049] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/21/2021] [Revised: 06/29/2021] [Accepted: 07/02/2021] [Indexed: 01/21/2023]
Abstract
BACKGROUND: Thousands of genetic variants have been associated with hematological traits, though the target genes remain unknown at most loci. Moreover, few analyses have been conducted in African ancestry and Hispanic/Latino populations, so hematological trait-associated variants that are more common in these populations have likely been missed.
METHODS: To derive gene expression prediction models, we used ancestry-stratified data sets from the Multi-Ethnic Study of Atherosclerosis (MESA; n = 229 African American and n = 381 Hispanic/Latino participants, monocytes) and the Depression Genes and Networks study (DGN; n = 922 European ancestry participants, whole blood). We then performed a transcriptome-wide association study (TWAS) for platelet count, hemoglobin, hematocrit, and white blood cell count in African (n = 27,955) and Hispanic/Latino (n = 28,324) ancestry participants.
RESULTS: We identified 24 suggestive signals (p < 1 × 10⁻⁴) that were conditionally distinct from known GWAS-identified variants, and successfully replicated these signals in European ancestry subjects from UK Biobank. Using the larger DGN reference panel, we found modestly improved correlation between predicted and measured gene expression in an independent African American cohort (the Genetic Epidemiology Network of Arteriopathy (GENOA) study, n = 802, lymphoblastoid cell lines); however, some genes were well predicted using MESA but not DGN.
CONCLUSIONS: These analyses demonstrate the importance of performing TWAS and other genetic analyses across diverse populations, and of balancing sample size against ancestry matching when selecting a TWAS reference panel.
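The core TWAS idea, predicting each person's gene expression from genotypes with weights trained in a reference panel and then testing that prediction against the trait, fits in a few lines. This is an illustrative sketch, not the study's pipeline: the weights and genotypes below are made up, genotypes are assumed to be alternative-allele dosages, and the gene-trait test is reduced to a Pearson correlation rather than a regression with covariates and a p-value.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def predict_expression(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Genetically predicted expression: sum of weight * dosage over the model SNPs.

    Raises KeyError if an individual lacks a dosage for a model SNP, rather than
    silently imputing it.
    """
    return sum(w * dosages[snp] for snp, w in weights.items())


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def twas_gene_test(
    weights: Dict[str, float], genotypes: List[Dict[str, float]], trait: Sequence[float]
) -> Tuple[List[float], float]:
    """Predict expression for every individual and correlate it with the trait."""
    predicted = [predict_expression(weights, g) for g in genotypes]
    return predicted, pearson(predicted, trait)


# Made-up weights for one gene and four individuals (dosage of the alternative allele).
weights = {"snp1": 0.5, "snp2": -0.2}
genotypes = [
    {"snp1": 0, "snp2": 2},
    {"snp1": 1, "snp2": 1},
    {"snp1": 2, "snp2": 0},
    {"snp1": 2, "snp2": 2},
]
trait = [1.0, 2.0, 3.5, 2.5]  # e.g. a standardized blood-cell measurement
predicted, r = twas_gene_test(weights, genotypes, trait)
print([round(x, 2) for x in predicted], round(r, 3))
```

The choice of reference panel only changes `weights`; that is why the abstract's point about balancing panel size against ancestry matching matters, since poorly transferred weights degrade `predicted` for every downstream test.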
Affiliation(s)
- Jia Wen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Munan Xie
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Bryce Rowland
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jonathan D. Rosen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Amanda L. Tapia
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Huijun Qian
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Madeline H. Kowalski
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yue Shan
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Kristin L. Young
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
- Marielisa Graff
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
- Maria Argos
- Division of Epidemiology and Biostatistics, University of Illinois at Chicago, Chicago, IL 60612, USA
- Christy L. Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
- Stephanie A. Bien
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Steve Buyske
- Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA
- Jie Yin
- Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
- Hélène Choquet
- Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
- Myriam Fornage
- Institute of Molecular Medicine, McGovern Medical School, The University of Texas Health Science Center, Houston, TX 77030, USA
- Chani J. Hodonsky
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Ruth J. F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Yongmei Liu
- Molecular Physiology Institute, Duke University, Durham, NC 27701, USA
- Jee-Young Moon
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Kari E. North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
- Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Jennifer A. Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Lulu Shang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Tao Wang
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Xiang Zhou
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Alexander P. Reiner
- Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
- Laura M. Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Yun Li
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
108
Chen J, Zhang X, Yi F, Gao X, Song W, Zhao H, Lai J. MP3RNA-seq: Massively parallel 3' end RNA sequencing for high-throughput gene expression profiling and genotyping. J Integr Plant Biol 2021; 63:1227-1239. [PMID: 33559966 DOI: 10.1111/jipb.13077] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/12/2020] [Accepted: 02/02/2021] [Indexed: 05/26/2023]
Abstract
Transcriptome deep sequencing (RNA-seq) has become a routine method for global gene expression profiling. However, its application to large-scale experiments remains limited by cost and labor constraints. Here we describe a massively parallel 3' end RNA-seq (MP3RNA-seq) method that introduces unique sample barcodes during reverse transcription to permit sample pooling immediately following this initial step. MP3RNA-seq allows for handling of hundreds of samples in a single experiment, at a cost of about $6 per sample for library construction and sequencing. MP3RNA-seq is effective for not only high-throughput gene expression profiling, but also genotyping. To demonstrate its utility, we applied MP3RNA-seq to 477 double haploid lines of maize. We identified 19,429 genes expressed in at least 50% of the lines and 35,836 high-quality single nucleotide polymorphisms for genotyping analysis. Armed with these data, we performed expression and agronomic trait quantitative trait locus (QTL) mapping and identified 25,797 expression QTLs for 15,335 genes and 21 QTLs for plant height, ear height, and relative ear height. We conclude that MP3RNA-seq is highly reproducible, accurate, and sensitive for high-throughput gene expression profiling and genotyping, and should be generally applicable to most eukaryotic species.
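Pooling right after reverse transcription only works because each read carries its sample barcode, so reads can be split back out computationally. The sketch below shows a generic demultiplexing step under a hypothetical read layout, not the published MP3RNA-seq protocol: it assumes the barcode is the first fixed-length stretch of each read, all barcodes share one length, and a read is assigned only when exactly one barcode lies within `max_mismatches` (Hamming distance). The sample sheet and reads are toy values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(
    reads: Iterable[str], barcode_to_sample: Dict[str, str], max_mismatches: int = 1
) -> Dict[str, List[str]]:
    """Assign each read to a sample by its leading barcode and trim the barcode.

    Reads that are too short, match no barcode, or match more than one barcode
    within `max_mismatches` go to the "undetermined" bin.
    """
    lengths = {len(bc) for bc in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch expects barcodes of a single length")
    bc_len = lengths.pop()
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        prefix = read[:bc_len]
        if len(prefix) < bc_len:
            bins["undetermined"].append(read)
            continue
        matches = [bc for bc in barcode_to_sample if hamming(prefix, bc) <= max_mismatches]
        if len(matches) == 1:
            bins[barcode_to_sample[matches[0]]].append(read[bc_len:])
        else:
            bins["undetermined"].append(read)
    return dict(bins)


sheet = {"ACGT": "line_001", "TTAA": "line_002"}  # hypothetical sample sheet
reads = ["ACGTGGG", "ACGAGGG", "GGGGCCC", "TTAACC", "AC"]
print(demultiplex(reads, sheet))
```

Tolerating one mismatch is only safe when barcodes differ from each other at enough positions; the ambiguity check sends reads to "undetermined" rather than guessing.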
Affiliation(s)
- Jian Chen
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, 100193, China
- Xiangbo Zhang
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, 100193, China
- Fei Yi
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, 100193, China
- Xiang Gao
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, 100193, China
- Weibin Song
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, 100193, China
- Haiming Zhao
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, 100193, China
- Jinsheng Lai
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, 100193, China
- Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing, 100193, China
109
Jehl F, Degalez F, Bernard M, Lecerf F, Lagoutte L, Désert C, Coulée M, Bouchez O, Leroux S, Abasht B, Tixier-Boichard M, Bed'hom B, Burlot T, Gourichon D, Bardou P, Acloque H, Foissac S, Djebali S, Giuffra E, Zerjal T, Pitel F, Klopp C, Lagarrigue S. RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species. Front Genet 2021; 12:655707. [PMID: 34262593 PMCID: PMC8273700 DOI: 10.3389/fgene.2021.655707] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 01/19/2021] [Accepted: 06/01/2021] [Indexed: 12/19/2022]
Abstract
In addition to their common use for studying gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species, since whole genome sequencing is expensive and exome sequencing tools are unavailable. These SNPs detected in expressed regions can be used to characterize variants affecting protein functions, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling by RNA-seq a challenging endeavor. We compared SNP calling results using GATK-suggested filters on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. In expressed regions, we showed an RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq), and we characterized the remaining 9% of SNPs. We then studied the genotypes (GT) obtained by RNA-seq and the impact of two factors (GT call rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for them leading to a 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq samples of 382 birds from 11 chicken populations, we found 9.5 M SNPs in total, of which ∼550,000 SNPs per tissue and population had a reliable GT (call rate ≥ 50%), and among them ∼340,000 had a MAF ≥ 10%.
We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT-deleterious missense variants and 590 stop-gained variants), (ii) study, on a large scale, cis-regulation of gene expression, with ∼81% of protein-coding and 68% of long non-coding genes (TPM ≥ 1) amenable to ASE analysis, of which ∼29% were cis-regulated, and (iii) analyze population genetics using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and associated GT within various populations, and to use them for different analyses, as in GTEx studies.
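As a rough illustration of the two quantities evaluated above (precision against DNA-seq, and genotype concordance after call-rate and read-depth filtering), the following Python sketch computes them on toy data. It is not the authors' pipeline and does not read GATK output; the per-genotype read cutoff `min_depth` is a hypothetical parameter, and only the 50% call-rate threshold comes from the abstract.

```python
# Hypothetical sketch: RNA-seq vs DNA-seq SNP precision and genotype concordance.
MISSING = "./."


def precision(rna_snps: set[str], dna_snps: set[str]) -> float:
    """Fraction of SNPs called from RNA-seq that are also called from DNA-seq."""
    if not rna_snps:
        return 0.0
    return len(rna_snps & dna_snps) / len(rna_snps)


def mask_low_depth(calls: dict[str, tuple[str, int]], min_depth: int) -> dict[str, str]:
    """Treat genotypes supported by fewer than min_depth reads as missing."""
    return {s: (gt if depth >= min_depth else MISSING) for s, (gt, depth) in calls.items()}


def call_rate(genotypes: dict[str, str], n_samples: int) -> float:
    """Fraction of all samples that carry a non-missing genotype."""
    return sum(gt != MISSING for gt in genotypes.values()) / n_samples


def concordance(rna_gts: dict[str, str], dna_gts: dict[str, str]) -> float:
    """Share of non-missing RNA-seq genotypes identical to the DNA-seq genotype."""
    shared = [s for s, gt in rna_gts.items() if gt != MISSING and s in dna_gts]
    if not shared:
        return float("nan")
    return sum(rna_gts[s] == dna_gts[s] for s in shared) / len(shared)


def keep_snp(genotypes: dict[str, str], n_samples: int, min_call_rate: float = 0.5) -> bool:
    """Retain a SNP only if its call rate reaches the threshold (50% in the abstract)."""
    return call_rate(genotypes, n_samples) >= min_call_rate
```

Applying the depth mask before the call-rate test means that a stricter depth cutoff can only lower the call rate, which is one reason the two thresholds have to be chosen together.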
Affiliation(s)
- Frédéric Jehl
- INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Fabien Degalez
- INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Maria Bernard
- INRAE, SIGENAE, Genotoul Bioinfo MIAT, Castanet-Tolosan, France
- INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Colette Désert
- INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Manon Coulée
- INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Olivier Bouchez
- INRAE, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
- Sophie Leroux
- INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
- Behnam Abasht
- Department of Animal and Food Sciences, University of Delaware, Newark, DE, United States
- Bertrand Bed'hom
- INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Philippe Bardou
- INRAE, SIGENAE, Genotoul Bioinfo MIAT, Castanet-Tolosan, France
- Hervé Acloque
- INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Sylvain Foissac
- INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
- Sarah Djebali
- INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
- Elisabetta Giuffra
- INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Tatiana Zerjal
- INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Frédérique Pitel
- INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
110
Wang J, Clay-Gilmour AI, Karaesmen E, Rizvi A, Zhu Q, Yan L, Preus L, Liu S, Wang Y, Griffiths E, Stram DO, Pooler L, Sheng X, Haiman C, Van Den Berg D, Webb A, Brock G, Spellman S, Pasquini M, McCarthy P, Allan J, Stölzel F, Onel K, Hahn T, Sucheston-Campbell LE. Genome-Wide Association Analyses Identify Variants in IRF4 Associated With Acute Myeloid Leukemia and Myelodysplastic Syndrome Susceptibility. Front Genet 2021; 12:554948. [PMID: 34220922 PMCID: PMC8248805 DOI: 10.3389/fgene.2021.554948] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/23/2020] [Accepted: 04/19/2021] [Indexed: 12/22/2022]
Abstract
The role of common genetic variation in susceptibility to acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS), a group of rare clonal hematologic disorders characterized by dysplastic hematopoiesis and high mortality, remains unclear. We performed AML and MDS genome-wide association studies (GWAS) in the DISCOVeRY-BMT cohorts (2,309 cases and 2,814 controls). Association analysis based on subsets (ASSET) was used to conduct a summary-statistics SNP-based analysis of MDS and AML subtypes. For each AML and MDS case and control, we used PrediXcan to estimate the component of gene expression determined by their genetic profile, and correlated this imputed gene expression level with risk of developing disease in a transcriptome-wide association study (TWAS). ASSET identified an increased risk for de novo AML and MDS (OR = 1.38, 95% CI, 1.26-1.51, Pmeta = 2.8 × 10⁻¹²) in patients carrying the T allele at rs12203592 in Interferon Regulatory Factor 4 (IRF4), a transcription factor that regulates myeloid and lymphoid hematopoietic differentiation. Our TWAS analyses showed that increased IRF4 gene expression is associated with increased risk of de novo AML and MDS (OR = 3.90, 95% CI, 2.36-6.44, Pmeta = 1.0 × 10⁻⁷). The identification of IRF4 by both GWAS and TWAS provides valuable insight into the role of genetic variation in AML and MDS susceptibility.
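The TWAS step described above can be pictured with the Python sketch below: predicted expression is a weighted sum of allele dosages, and disease status is then regressed on it. This is a simplified stand-in, not PrediXcan itself: the SNP weights are invented (PrediXcan uses prediction models trained on reference transcriptome panels), the genotypes and case labels are simulated, and a plain logistic regression fitted by Newton-Raphson replaces the study's actual association and meta-analysis models.

```python
import numpy as np


def impute_expression(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Genetically predicted expression: (n_samples x n_snps) dosages times per-SNP weights."""
    return dosages @ weights


def logistic_fit(x: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Fit logit P(case) = b0 + b1 * x by Newton-Raphson; returns [b0, b1]."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - p)                        # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information matrix
        beta = beta + np.linalg.solve(info, score)
    return beta


# Simulated toy data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
dosages = rng.integers(0, 3, size=(200, 5)).astype(float)  # 0/1/2 copies of the alt allele
weights = np.array([0.3, -0.2, 0.5, 0.0, 0.1])              # hypothetical prediction weights
pred = impute_expression(dosages, weights)
z = (pred - pred.mean()) / pred.std()                        # predicted expression in SD units
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(-0.2 + 0.8 * z)))).astype(float)
b0, b1 = logistic_fit(z, y)
odds_ratio_per_sd = np.exp(b1)
```

Here `odds_ratio_per_sd` is an odds ratio per standard deviation of predicted expression on this simulated data only; it has no relation to the OR reported for IRF4.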
Affiliation(s)
- Junke Wang
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Alyssa I. Clay-Gilmour
- Department of Epidemiology, Mayo Clinic, Rochester, MN, United States
- Department of Epidemiology & Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC, United States
- Ezgi Karaesmen
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Abbas Rizvi
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Qianqian Zhu
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Li Yan
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Leah Preus
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Song Liu
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Yiwen Wang
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Elizabeth Griffiths
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Daniel O. Stram
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, United States
- Loreall Pooler
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, United States
- Xin Sheng
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, United States
- Christopher Haiman
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, United States
- David Van Den Berg
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, United States
- Amy Webb
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
- Guy Brock
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
- Stephen Spellman
- Center for International Blood and Marrow Transplant Research, Minneapolis, MN, United States
- Marcelo Pasquini
- Center for International Blood and Marrow Transplant Research, Medical College of Wisconsin, Milwaukee, WI, United States
- Philip McCarthy
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- James Allan
- Northern Institute for Cancer Research, Newcastle University, Newcastle upon Tyne, United Kingdom
- Friedrich Stölzel
- Department of Internal Medicine I, University Hospital Carl Gustav Carus Dresden, Technical University Dresden, Dresden, Germany
- Kenan Onel
- Department of Pediatrics, Mount Sinai Medical Center, Miami Beach, NY, United States
- Theresa Hahn
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Lara E. Sucheston-Campbell
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
- College of Veterinary Medicine, The Ohio State University, Columbus, OH, United States
111
Mao W, Rahimikollu J, Hausler R, Chikina M. DataRemix: a universal data transformation for optimal inference from gene expression datasets. Bioinformatics 2021; 37:984-991. [PMID: 32821903 DOI: 10.1093/bioinformatics/btaa745] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/30/2020] [Revised: 08/01/2020] [Accepted: 08/17/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION RNA-seq technology provides unprecedented power in the assessment of transcript abundance and can be used to perform a variety of downstream tasks such as inference of gene-correlation networks and eQTL discovery. However, raw gene expression values have to be normalized for nuisance biological variation and technical covariates, and different normalization strategies can lead to dramatically different results in the downstream study. RESULTS We describe a generalization of singular value decomposition-based reconstruction for which the common techniques of whitening, rank-k approximation and removing the top k principal components are special cases. Our simple three-parameter transformation, DataRemix, can be tuned to reweigh the contribution of hidden factors and reveal otherwise hidden biological signals. In particular, we demonstrate that the method can effectively prioritize biological signals over noise without leveraging external dataset-specific knowledge, and can outperform normalization methods that make explicit use of known technical factors. We also show that DataRemix can be efficiently optimized via a Thompson sampling approach, which makes it feasible for computationally expensive objectives such as eQTL analysis. Finally, we apply our method to the Religious Orders Study and Memory and Aging Project dataset, and we report what is, to our knowledge, the first replicable trans-eQTL effect in human brain. AVAILABILITY AND IMPLEMENTATION DataRemix is an R package, freely available on GitHub (https://github.com/wgmao/DataRemix). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
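To make the phrase "special cases" concrete, the Python sketch below rebuilds a matrix from its SVD after mapping each singular value through a chosen function, and expresses the three techniques named in the abstract as particular choices of that function. It assumes the expression matrix `X` has already been centered, so that its singular vectors correspond to principal components. It deliberately does not reproduce DataRemix's own three-parameter form, which is defined in the paper; the helper names (`reconstruct`, `rank_k`, `remove_top_k`, `whiten`) are hypothetical.

```python
from typing import Callable

import numpy as np

# A reweighting maps (singular values, component indices) to new singular values.
Reweight = Callable[[np.ndarray, np.ndarray], np.ndarray]


def reconstruct(X: np.ndarray, f: Reweight) -> np.ndarray:
    """Rebuild X = U diag(d) V^T with singular values replaced by f(d, index)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    idx = np.arange(d.size)
    return (U * f(d, idx)) @ Vt  # scales column i of U by the new i-th singular value


def rank_k(k: int) -> Reweight:
    """Keep only the top-k components (rank-k approximation)."""
    return lambda d, i: np.where(i < k, d, 0.0)


def remove_top_k(k: int) -> Reweight:
    """Zero out the top-k components (remove the first k principal components)."""
    return lambda d, i: np.where(i < k, 0.0, d)


def whiten() -> Reweight:
    """Set every singular value to 1 (whitening)."""
    return lambda d, i: np.ones_like(d)
```

For any matrix, `reconstruct(X, rank_k(k)) + reconstruct(X, remove_top_k(k))` returns `X`, since the two reweightings split the singular components between them.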
Affiliation(s)
- Weiguang Mao
- Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Javad Rahimikollu
- Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Ryan Hausler
- Department of Medicine, Division of Hematology/Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Maria Chikina
- Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15260, USA
112
Banerjee S, Simonetti FL, Detrois KE, Kaphle A, Mitra R, Nagial R, Söding J. Tejaas: reverse regression increases power for detecting trans-eQTLs. Genome Biol 2021; 22:142. [PMID: 33957961 PMCID: PMC8101255 DOI: 10.1186/s13059-021-02361-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/09/2020] [Accepted: 04/22/2021] [Indexed: 12/18/2022]
Abstract
Trans-acting expression quantitative trait loci (trans-eQTLs) account for ≥70% of expression heritability and could therefore facilitate uncovering mechanisms underlying the origination of complex diseases. Identifying trans-eQTLs is challenging because of small effect sizes, tissue specificity, and a severe multiple-testing burden. Tejaas predicts trans-eQTLs by performing L2-regularized “reverse” multiple regression of each SNP on all genes, aggregating evidence from many small trans-effects while being unaffected by the strong expression correlations. Combined with a novel unsupervised k-nearest neighbor method to remove confounders, Tejaas predicts 18,851 unique trans-eQTLs across 49 tissues from GTEx. They are enriched in open chromatin, enhancers, and other regulatory regions. Many overlap with disease-associated SNPs, pointing to tissue-specific transcriptional regulation mechanisms.
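The core of the "reverse" regression idea, regressing one SNP's genotypes on the expression of all genes with an L2 penalty, can be sketched in Python as below. This shows only the ridge fit; Tejaas's actual test statistic, its null calibration, and the k-nearest-neighbor confounder correction are not reproduced, and the penalty `lam` is a hypothetical user choice.

```python
import numpy as np


def reverse_ridge(genotype: np.ndarray, expression: np.ndarray, lam: float) -> np.ndarray:
    """L2-regularized regression of one SNP's genotypes (n,) on all genes (n x g).

    Uses the dual form beta = G^T (G G^T + lam I_n)^{-1} x, which needs only an
    n x n solve and is therefore cheap when genes far outnumber samples.
    """
    G = expression - expression.mean(axis=0)  # center each gene
    x = genotype - genotype.mean()            # center the genotype vector
    n = G.shape[0]
    alpha = np.linalg.solve(G @ G.T + lam * np.eye(n), x)
    return G.T @ alpha                        # one coefficient per gene
```

The dual form is used because in trans-eQTL settings the number of genes usually exceeds the number of samples; it gives the same coefficients as the primal ridge solution (GᵀG + λI)⁻¹Gᵀx.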
Affiliation(s)
- Saikat Banerjee
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
- Franco L Simonetti
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
- Kira E Detrois
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
- Georg-August University, Göttingen, 37075, Germany
- Anubhav Kaphle
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
- Georg-August University, Göttingen, 37075, Germany
- Johannes Söding
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, Göttingen, 37077, Germany
- Campus-Institut Data Science (CIDAS), University of Göttingen, Göttingen, 37073, Germany
- Cluster of Excellence "Multiscale Bioimaging" (MBExC), University of Göttingen, Göttingen, 37075, Germany
113
Syreeni A, Sandholm N, Sidore C, Cucca F, Haukka J, Harjutsalo V, Groop PH. Genome-wide search for genes affecting the age at diagnosis of type 1 diabetes. J Intern Med 2021; 289:662-674. [PMID: 33179336 PMCID: PMC8247053 DOI: 10.1111/joim.13187] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 06/18/2020] [Revised: 09/07/2020] [Accepted: 09/09/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND Type 1 diabetes (T1D) is an autoimmune disease affecting individuals in the early years of life. Although previous studies have identified genetic loci influencing T1D diagnosis age, these studies did not investigate the genome with high resolution. OBJECTIVE AND METHODS We performed a genome-wide meta-analysis for age at diagnosis with cohorts from Finland (Finnish Diabetic Nephropathy Study), the United Kingdom (UK Genetic Resource Investigating Diabetes) and Sardinia. Using the SNP associations, a transcriptome-wide association analysis linked T1D diagnosis age and gene expression. RESULTS We identified two chromosomal regions associated with T1D diagnosis age: multiple independent variants in the HLA region on chromosome 6 and a locus on chromosome 17q12. We performed gene-level association tests with transcriptome prediction models from two whole blood datasets, a lymphocyte cell line, and spleen, pancreas and small intestine tissues. Of the non-HLA genes, lower PNMT expression in whole blood, and higher IKZF3 and ZPBP2 and lower ORMDL3 and GSDMB transcription levels in multiple tissues, were associated with lower T1D diagnosis age (FDR = 0.05). These genes lie on chr17q12, which is associated with T1D, other autoimmune diseases, and childhood asthma. Additionally, higher expression of PHF20L1, a gene not previously implicated in T1D, was associated with lower diagnosis age in lymphocytes, pancreas, and spleen. Altogether, the non-HLA associations were enriched in open chromatin in various blood cells, blood vessel tissues and foetal thymus tissue. CONCLUSION Multiple genes on chr17q12 and PHF20L1 on chr8 were associated with T1D diagnosis age, and further studies are needed to elucidate the role of these genes in immunity and T1D onset.
Affiliation(s)
- A Syreeni
- Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland
- Abdominal Center, Nephrology, University of Helsinki and Helsinki University Central Hospital, Helsinki, Finland
- N Sandholm
- Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland
- Abdominal Center, Nephrology, University of Helsinki and Helsinki University Central Hospital, Helsinki, Finland
- C Sidore
- Instituto di Ricerca Genetica e Biomedica, CNR, Monserrato, Italy
- F Cucca
- Instituto di Ricerca Genetica e Biomedica, CNR, Monserrato, Italy
- Dipartimento di Scienze Biomediche, Università degli Studi di Sassari, Sassari, Italy
- J Haukka
- Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland
- Abdominal Center, Nephrology, University of Helsinki and Helsinki University Central Hospital, Helsinki, Finland
- V Harjutsalo
- Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland
- Abdominal Center, Nephrology, University of Helsinki and Helsinki University Central Hospital, Helsinki, Finland
- National Institute for Health and Welfare, Helsinki, Finland
- P-H Groop
- Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland
- Abdominal Center, Nephrology, University of Helsinki and Helsinki University Central Hospital, Helsinki, Finland
- Department of Diabetes, Central Clinical School, Monash University, Melbourne, Victoria, Australia
114
Mu Z, Wei W, Fair B, Miao J, Zhu P, Li YI. The impact of cell type and context-dependent regulatory variants on human immune traits. Genome Biol 2021; 22:122. [PMID: 33926512 PMCID: PMC8082814 DOI: 10.1186/s13059-021-02334-x] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 09/05/2020] [Accepted: 03/30/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND The vast majority of trait-associated variants identified using genome-wide association studies (GWAS) are noncoding, and are therefore assumed to impact gene regulation. However, the majority of trait-associated loci are unexplained by regulatory quantitative trait loci (QTLs). RESULTS We perform a comprehensive characterization of the putative mechanisms by which GWAS loci impact human immune traits. By harmonizing four major immune QTL studies, we identify 26,271 expression QTLs (eQTLs) and 23,121 splicing QTLs (sQTLs) spanning 18 immune cell types. Our colocalization analyses between QTLs and trait-associated loci from 72 GWAS reveal that genetic effects on RNA expression and splicing in immune cells colocalize with 40.4% of GWAS loci for immune-related traits, in many cases increasing the fraction of colocalized loci twofold compared to previous studies. Notably, we find that the largest contributors to this increase are splicing QTLs, which colocalize on average with 14% of all GWAS loci that do not colocalize with eQTLs. By contrast, we find that cell type-specific eQTLs and eQTLs with small effect sizes contribute very few new colocalizations. To investigate the 60% of GWAS loci that remain unexplained, we collect H3K27ac CUT&Tag data from rheumatoid arthritis patients and healthy controls, and find large-scale differences between immune cells from the different disease contexts, including at regions overlapping unexplained GWAS loci. CONCLUSION Altogether, our work supports RNA splicing as an important mediator of genetic effects on immune traits, and suggests that we must expand our study of regulatory processes in disease contexts to improve functional interpretation of as yet unexplained GWAS loci.
Affiliation(s)
- Zepeng Mu
- Committee on Genetics, Genomics & Systems Biology, University of Chicago, Chicago, IL USA
- Wei Wei
- Department of Clinical Immunology, Xijing Hospital, Xi’an, China
- National Translational Science Center for Molecular Medicine, Xi’an, China
- Benjamin Fair
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL USA
- Jinlin Miao
- Department of Clinical Immunology, Xijing Hospital, Xi’an, China
- National Translational Science Center for Molecular Medicine, Xi’an, China
- Ping Zhu
- Department of Clinical Immunology, Xijing Hospital, Xi’an, China
- National Translational Science Center for Molecular Medicine, Xi’an, China
- Yang I. Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL USA
- Department of Human Genetics, Department of Medicine, University of Chicago, Chicago, IL USA
115
Fan Y, Zhu H, Song Y, Peng Q, Zhou X. Efficient and effective control of confounding in eQTL mapping studies through joint differential expression and Mendelian randomization analyses. Bioinformatics 2021; 37:296-302. [PMID: 32790868 PMCID: PMC8058772 DOI: 10.1093/bioinformatics/btaa715] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/26/2020] [Revised: 07/09/2020] [Accepted: 08/06/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Identifying cis-acting genetic variants associated with gene expression levels, an analysis commonly referred to as expression quantitative trait loci (eQTL) mapping, is an important first step toward understanding the genetic determinants of gene expression variation. Successful eQTL mapping requires effective control of confounding factors. A common method for controlling confounding effects in eQTL mapping studies is probabilistic estimation of expression residuals (PEER) analysis. PEER analysis extracts PEER factors to serve as surrogates for confounding factors, which are then included in the subsequent eQTL mapping analysis. However, it is computationally challenging to determine the optimal number of PEER factors used for eQTL mapping. In particular, the standard approach to determining the optimal number of PEER factors examines one number at a time and chooses the number that optimizes eQTL discovery. Unfortunately, this standard approach involves multiple repetitive eQTL mapping procedures that are computationally expensive, restricting its use in the large-scale eQTL mapping studies that are being collected today. RESULTS Here, we present a simple and computationally scalable alternative, Effect size Correlation for COnfounding determination (ECCO), to determine the optimal number of PEER factors used for eQTL mapping studies. Instead of performing repetitive eQTL mapping, ECCO jointly applies differential expression analysis and Mendelian randomization analysis, leading to substantial computational savings. In simulations and real data applications, we show that ECCO identifies a similar number of PEER factors required for eQTL mapping analysis as the standard approach but is two orders of magnitude faster.
The computational scalability of ECCO allows for optimized eQTL discovery across 48 GTEx tissues for the first time, yielding an overall 5.89% power gain in the number of eQTL-harboring genes (eGenes) discovered, compared with the previous GTEx recommendation, which does not attempt to determine a tissue-specific optimal number of PEER factors. AVAILABILITY AND IMPLEMENTATION Our method is implemented in the ECCO software, which, along with its GTEx mapping results, is freely available at www.xzlab.org/software.html. All R scripts used in this study are also available at this site. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yue Fan
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, China
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Huanhuan Zhu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Yanyi Song
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Qinke Peng
- Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
116
Hoffman GE, Roussos P. Dream: powerful differential expression analysis for repeated measures designs. Bioinformatics 2021; 37:192-201. [PMID: 32730587 DOI: 10.1093/bioinformatics/btaa687] [Citation(s) in RCA: 144] [Impact Index Per Article: 36.0] [Received: 01/15/2020] [Revised: 07/13/2020] [Accepted: 07/23/2020] [Indexed: 01/08/2023]
Abstract
SUMMARY Large-scale transcriptome studies with multiple samples per individual are widely used to study disease biology. Yet, current methods for differential expression are inadequate for cross-individual testing for these repeated measures designs. Most problematic, we observe across multiple datasets that current methods can give reproducible false-positive findings that are driven by genetic regulation of gene expression, yet are unrelated to the trait of interest. Here, we introduce a statistical software package, dream, that increases power, controls the false positive rate, enables multiple types of hypothesis tests, and integrates with standard workflows. In 12 analyses in 6 independent datasets, dream yields biological insight not found with existing software while addressing the issue of reproducible false-positive findings. AVAILABILITY AND IMPLEMENTATION Dream is available within the variancePartition Bioconductor package at http://bioconductor.org/packages/variancePartition. CONTACT gabriel.hoffman@mssm.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriel E Hoffman
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Panos Roussos
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Mental Illness Research, Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY 10468, USA
117
Patel N, Bush WS. Modeling transcriptional regulation using gene regulatory networks based on multi-omics data sources. BMC Bioinformatics 2021; 22:200. [PMID: 33874910 PMCID: PMC8056605 DOI: 10.1186/s12859-021-04126-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/20/2020] [Accepted: 04/09/2021] [Indexed: 11/17/2022]
Abstract
Background Transcriptional regulation is complex, requiring multiple cis (local) and trans-acting mechanisms working in concert to drive gene expression, with disruption of these processes linked to multiple diseases. Previous computational attempts to understand the influence of regulatory mechanisms on gene expression have used prediction models containing input features derived from cis-regulatory factors. However, local chromatin looping and trans-acting mechanisms are known to also influence transcriptional regulation, and their inclusion may improve model accuracy and interpretation. In this study, we create a general model of transcription factor influence on gene expression by incorporating both cis and trans gene regulatory features. Results We describe a computational framework to model gene expression for GM12878 and K562 cell lines. This framework weights the impact of transcription factor-based regulatory data using multi-omics gene regulatory networks to account for both cis- and trans-acting mechanisms, and measures of the local chromatin context. These prediction models perform significantly better than models containing cis-regulatory features alone. Models that additionally integrate long-distance chromatin interactions (or chromatin looping) between distal transcription factor binding regions and gene promoters also show improved accuracy. As a demonstration of their utility, effect estimates from these models were used to weight cis-regulatory rare variants for sequence kernel association test analyses of gene expression. Conclusions Our models generate refined effect estimates for the influence of individual transcription factors on gene expression, allowing characterization of their roles across the genome. This work also provides a framework for integrating multiple data types into a single model of transcriptional regulation.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04126-3.
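For the weighted rare-variant analysis mentioned above, the Python sketch below computes the standard weighted linear-kernel SKAT score statistic, Q = rᵀGWGᵀr, for a continuous trait such as the expression of one gene. How Patel and Bush converted their model effect estimates into per-variant weights is not stated in the abstract, so `weights` is simply an input here (for instance, hypothetically, absolute effect estimates). The p-value, which requires a mixture-of-chi-squares approximation such as Davies' method, is omitted.

```python
import numpy as np


def skat_q(y, G, weights, covariates=None) -> float:
    """Weighted linear-kernel SKAT score statistic Q = r^T G W G^T r.

    y: (n,) continuous trait, e.g. expression of one gene
    G: (n, p) rare-variant dosages; weights: (p,) per-variant weights w_j
    covariates: optional (n, c) matrix adjusted for in the null model
    """
    n = y.size
    X = np.ones((n, 1)) if covariates is None else np.column_stack([np.ones(n), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta                         # residuals under the null (no variant effect)
    s = G.T @ r                              # per-variant score S_j
    return float(np.sum(weights * s ** 2))   # Q = sum_j w_j * S_j^2
```

Writing Q as a weighted sum of squared per-variant scores avoids forming the n × n kernel matrix GWGᵀ explicitly.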
Affiliation(s)
- Neel Patel
- Department of Nutrition, Case Western Reserve University, Cleveland, OH, USA
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- William S Bush
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
118
Watt S, Vasquez L, Walter K, Mann AL, Kundu K, Chen L, Sims Y, Ecker S, Burden F, Farrow S, Farr B, Iotchkova V, Elding H, Mead D, Tardaguila M, Ponstingl H, Richardson D, Datta A, Flicek P, Clarke L, Downes K, Pastinen T, Fraser P, Frontini M, Javierre BM, Spivakov M, Soranzo N. Genetic perturbation of PU.1 binding and chromatin looping at neutrophil enhancers associates with autoimmune disease. Nat Commun 2021; 12:2298. [PMID: 33863903 PMCID: PMC8052402 DOI: 10.1038/s41467-021-22548-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/13/2019] [Accepted: 03/17/2021] [Indexed: 02/06/2023]
Abstract
Neutrophils play fundamental roles in innate immune response, shape adaptive immunity, and are a potentially causal cell type underpinning genetic associations with immune system traits and diseases. Here, we profile the binding of myeloid master regulator PU.1 in primary neutrophils across nearly a hundred volunteers. We show that variants associated with differential PU.1 binding underlie genetically-driven differences in cell count and susceptibility to autoimmune and inflammatory diseases. We integrate these results with other multi-individual genomic readouts, revealing coordinated effects of PU.1 binding variants on the local chromatin state, enhancer-promoter contacts and downstream gene expression, and providing a functional interpretation for 27 genes underlying immune traits. Collectively, these results demonstrate the functional role of PU.1 and its target enhancers in neutrophil transcriptional control and immune disease susceptibility.
Affiliation(s)
- Stephen Watt
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- Louella Vasquez
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- Klaudia Walter
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- Alice L Mann
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- Kousik Kundu
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Lu Chen
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Key Laboratory of Birth Defects and Related Diseases of Women and Children, Department of Laboratory Medicine, West China Second University Hospital, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, China
- Ying Sims
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- Frances Burden
- Department of Haematology, University of Cambridge, Cambridge, UK
- National Health Service Blood and Transplant (NHSBT), Cambridge, UK
- Samantha Farrow
- Department of Haematology, University of Cambridge, Cambridge, UK
- National Health Service Blood and Transplant (NHSBT), Cambridge, UK
- Ben Farr
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- Valentina Iotchkova
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, UK
- Heather Elding
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- Daniel Mead
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- Manuel Tardaguila
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- Hannes Ponstingl
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK
- David Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Avik Datta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Kate Downes
- Department of Haematology, University of Cambridge, Cambridge, UK
- National Health Service Blood and Transplant (NHSBT), Cambridge, UK
- Tomi Pastinen
- Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City, MO, USA
- Peter Fraser
- Nuclear Dynamics Programme, Babraham Institute, Cambridge, UK
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Mattia Frontini
- Department of Haematology, University of Cambridge, Cambridge, UK
- National Health Service Blood and Transplant (NHSBT), Cambridge, UK
- British Heart Foundation Centre of Excellence, Division of Cardiovascular Medicine, Addenbrooke's Hospital, Cambridge, UK
- Institute of Biomedical & Clinical Science, College of Medicine and Health, University of Exeter Medical School, RILD Building, Exeter, UK
- Biola-Maria Javierre
- Nuclear Dynamics Programme, Babraham Institute, Cambridge, UK.
- Josep Carreras Leukaemia Research Institute, Badalona, Barcelona, Spain.
- Mikhail Spivakov
- Nuclear Dynamics Programme, Babraham Institute, Cambridge, UK.
- Functional Gene Control Group, MRC London Institute of Medical Sciences (LMS), London, UK.
- Institute of Clinical Sciences, Imperial College Faculty of Medicine, London, UK.
- Nicole Soranzo
- Human Genetics, Wellcome Sanger Institute, Genome Campus, Hinxton, UK.
- School of Clinical Medicine, University of Cambridge, Cambridge, UK.
119
Bruscadin JJ, de Souza MM, de Oliveira KS, Rocha MIP, Afonso J, Cardoso TF, Zerlotini A, Coutinho LL, Niciura SCM, de Almeida Regitano LC. Muscle allele-specific expression QTLs may affect meat quality traits in Bos indicus. Sci Rep 2021; 11:7321. [PMID: 33795794 PMCID: PMC8016890 DOI: 10.1038/s41598-021-86782-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/31/2020] [Accepted: 03/17/2021] [Indexed: 02/01/2023]
Abstract
Single nucleotide polymorphisms (SNPs) located in transcript sequences showing allele-specific expression (ASE SNPs) were previously identified in the Longissimus thoracis muscle of a Nelore (Bos indicus) population consisting of 190 steers. Given that the allele-specific expression pattern may result from cis-regulatory SNPs, called allele-specific expression quantitative trait loci (aseQTLs), in this study we searched for aseQTLs in a window of 1 Mb upstream and downstream of each ASE SNP. After this initial analysis, aiming to investigate variants with a potential regulatory role, we further screened our aseQTL data for sequence similarity with transcription factor binding sites and microRNA (miRNA) binding sites. These aseQTLs were also overlapped with methylation data from reduced representation bisulfite sequencing (RRBS) obtained from 12 animals of the same population. We identified 1134 aseQTLs associated with 126 different ASE SNPs. For 215 aseQTLs, one allele potentially affected the affinity of a muscle-expressed transcription factor to its binding site. In addition, 162 aseQTLs were predicted to affect 149 miRNA binding sites, from which 114 miRNAs were expressed in muscle. Also, 16 aseQTLs were methylated in our population. Integration of aseQTL and GWAS data revealed enrichment for traits such as meat tenderness, ribeye area, and intramuscular fat. To our knowledge, this is the first report of aseQTL identification in bovine muscle. Our findings indicate that multiple variants can modulate allelic expression through various cis-regulatory and epigenetic mechanisms. Some of the potential regulatory variants described here were associated with the expression pattern of genes related to phenotypes of interest for livestock. Thus, these variants might be useful for understanding the genetic control of these phenotypes.
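As a rough illustration of the first step (collecting candidate aseQTLs within 1 Mb upstream and downstream of each ASE SNP), the sketch below indexes variants by chromosome and uses binary search. It is not the authors' pipeline; the (chrom, pos, id) tuple layout, the inclusive window ends, and the function names are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Window size from the abstract: 1 Mb upstream and downstream of each ASE SNP.
WINDOW_BP = 1_000_000


def index_by_chrom(variants):
    """Group (chrom, pos, variant_id) tuples by chromosome, sorted by position."""
    grouped = defaultdict(list)
    for chrom, pos, vid in variants:
        grouped[chrom].append((pos, vid))
    index = {}
    for chrom, entries in grouped.items():
        entries.sort()
        index[chrom] = ([p for p, _ in entries], [v for _, v in entries])
    return index


def candidate_cis_variants(ase_snps, variants, window=WINDOW_BP):
    """Map each ASE SNP id to the variant ids within +/- window bp on the same chromosome."""
    index = index_by_chrom(variants)
    result = {}
    for chrom, pos, snp_id in ase_snps:
        positions, ids = index.get(chrom, ([], []))
        lo = bisect_left(positions, pos - window)   # first variant at or after pos - window
        hi = bisect_right(positions, pos + window)  # one past the last variant at or before pos + window
        result[snp_id] = ids[lo:hi]
    return result


if __name__ == "__main__":
    ase = [("chr5", 2_000_000, "aseA")]
    variants = [("chr5", 999_999, "v1"), ("chr5", 1_000_000, "v2"),
                ("chr5", 3_000_000, "v3"), ("chr5", 3_000_001, "v4"),
                ("chr6", 2_000_000, "v5")]
    print(candidate_cis_variants(ase, variants))  # {'aseA': ['v2', 'v3']}
```

Sorting once per chromosome keeps each lookup logarithmic in the number of variants, which helps when many ASE SNPs are scanned against genome-wide variant calls.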
Affiliation(s)
- Jennifer Jessica Bruscadin
- Post-Graduation Program of Evolutionary Genetics and Molecular Biology, Center of Biological Sciences and Health, Federal University of São Carlos, São Carlos, SP, Brazil
- Marcela Maria de Souza
- Post-Doctoral Fellow, Department of Animal Science, Iowa State University, Ames, IA, USA
- Karina Santos de Oliveira
- Post-Graduation Program of Evolutionary Genetics and Molecular Biology, Center of Biological Sciences and Health, Federal University of São Carlos, São Carlos, SP, Brazil
- Marina Ibelli Pereira Rocha
- Post-Graduation Program of Evolutionary Genetics and Molecular Biology, Center of Biological Sciences and Health, Federal University of São Carlos, São Carlos, SP, Brazil
- Juliana Afonso
- Department of Animal Science, University of São Paulo/ESALQ, Piracicaba, SP, Brazil
- Tainã Figueiredo Cardoso
- Embrapa Pecuária Sudeste, P. O. Box 339, São Carlos, SP 13564-230, Brazil
- Adhemar Zerlotini
- Embrapa Informática Agropecuária, Campinas, SP, Brazil
- Luiz Lehmann Coutinho
- Department of Animal Science, University of São Paulo/ESALQ, Piracicaba, SP, Brazil
120
A cell-specific regulatory region of the human ABO blood group gene regulates the neighborhood gene encoding odorant binding protein 2B. Sci Rep 2021; 11:7325. [PMID: 33795748 PMCID: PMC8016878 DOI: 10.1038/s41598-021-86843-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/01/2020] [Accepted: 03/22/2021] [Indexed: 01/27/2023]
Abstract
The human ABO blood group system is of great importance in blood transfusion and organ transplantation. ABO transcription is known to be regulated by a constitutive promoter in a CpG island and by regions that control cell-specific expression, such as the downstream +22.6-kb site for epithelial cells and a site in intron 1 for erythroid cells. Here we investigated whether the +22.6-kb site might play a role in transcriptional regulation of the gene encoding odorant binding protein 2B (OBP2B), which is located 43.4 kb from the +22.6-kb site on the centromeric side. In the gastric cancer cell line KATOIII, quantitative PCR analysis demonstrated significantly reduced amounts of OBP2B and ABO transcripts in mutant cells with biallelic deletions of the site, created using the CRISPR/Cas9 system, relative to those in wild-type cells, and Western blotting demonstrated a corresponding reduction of OBP2B protein in the mutant cells. Moreover, single-molecule fluorescence in situ hybridization assays indicated that the amounts of both transcripts were correlated in individual cells. These findings suggest that OBP2B could be co-regulated by the +22.6-kb site of ABO.
121
Schwartzentruber J, Cooper S, Liu JZ, Barrio-Hernandez I, Bello E, Kumasaka N, Young AMH, Franklin RJM, Johnson T, Estrada K, Gaffney DJ, Beltrao P, Bassett A. Genome-wide meta-analysis, fine-mapping and integrative prioritization implicate new Alzheimer's disease risk genes. Nat Genet 2021; 53:392-402. [PMID: 33589840 PMCID: PMC7610386 DOI: 10.1038/s41588-020-00776-w] [Citation(s) in RCA: 312] [Impact Index Per Article: 78.0] [Received: 01/16/2020] [Accepted: 12/23/2020] [Indexed: 01/30/2023]
Abstract
Genome-wide association studies have discovered numerous genomic loci associated with Alzheimer's disease (AD), yet the causal genes and variants remain incompletely identified. We performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2. Using three SNP-level fine-mapping methods, we identified 21 SNPs with >50% probability each of being causally involved in AD risk and others strongly suggested by functional annotation. We followed this with colocalization analyses across 109 gene expression quantitative trait loci datasets and prioritization of genes by using protein interaction networks and tissue-specific expression. Combining this information into a quantitative score, we found that evidence converged on likely causal genes, including the above four genes, and those at previously discovered AD loci, including BIN1, APH1B, PTK2B, PILRA and CASS4.
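The abstract reports SNPs with >50% probability of being causal but does not name its three fine-mapping methods. As a simpler stand-in, and not one of the authors' methods as far as this entry states, the sketch below computes posterior inclusion probabilities (PIPs) from Wakefield's approximate Bayes factor, assuming exactly one causal variant per locus and a uniform prior over SNPs. The prior standard deviation of 0.2 and the example effect sizes are assumptions.

```python
import math


def log_abf(beta, se, prior_sd=0.2):
    """Wakefield log approximate Bayes factor (association vs. no association) for one SNP.

    prior_sd is the assumed prior standard deviation of the true effect
    (0.2 on the log-odds scale is a common default, used here as an assumption).
    """
    v = se ** 2           # sampling variance of the effect estimate
    w = prior_sd ** 2     # prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z * z


def posterior_inclusion_probs(betas, ses, prior_sd=0.2):
    """PIPs under a single-causal-variant model with a uniform prior across SNPs."""
    labf = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(labf)
    weights = [math.exp(x - top) for x in labf]  # subtract the max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    betas = [0.30, 0.28, 0.05]  # hypothetical log-odds ratios for three SNPs in one locus
    ses = [0.05, 0.05, 0.05]
    pips = posterior_inclusion_probs(betas, ses)
    print([round(p, 3) for p in pips])
    print([i for i, p in enumerate(pips) if p > 0.5])  # SNPs passing a >50% PIP threshold
```

Methods that allow several causal variants per locus (for example SuSiE or FINEMAP) relax the single-causal-variant assumption made here.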
Affiliation(s)
- Jeremy Schwartzentruber
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- Open Targets, Wellcome Genome Campus, Cambridge, UK.
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
- Sarah Cooper
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Inigo Barrio-Hernandez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- Erica Bello
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Adam M H Young
- Wellcome-Medical Research Council Cambridge Stem Cell Institute, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Robin J M Franklin
- Wellcome-Medical Research Council Cambridge Stem Cell Institute, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Toby Johnson
- Target Sciences-R&D, GSK Medicines Research Centre, Stevenage, UK
- Daniel J Gaffney
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Genomics Plc, Oxford, UK
- Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- Andrew Bassett
- Open Targets, Wellcome Genome Campus, Cambridge, UK.
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
122
Yang F, Gleason KJ, Wang J, Duan J, He X, Pierce BL, Chen LS. CCmed: Cross-condition mediation analysis for identifying replicable trans-associations mediated by cis-gene expression. Bioinformatics 2021; 37:2513-2520. [PMID: 33647928 PMCID: PMC8428610 DOI: 10.1093/bioinformatics/btab139] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2021] [Accepted: 02/25/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Trans-acting expression quantitative trait loci (eQTLs) collectively explain a substantial proportion of expression variation, yet are challenging to detect and replicate since their effects are often individually weak. A large proportion of genetic effects on distal genes are mediated through cis-gene expression. Cis-association (between SNP and cis-gene) and gene-gene correlation conditional on SNP genotype could establish trans-association (between SNP and trans-gene). Both cis-association and gene-gene conditional correlation have effects shared across relevant tissues and conditions, and trans-associations mediated by cis-gene expression also have effects shared across relevant conditions. RESULTS We proposed a Cross-Condition Mediation analysis method (CCmed) for detecting cis-mediated trans-associations with replicable effects in relevant conditions/studies. CCmed integrates cis-association and gene-gene conditional correlation statistics from multiple tissues/studies. Motivated by the bimodal effect-sharing patterns of eQTLs, we proposed two variations of CCmed, CCmed_most and CCmed_spec, for detecting cross-tissue and tissue-specific trans-associations, respectively. We analyzed data from 13 brain tissues in the Genotype-Tissue Expression (GTEx) project and identified trios with cis-mediated trans-associations across brain tissues, many of which showed evidence of trans-association in two replication studies. We also identified trans-genes associated with schizophrenia loci in at least two brain tissues. AVAILABILITY AND IMPLEMENTATION CCmed software is available at http://github.com/kjgleason/CCmed. SUPPLEMENTARY INFORMATION Supplementary material is available at Bioinformatics online.
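CCmed's inputs, as described above, are a cis-association (SNP vs. cis-gene) and a gene-gene correlation conditional on SNP genotype. The sketch below computes both for a single tissue, taking the conditional correlation to be a partial correlation (correlating each gene's residuals after regressing on genotype). That partial-correlation choice and the function names are assumptions; the sketch does not implement CCmed's cross-condition model that combines these statistics across tissues.

```python
import math


def _center(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def residualize(y, x):
    """Residuals of y after least-squares regression on x with an intercept."""
    xc, yc = _center(x), _center(y)
    sxx = sum(a * a for a in xc)
    slope = sum(a * b for a, b in zip(xc, yc)) / sxx if sxx > 0 else 0.0
    return [b - slope * a for a, b in zip(xc, yc)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ac, bc = _center(a), _center(b)
    num = sum(x * y for x, y in zip(ac, bc))
    den = math.sqrt(sum(x * x for x in ac) * sum(y * y for y in bc))
    return num / den


def trio_statistics(genotype, cis_expr, trans_expr):
    """Per-tissue inputs for a SNP -> cis-gene -> trans-gene trio.

    Returns (cis-association, gene-gene correlation conditional on genotype).
    """
    cis_assoc = pearson(genotype, cis_expr)
    cond_corr = pearson(residualize(cis_expr, genotype),
                        residualize(trans_expr, genotype))
    return cis_assoc, cond_corr


if __name__ == "__main__":
    g = [0, 1, 2, 0, 1, 2, 0, 1, 2]        # hypothetical genotype dosages
    noise = [1, -1, 0, -1, 0, 1, 0, 1, -1]  # hypothetical non-genetic variation
    cis = [gi + ni for gi, ni in zip(g, noise)]
    trans = [2 * ci + 3 * gi for ci, gi in zip(cis, g)]
    print(trio_statistics(g, cis, trans))
```

Adjusting both genes for genotype removes the part of their co-variation that is explained by the SNP itself, so the conditional correlation reflects the gene-gene link that a mediation model then combines with the cis-association.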
Affiliation(s)
- Fan Yang
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, 13001 E. 17th Place, Aurora, Colorado, 80045, USA
- Kevin J Gleason
- Department of Public Health Sciences, University of Chicago, 5841 South Maryland Ave MC2000, Chicago, IL, 60637, USA
- Jiebiao Wang
- Department of Biostatistics, University of Pittsburgh, 7135 Public Health, 130 DeSoto Street, Pittsburgh, PA, 15261, USA
- Jubao Duan
- Center for Psychiatric Genetics, NorthShore University HealthSystem, 1001 University Place, Evanston, IL, 60201, USA
- Department of Psychiatry and Behavioral Neuroscience, 5841 S Maryland Ave, Chicago MC3077, Chicago, IL, 60637, USA
- Xin He
- Heart Center, Turku University Hospital and University of Turku, Turku, Finland
- Brandon L Pierce
- Department of Public Health Sciences, University of Chicago, 5841 South Maryland Ave MC2000, Chicago, IL, 60637, USA
- Heart Center, Turku University Hospital and University of Turku, Turku, Finland
- Lin S Chen
- Department of Public Health Sciences, University of Chicago, 5841 South Maryland Ave MC2000, Chicago, IL, 60637, USA
123
Ward MC, Banovich NE, Sarkar A, Stephens M, Gilad Y. Dynamic effects of genetic variation on gene expression revealed following hypoxic stress in cardiomyocytes. eLife 2021; 10:e57345. [PMID: 33554857 PMCID: PMC7906610 DOI: 10.7554/elife.57345] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 03/28/2020] [Accepted: 02/06/2021] [Indexed: 12/13/2022]
Abstract
One life-threatening outcome of cardiovascular disease is myocardial infarction, where cardiomyocytes are deprived of oxygen. To study inter-individual differences in response to hypoxia, we established an in vitro model of induced pluripotent stem cell-derived cardiomyocytes from 15 individuals. We measured gene expression levels, chromatin accessibility, and methylation levels in four culturing conditions that correspond to normoxia, hypoxia, and short- or long-term re-oxygenation. We characterized thousands of gene regulatory changes as the cells transition between conditions. Using available genotypes, we identified 1,573 genes with a cis expression quantitative trait locus (eQTL) in at least one condition, as well as 367 dynamic eQTLs, which are classified as eQTLs in at least one, but not all, conditions. A subset of genes with dynamic eQTLs is associated with complex traits and disease. Our data demonstrate how dynamic genetic effects on gene expression, which are likely relevant for disease, can be uncovered under stress.
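Taken literally, the abstract's definition of a dynamic eQTL (called in at least one, but not all, conditions) is a set operation over per-condition eQTL calls, sketched below with hypothetical gene identifiers and condition names. The authors' actual statistical procedure is not given in this entry; note that comparing significance calls across conditions can overstate dynamic effects, because a missed call in one condition may reflect limited power rather than an absent effect.

```python
def summarize_eqtls(eqtl_genes_by_condition):
    """Return (genes with an eQTL in >= 1 condition, genes with a dynamic eQTL).

    eqtl_genes_by_condition: dict mapping condition name -> iterable of gene ids
    called as having a cis-eQTL in that condition (hypothetical input format).
    A dynamic eQTL gene is called in at least one, but not every, condition.
    """
    gene_sets = [set(genes) for genes in eqtl_genes_by_condition.values()]
    if not gene_sets:
        return set(), set()
    in_any = set.union(*gene_sets)
    in_all = set.intersection(*gene_sets)
    return in_any, in_any - in_all


if __name__ == "__main__":
    calls = {
        "normoxia": {"GENE_A", "GENE_B"},
        "hypoxia": {"GENE_A", "GENE_C"},
        "short_reoxygenation": {"GENE_A"},
        "long_reoxygenation": {"GENE_A", "GENE_B"},
    }
    any_condition, dynamic = summarize_eqtls(calls)
    print(sorted(any_condition), sorted(dynamic))  # ['GENE_A', 'GENE_B', 'GENE_C'] ['GENE_B', 'GENE_C']
```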
Affiliation(s)
- Michelle C Ward
- Department of Medicine, University of Chicago, Chicago, United States
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, United States
- Nicholas E Banovich
- Department of Human Genetics, University of Chicago, Chicago, United States
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, United States
- Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, United States
- Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, United States
- Department of Statistics, University of Chicago, Chicago, United States
- Yoav Gilad
- Department of Medicine, University of Chicago, Chicago, United States
- Department of Human Genetics, University of Chicago, Chicago, United States
124
Umans BD, Battle A, Gilad Y. Where Are the Disease-Associated eQTLs? Trends Genet 2021; 37:109-124. [PMID: 32912663 PMCID: PMC8162831 DOI: 10.1016/j.tig.2020.08.009] [Citation(s) in RCA: 183] [Impact Index Per Article: 45.8] [Received: 05/28/2020] [Revised: 08/07/2020] [Accepted: 08/14/2020] [Indexed: 02/07/2023]
Abstract
Most disease-associated variants, although located in putatively regulatory regions, do not have detectable effects on gene expression. One explanation could be that we have not examined gene expression in the cell types or conditions that are most relevant for disease. Even large-scale efforts to study gene expression across tissues are limited to human samples obtained opportunistically or postmortem, mostly from adults. In this review we evaluate recent findings and suggest an alternative strategy, drawing on the dynamic and highly context-specific nature of gene regulation. We discuss new technologies that can extend the standard regulatory mapping framework to more diverse, disease-relevant cell types and states.
Affiliation(s)
- Benjamin D Umans
- Department of Medicine, University of Chicago, Chicago, IL, USA.
- Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Yoav Gilad
- Department of Medicine, University of Chicago, Chicago, IL, USA.
- Department of Human Genetics, University of Chicago, Chicago, IL, USA.
125
Garrido-Martín D, Borsari B, Calvo M, Reverter F, Guigó R. Identification and analysis of splicing quantitative trait loci across multiple tissues in the human genome. Nat Commun 2021; 12:727. [PMID: 33526779 PMCID: PMC7851174 DOI: 10.1038/s41467-020-20578-2] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Received: 05/11/2020] [Accepted: 12/02/2020] [Indexed: 12/13/2022]
Abstract
Alternative splicing (AS) is a fundamental step in eukaryotic mRNA biogenesis. Here, we develop an efficient and reproducible pipeline for the discovery of genetic variants that affect AS (splicing QTLs, sQTLs). We use it to analyze the GTEx dataset, generating a comprehensive catalog of sQTLs in the human genome. Downstream analysis of this catalog provides insight into the mechanisms underlying splicing regulation. We report that a core set of sQTLs is shared across multiple tissues. sQTLs often target the global splicing pattern of genes, rather than individual splicing events. Many also affect the expression of the same or other genes, uncovering regulatory loci that act through different mechanisms. sQTLs tend to be located in post-transcriptionally spliced introns, which would function as hotspots for splicing regulation. While many variants affect splicing patterns by altering the sequence of splice sites, many more modify the binding sites of RNA-binding proteins. Genetic variants affecting splicing can have a stronger phenotypic impact than those affecting gene expression.
Affiliation(s)
- Diego Garrido-Martín
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Catalonia, Spain.
- Beatrice Borsari
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Catalonia, Spain
- Miquel Calvo
- Section of Statistics, Faculty of Biology, Universitat de Barcelona (UB), Av. Diagonal 643, Barcelona, 08028, Spain
- Ferran Reverter
- Section of Statistics, Faculty of Biology, Universitat de Barcelona (UB), Av. Diagonal 643, Barcelona, 08028, Spain
- Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Catalonia, Spain.
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain.
126
Zhu H, Shang L, Zhou X. A Review of Statistical Methods for Identifying Trait-Relevant Tissues and Cell Types. Front Genet 2021; 11:587887. [PMID: 33584792 PMCID: PMC7874162 DOI: 10.3389/fgene.2020.587887] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/27/2020] [Accepted: 12/30/2020] [Indexed: 11/17/2022]
Abstract
Genome-wide association studies (GWASs) have identified and replicated many genetic variants that are associated with diseases and disease-related complex traits. However, the biological mechanisms underlying these associations remain largely elusive. Exploring these mechanisms requires identifying trait-relevant tissues and cell types, as genetic variants likely influence complex traits in a tissue- and cell type-specific manner. Recently, several statistical methods have been developed to integrate genomic data with GWASs for identifying trait-relevant tissues and cell types. These methods often rely on different genomic information and use different statistical models for trait-tissue relevance inference. Here, we present a comprehensive technical review summarizing ten existing methods for trait-tissue relevance inference. These methods make use of different types of genomic information, including functional annotation information, expression quantitative trait loci information, genetically regulated gene expression information, and gene co-expression network information. These methods also use different statistical models that range from linear mixed models to covariance network models. We hope that this review can serve as a useful reference both for methodologists who develop methods and for applied analysts who apply these methods to identify trait-relevant tissues and cell types.
Affiliation(s)
- Huanhuan Zhu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Lulu Shang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States
127
Saez-Atienzar S, Bandres-Ciga S, Langston RG, Kim JJ, Choi SW, Reynolds RH, Abramzon Y, Dewan R, Ahmed S, Landers JE, Chia R, Ryten M, Cookson MR, Nalls MA, Chiò A, Traynor BJ. Genetic analysis of amyotrophic lateral sclerosis identifies contributing pathways and cell types. Sci Adv 2021; 7:eabd9036. [PMID: 33523907 PMCID: PMC7810371 DOI: 10.1126/sciadv.abd9036] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 07/20/2020] [Accepted: 11/20/2020] [Indexed: 05/03/2023]
Abstract
Despite the considerable progress in unraveling the genetic causes of amyotrophic lateral sclerosis (ALS), we do not fully understand the molecular mechanisms underlying the disease. We analyzed genome-wide data involving 78,500 individuals using a polygenic risk score approach to identify the biological pathways and cell types involved in ALS. This data-driven approach identified multiple aspects of the biology underlying the disease that resolved into broader themes, namely, neuron projection morphogenesis, membrane trafficking, and signal transduction mediated by ribonucleotides. We also found that genomic risk in ALS maps consistently to GABAergic interneurons and oligodendrocytes, as confirmed in human single-nucleus RNA-seq data. Using two-sample Mendelian randomization, we nominated six differentially expressed genes (ATG16L2, ACSL5, MAP1LC3A, MAPKAPK3, PLXNB2, and SCFD1) within the significant pathways as relevant to ALS. We conclude that the disparate genetic etiologies of this fatal neurological disease converge on a smaller number of final common pathways and cell types.
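In its basic form, a polygenic risk score is a sum of allele dosages weighted by GWAS effect sizes; pathway-level analyses typically restrict that sum to SNPs mapped to genes in one pathway. The sketch below shows only that basic arithmetic. The dictionary inputs, skipping SNPs without a dosage, and the way a pathway subset is passed in are assumptions for illustration, not the authors' procedure.

```python
def polygenic_score(dosages, effects, snp_subset=None):
    """Weighted allele count: sum of effect size x dosage over the scored SNPs.

    dosages: dict SNP id -> effect-allele dosage (0 to 2) for one individual
    effects: dict SNP id -> GWAS effect size (e.g. log odds ratio) for that allele
    snp_subset: optional iterable of SNP ids (e.g. SNPs mapped to one pathway)
    SNPs without a dosage for this individual are skipped (a simplifying assumption).
    """
    if snp_subset is None:
        snps = effects.keys()
    else:
        snps = effects.keys() & set(snp_subset)  # only SNPs that also have an effect size
    return sum(effects[s] * dosages[s] for s in snps if s in dosages)


if __name__ == "__main__":
    effects = {"rs_a": 0.2, "rs_b": -0.1, "rs_c": 0.05}  # hypothetical GWAS effects
    person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}           # hypothetical dosages
    print(polygenic_score(person, effects))
    print(polygenic_score(person, effects, snp_subset=["rs_a", "rs_c"]))
```

Scoring the same individuals with several pathway-restricted subsets, then testing each score against case status, is one way such an approach can point to pathways that carry disproportionate risk.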
Affiliation(s)
- Sara Saez-Atienzar
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA.
- Sara Bandres-Ciga
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Instituto de Investigación Biosanitaria de Granada (ibs.GRANADA), Granada, Spain
- Rebekah G Langston
- Cell Biology and Gene Expression Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Jonggeol J Kim
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Shing Wan Choi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Regina H Reynolds
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Great Ormond Street Institute of Child Health, Genetics and Genomic Medicine, University College London, London, UK
- Yevgeniya Abramzon
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Sobell Department of Motor Neuroscience and Movement Disorders, University College London, Institute of Neurology, London, UK
- Ramita Dewan
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Sarah Ahmed
- Neurodegenerative Diseases Research Unit, Laboratory of Neurogenetics, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- John E Landers
- Department of Neurology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Ruth Chia
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Mina Ryten
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Great Ormond Street Institute of Child Health, Genetics and Genomic Medicine, University College London, London, UK
- Mark R Cookson
- Cell Biology and Gene Expression Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Michael A Nalls
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Data Tecnica International, Glen Echo, MD 20812, USA
- Adriano Chiò
- 'Rita Levi Montalcini' Department of Neuroscience, University of Turin, Turin, Italy
- Azienda Ospedaliero Universitaria Città della Salute e della Scienza, Turin, Italy
- Bryan J Traynor
- Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Department of Neurology, Johns Hopkins University, Baltimore, MD 21287, USA
128
Chen H, Wang T, Huang S, Zeng P. New novel non-MHC genes were identified for cervical cancer with an integrative analysis approach of transcriptome-wide association study. J Cancer 2021; 12:840-848. [PMID: 33403041 PMCID: PMC7778537 DOI: 10.7150/jca.47918] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/08/2020] [Accepted: 10/18/2020] [Indexed: 12/28/2022]
Abstract
Although genome-wide association studies (GWAS) have successfully identified multiple genetic variants associated with cervical cancer, the functional role of those variants is not well understood. To bridge this gap, we integrated the largest cervical cancer GWAS (N = 9,347) with gene expression measured in six human tissues to perform a multi-tissue transcriptome-wide association study (TWAS). We identified a total of 20 associated genes in the European population, including four novel non-MHC genes (i.e. WDR19, RP11-384K6.2, RP11-384K6.6 and ITSN1). Further, we attempted to validate our results in another independent cervical cancer GWAS from the East Asian population (N = 3,314) and re-discovered four genes: WDR19, HLA-DOB, MICB and OR2B8P. In our subsequent co-expression analysis, we found that SLAMF7 and LTA were co-expressed in TCGA tumor samples and showed that WDR19 and ITSN1 were both enriched in "plasma membrane". Using protein-protein interaction analysis, we observed strong interactions between the proteins encoded by genes associated with cervical cancer. Overall, our study identified multiple candidate genes, including four non-MHC genes, that may be causally associated with the risk of cervical cancer. However, further investigations with larger sample sizes are warranted to validate our findings in diverse populations.
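The abstract does not state which TWAS framework was used. One widely used summary-statistic formulation (as in FUSION) combines per-SNP expression weights w, GWAS z-scores z and the SNP correlation (LD) matrix R into a gene-level statistic z_TWAS = w'z / sqrt(w'Rw). The sketch below computes that statistic and a two-sided normal p-value; it assumes all inputs are aligned to the same SNPs and effect alleles, and the function names are illustrative.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score for one gene: w'z / sqrt(w'Rw).

    weights: per-SNP expression-prediction weights for the gene
    gwas_z:  GWAS z-scores for the same SNPs, aligned to the same effect alleles
    ld:      SNP-SNP correlation matrix R as a list of lists
    """
    k = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    z = twas_z([1.0, 1.0], [2.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
    print(z, two_sided_p(z))
```

The LD term matters: with weights (1, 1) and z-scores (2, 2), the statistic is 4/sqrt(2) ≈ 2.83 for independent SNPs but 2.0 for SNPs in perfect LD, because correlated SNPs carry overlapping evidence.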
Affiliation(s)
- Haimiao Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Shuiping Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
129
Dong G, Wendl MC, Zhang B, Ding L, Huang KL. AeQTL: eQTL analysis using region-based aggregation of rare genomic variants. Pacific Symposium on Biocomputing 2021; 26:172-183. [PMID: 33691015 PMCID: PMC8050802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Concurrently available genomic and transcriptomic data from large cohorts provide opportunities to discover expression quantitative trait loci (eQTLs), i.e., genetic variants associated with gene expression changes. However, the statistical power to detect rare-variant eQTLs is often limited, and most existing eQTL tools are not compatible with sequence variant file formats. We have developed AeQTL (Aggregated eQTL), a software tool that performs eQTL analysis on variants aggregated according to user-specified regions and is designed to accommodate standard genomic files. AeQTL consistently yielded similar or higher power for identifying rare-variant eQTLs than single-variant tests. Using AeQTL, we discovered that aggregated rare germline truncations in cis exomic regions are significantly associated with the expression of BRCA1 and SLC25A39 in breast tumors. In a pan-cancer analysis of somatic mutations, aggregated predicted missense mutations and aggregated truncations were differentially associated with the expression of cancer driver genes, and somatic truncation eQTLs were further identified as a new multi-omic classifier of oncogenes versus tumor-suppressor genes. AeQTL is easy to use and customize, allowing broad application to discovering rare variants, including coding and noncoding variants, associated with gene expression. AeQTL is implemented in Python, and the source code is freely available at https://github.com/Huang-lab/AeQTL under the MIT license.
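To make the idea of region-based aggregation concrete, here is a minimal sketch, assuming genotypes are already decoded into per-sample alternate-allele counts. It is not AeQTL's interface: the names `Variant`, `aggregate_region` and `burden_eqtl_test` are hypothetical, and the sketch simply sums allele counts over variants inside one region (a burden score) and regresses expression on that score by ordinary least squares, omitting the covariates and p-values a real analysis would need.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    genotypes: list  # alt-allele counts (0, 1 or 2), one entry per sample


def aggregate_region(variants, chrom, start, end, n_samples):
    """Per-sample sum of alt alleles over variants inside chrom:start-end (inclusive)."""
    burden = [0] * n_samples
    for v in variants:
        if v.chrom == chrom and start <= v.pos <= end:
            for i, g in enumerate(v.genotypes):
                burden[i] += g
    return burden


def burden_eqtl_test(burden, expression):
    """Simple linear regression of expression on the aggregated burden.

    Returns (slope, t_statistic). Covariates and p-values are omitted for brevity.
    """
    n = len(burden)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mx = sum(burden) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in burden)
    if sxx == 0:
        raise ValueError("aggregated genotype does not vary across samples")
    sxy = sum((x - mx) * (y - my) for x, y in zip(burden, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(burden, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    # A perfect fit has zero standard error; report a signed infinite t in that case.
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t


if __name__ == "__main__":
    variants = [
        Variant("chr17", 100, [0, 1, 0, 0, 1, 0]),
        Variant("chr17", 150, [0, 0, 1, 0, 1, 0]),
        Variant("chr17", 900, [1, 1, 1, 1, 1, 1]),  # outside the region tested below
    ]
    burden = aggregate_region(variants, "chr17", 50, 200, n_samples=6)
    print(burden)  # [0, 1, 1, 0, 2, 0]
    print(burden_eqtl_test(burden, [1.1, 2.0, 1.9, 0.9, 3.1, 1.0]))
```

Aggregating before testing is what lets several individually rare alleles contribute to one test, which is the power argument made in the abstract; a single-variant test would instead fit one regression per position.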
Affiliation(s)
- Guanlan Dong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Michael C. Wendl
- Department of Medicine, McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Li Ding
- Department of Medicine, McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Kuan-lin Huang (corresponding author)
- Department of Genetics and Genomic Sciences, Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
130
Novel directions in data pre-processing and genome-wide association study (GWAS) methodologies to overcome ongoing challenges. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
131
Emerging Methods and Resources for Biological Interrogation of Neuropsychiatric Polygenic Signal. Biol Psychiatry 2021; 89:41-53. [PMID: 32736792 DOI: 10.1016/j.biopsych.2020.05.022] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 02/19/2020] [Revised: 04/23/2020] [Accepted: 05/14/2020] [Indexed: 01/05/2023]
Abstract
Most neuropsychiatric disorders are highly polygenic, implicating hundreds to thousands of causal genetic variants that span much of the genome. This widespread polygenicity complicates biological understanding because no single variant can explain disease etiology. A strategy to advance biological insight is to seek convergent functions among the large set of variants and map them to a smaller set of disease-relevant genes and pathways. Accordingly, functional genomic resources that provide data on intermediate molecular phenotypes, such as gene-expression and methylation status, can be leveraged to functionally annotate variants and map them to genes. Such molecular quantitative trait locus mappings can be integrated with genome-wide association studies to make sense of the polygenic signal that underlies complex disease. Other resources that provide data on the 3-dimensional structure of chromatin and functional importance of specific genomic regions can be integrated similarly. In addition, mapped genes can then be tested for convergence in biological function, tissue, cell type, or developmental stage. In this review, we provide an overview of functional genomic resources and methods that can be used to interpret results from genome-wide association studies, and we discuss current challenges for biological understanding and future requirements to overcome them.
132
Molecular and evolutionary processes generating variation in gene expression. Nat Rev Genet 2020; 22:203-215. [PMID: 33268840 DOI: 10.1038/s41576-020-00304-w] [Citation(s) in RCA: 142] [Impact Index Per Article: 28.4] [Accepted: 10/21/2020] [Indexed: 12/18/2022]
Abstract
Heritable variation in gene expression is common within and between species. This variation arises from mutations that alter the form or function of molecular gene regulatory networks that are then filtered by natural selection. High-throughput methods for introducing mutations and characterizing their cis- and trans-regulatory effects on gene expression (particularly, transcription) are revealing how different molecular mechanisms generate regulatory variation, and studies comparing these mutational effects with variation seen in the wild are teasing apart the role of neutral and non-neutral evolutionary processes. This integration of molecular and evolutionary biology allows us to understand how the variation in gene expression we see today came to be and to predict how it is most likely to evolve in the future.
133
Dang H, Polineni D, Pace RG, Stonebraker JR, Corvol H, Cutting GR, Drumm ML, Strug LJ, O’Neal WK, Knowles MR. Mining GWAS and eQTL data for CF lung disease modifiers by gene expression imputation. PLoS One 2020; 15:e0239189. [PMID: 33253230 PMCID: PMC7703903 DOI: 10.1371/journal.pone.0239189] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/23/2019] [Accepted: 09/02/2020] [Indexed: 12/18/2022]
Abstract
Genome-wide association studies (GWAS) have identified several genomic loci harboring candidate modifiers of cystic fibrosis (CF) lung disease, but these loci account for only a small proportion of the expected genetic contribution. We leveraged expression data from CF cohorts and Genotype-Tissue Expression (GTEx) reference data sets from multiple human tissues to generate predictive models, which were used to impute transcriptional regulation from genetic variation in our GWAS population. The imputed gene expression was tested for association with CF lung disease severity. By comparing and combining results from alternative approaches, we identified 379 candidate modifier genes. We examined in detail 52 modifier candidates that showed consensus between approaches, 28 of which were near known GWAS loci. A number of these genes are implicated in the pathophysiology of CF lung disease (e.g., immunity, infection, inflammation, HLA pathways, glycosylation, and mucociliary clearance) and in CFTR protein biology (e.g., cytoskeleton, microtubule, mitochondrial function, lipid metabolism, endoplasmic reticulum/Golgi, and ubiquitination). Gene set enrichment results are consistent with current knowledge of CF lung disease pathogenesis. HLA class II genes on chr6, together with CEP72, EXOC3, and TPPP near the GWAS peak on chr5, are the most consistently associated with CF lung disease severity across the tissues tested. The results help to prioritize genes in the GWAS regions, predict the direction of gene expression regulation, and identify new candidate modifiers throughout the genome for potential therapeutic development.
Affiliation(s)
- Hong Dang
- Marsico Lung Institute, University of North Carolina at Chapel Hill School of Medicine Cystic Fibrosis/Pulmonary Research & Treatment Center, Chapel Hill, North Carolina, United States of America
- Deepika Polineni
- University of Kansas Medical Center, Kansas City, Kansas, United States of America
- Rhonda G. Pace
- Marsico Lung Institute, University of North Carolina at Chapel Hill School of Medicine Cystic Fibrosis/Pulmonary Research & Treatment Center, Chapel Hill, North Carolina, United States of America
- Jaclyn R. Stonebraker
- Marsico Lung Institute, University of North Carolina at Chapel Hill School of Medicine Cystic Fibrosis/Pulmonary Research & Treatment Center, Chapel Hill, North Carolina, United States of America
- Harriet Corvol
- Pediatric Pulmonary Department, Assistance Publique-Hôpitaux de Paris (AP-HP), Hôpital Trousseau, Institut National de la Santé et de la Recherche Médicale (INSERM) U938, Paris, France
- Sorbonne Universités, Université Pierre et Marie Curie (UPMC), Paris 6, Paris, France
- Garry R. Cutting
- McKusick-Nathans Institute of Genetic Medicine, Baltimore, Maryland, United States of America
- Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Mitchell L. Drumm
- Department of Pediatrics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, United States of America
- Lisa J. Strug
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Wanda K. O’Neal
- Marsico Lung Institute, University of North Carolina at Chapel Hill School of Medicine Cystic Fibrosis/Pulmonary Research & Treatment Center, Chapel Hill, North Carolina, United States of America
- Michael R. Knowles
- Marsico Lung Institute, University of North Carolina at Chapel Hill School of Medicine Cystic Fibrosis/Pulmonary Research & Treatment Center, Chapel Hill, North Carolina, United States of America
134
Abstract
Many modern problems in medicine and public health leverage machine-learning methods to predict outcomes based on observable covariates. In a wide array of settings, predicted outcomes are used in subsequent statistical analysis, often without accounting for the distinction between observed and predicted outcomes. We call inference with predicted outcomes postprediction inference. In this paper, we develop methods for correcting statistical inference using outcomes predicted with arbitrarily complicated machine-learning models including random forests and deep neural nets. Rather than trying to derive the correction from first principles for each machine-learning algorithm, we observe that there is typically a low-dimensional and easily modeled representation of the relationship between the observed and predicted outcomes. We build an approach for postprediction inference that naturally fits into the standard machine-learning framework where the data are divided into training, testing, and validation sets. We train the prediction model in the training set, estimate the relationship between the observed and predicted outcomes in the testing set, and use that relationship to correct subsequent inference in the validation set. We show our postprediction inference (postpi) approach can correct bias and improve variance estimation and subsequent statistical inference with predicted outcomes. To show the broad range of applicability of our approach, we show postpi can improve inference in two distinct fields: modeling predicted phenotypes in repurposed gene expression data and modeling predicted causes of death in verbal autopsy data. Our method is available through an open-source R package: https://github.com/leekgroup/postpi.
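The train/test/validation workflow described above can be illustrated with a small bootstrap, offered as a simplified sketch in the spirit of postpi rather than the postpi R package's implementation. It assumes a linear relationship model (observed ~ predicted) fit in the testing set and a single covariate of interest in the validation set; the function names `ols` and `postpi_style_bootstrap` are hypothetical. Each bootstrap round resamples the validation set, simulates observed-scale outcomes from the predictions plus relationship noise, and refits the inference model.

```python
import random
import statistics


def ols(x, y):
    """Simple linear regression of y on x; returns (intercept, slope, residual SD)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, (rss / (n - 2)) ** 0.5


def postpi_style_bootstrap(test_obs, test_pred, val_pred, val_x, n_boot=200, seed=0):
    """Slope of outcome ~ covariate in a validation set that has only predicted outcomes.

    test_obs / test_pred: observed and predicted outcomes in the testing set.
    val_pred / val_x: predicted outcomes and the covariate in the validation set.
    Returns (median bootstrap slope, standard deviation of the bootstrap slopes).
    """
    rng = random.Random(seed)
    # Relationship model, fit in the testing set: observed ~ predicted.
    g0, g1, sigma = ols(test_pred, test_obs)
    n = len(val_x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [val_x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # degenerate resample: slope is undefined
        # Simulate observed-scale outcomes from predictions plus relationship noise.
        ys = [g0 + g1 * val_pred[i] + rng.gauss(0.0, sigma) for i in idx]
        slopes.append(ols(xs, ys)[1])
    return statistics.median(slopes), statistics.stdev(slopes)


if __name__ == "__main__":
    test_pred = [float(p) for p in range(10)]
    test_obs = [2.0 * p + (0.1 if p % 2 == 0 else -0.1) for p in range(10)]
    val_x = [float(x) for x in range(50)]
    val_pred = [0.5 * x for x in val_x]  # a naive regression of these on val_x gives 0.5
    print(postpi_style_bootstrap(test_obs, test_pred, val_pred, val_x))
```

In this toy setup the observed outcomes are roughly twice the predictions, so regressing the raw predictions on the covariate understates the effect by about half, while the bootstrap reports a slope near 1 on the observed scale. The relationship model here is deliberately low-dimensional, matching the abstract's observation that the observed-versus-predicted relationship is usually easy to model.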
135
Renganaath K, Chong R, Day L, Kosuri S, Kruglyak L, Albert FW. Systematic identification of cis-regulatory variants that cause gene expression differences in a yeast cross. eLife 2020; 9:e62669. [PMID: 33179598 PMCID: PMC7685706 DOI: 10.7554/elife.62669] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 09/02/2020] [Accepted: 11/11/2020] [Indexed: 02/06/2023]
Abstract
Sequence variation in regulatory DNA alters gene expression and shapes genetically complex traits. However, the identification of individual, causal regulatory variants is challenging. Here, we used a massively parallel reporter assay to measure the cis-regulatory consequences of 5832 natural DNA variants in the promoters of 2503 genes in the yeast Saccharomyces cerevisiae. We identified 451 causal variants, which underlie genetic loci known to affect gene expression. Several promoters harbored multiple causal variants. In five promoters, pairs of variants showed non-additive, epistatic interactions. Causal variants were enriched at conserved nucleotides, tended to have low derived allele frequency, and were depleted from promoters of essential genes, which is consistent with the action of negative selection. Causal variants were also enriched for alterations in transcription factor binding sites. Models integrating these features provided modest, but statistically significant, ability to predict causal variants. This work revealed a complex molecular basis for cis-acting regulatory variation.
Affiliation(s)
- Kaushik Renganaath
- Department of Genetics, Cell Biology, & Development, University of Minnesota, Minneapolis, United States
- Rockie Chong
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, United States
- Laura Day
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
- Sriram Kosuri
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, United States
- Leonid Kruglyak
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
- Frank W Albert
- Department of Genetics, Cell Biology, & Development, University of Minnesota, Minneapolis, United States
136
Liu W, Li M, Zhang W, Zhou G, Wu X, Wang J, Lu Q, Zhao H. Leveraging functional annotation to identify genes associated with complex diseases. PLoS Comput Biol 2020; 16:e1008315. [PMID: 33137096 PMCID: PMC7660930 DOI: 10.1371/journal.pcbi.1008315] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/25/2020] [Revised: 11/12/2020] [Accepted: 09/05/2020] [Indexed: 02/06/2023]
Abstract
To increase statistical power to identify genes associated with complex traits, a number of transcriptome-wide association study (TWAS) methods have been proposed that use gene expression as a mediating trait linking genetic variation and disease. These methods first predict expression levels based on inferred expression quantitative trait loci (eQTLs) and then identify expression-mediated genetic effects on disease by associating phenotypes with the predicted expression levels. The success of these methods critically depends on the identification of eQTLs, which may not be functional in the corresponding tissue because of linkage disequilibrium (LD) and the correlation of gene expression across tissues. Here, we introduce a new method called T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) that leverages epigenetic information to identify disease-associated genes. By prioritizing SNPs with tissue-specific epigenetic annotation, T-GEN can better identify SNPs that are both statistically predictive and biologically functional. We found that a significantly higher percentage of eQTLs identified by T-GEN (an increase of 18.7% to 47.2%) are inferred to be functional by ChromHMM, and more are deleterious based on their Combined Annotation Dependent Depletion (CADD) scores. Applying T-GEN to 207 complex traits, we identified more trait-associated genes than existing methods (increases ranging from 7.7% to 102%). Among the genes identified for these traits, T-GEN better identifies genes with high (>0.99) pLI scores than other methods. When T-GEN was applied to late-onset Alzheimer's disease, we identified 96 genes located at 15 loci, including two novel loci not implicated in previous GWAS. We further replicated 50 genes in an independent GWAS, including one of the two novel loci.
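The generic two-step TWAS logic in the opening sentences (predict expression from genotypes, then associate the phenotype with predicted expression) can be sketched as follows. This is not T-GEN: it assumes per-SNP weights from some reference expression model are already available and does not use epigenetic annotation to choose SNPs. The helper names `predict_expression` and `association_t` are hypothetical, and the association step is reduced to the t-statistic of a Pearson correlation.

```python
import math


def predict_expression(weights, dosages):
    """Step 1: genetically predicted expression as a weighted sum of allele dosages.

    weights: {snp_id: weight} from a reference expression model (assumed given).
    dosages: one {snp_id: dosage} dict per individual; every weighted SNP must be present.
    """
    return [sum(w * person[snp] for snp, w in weights.items()) for person in dosages]


def association_t(predicted, phenotype):
    """Step 2: t-statistic for the correlation between predicted expression and phenotype."""
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(phenotype) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, phenotype))
    sxx = sum((x - mx) ** 2 for x in predicted)
    syy = sum((y - my) ** 2 for y in phenotype)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation
    # Standard conversion of a sample correlation to a t-statistic with n - 2 df.
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    weights = {"rs1": 0.5, "rs2": -0.2}  # hypothetical SNP IDs and weights
    people = [{"rs1": 0, "rs2": 2}, {"rs1": 1, "rs2": 1}, {"rs1": 2, "rs2": 0}]
    print(predict_expression(weights, people))  # approximately [-0.4, 0.3, 1.0]
    print(association_t([1, 2, 3, 4], [1, 3, 2, 4]))
```

The abstract's point maps onto step 1: which SNPs receive nonzero weights determines whether predicted expression reflects functional variants or merely LD proxies, which is where T-GEN brings in tissue-specific epigenetic annotation.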
Affiliation(s)
- Wei Liu
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Mo Li
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
- Wenfeng Zhang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
- Geyu Zhou
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Xing Wu
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, United States of America
- Jiawei Wang
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Qiongshi Lu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, WI, United States of America
- Department of Statistics, University of Wisconsin-Madison, WI, United States of America
- Center for Demography of Health and Aging, University of Wisconsin-Madison, WI, United States of America
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
- Department of Genetics, Yale School of Medicine, New Haven, CT, United States of America
137
Cherlin S, Lewis MJ, Plant D, Nair N, Goldmann K, Tzanis E, Barnes MR, McKeigue P, Barrett JH, Pitzalis C, Barton A, Cordell HJ. Investigation of genetically regulated gene expression and response to treatment in rheumatoid arthritis highlights an association between IL18RAP expression and treatment response. Ann Rheum Dis 2020; 79:1446-1452. [PMID: 32732242 PMCID: PMC7569378 DOI: 10.1136/annrheumdis-2020-217204] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/05/2020] [Revised: 06/19/2020] [Accepted: 06/21/2020] [Indexed: 01/08/2023]
Abstract
OBJECTIVES: In this study, we sought to investigate whether there was any association between genetically regulated gene expression (as predicted using various reference panels) and anti-tumour necrosis factor (anti-TNF) treatment response (change in erythrocyte sedimentation rate (ESR)) in 3158 patients of European ancestry with rheumatoid arthritis.
METHODS: The genetically regulated portion of gene expression was estimated in the full cohort of 3158 subjects (as well as within a subcohort of 1575 UK patients) using the PrediXcan software package with three different reference panels. Estimated expression was tested for association with anti-TNF treatment response. As a replication/validation experiment, we also investigated the correlation between change in ESR and measured expression of the Interleukin 18 Receptor Accessory Protein (IL18RAP) gene in whole blood and synovial tissue, using an independent replication data set of patients receiving conventional synthetic disease-modifying anti-rheumatic drugs, with gene expression measured directly by RNA sequencing.
RESULTS: We found that predicted expression of IL18RAP showed a consistent signal of association with treatment response across the reference panels. In our independent replication data set, IL18RAP expression in whole blood was correlated with the change in ESR between baseline and follow-up (r=-0.35, p=0.0091). Change in ESR was also correlated with the expression of IL18RAP in synovial tissue (r=-0.28, p=0.02).
CONCLUSION: Our results suggest that IL18RAP expression is worthy of further investigation as a potential predictor of treatment response in rheumatoid arthritis that is not specific to a particular drug type.
Affiliation(s)
- Svetlana Cherlin
- Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, Tyne and Wear, UK
- Myles J Lewis
- Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Darren Plant
- Centre of Genetics & Genomics Versus Arthritis, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
- NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
- Nisha Nair
- Centre of Genetics & Genomics Versus Arthritis, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
- Katriona Goldmann
- Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Evan Tzanis
- Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Michael R Barnes
- Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Paul McKeigue
- Centre for Population Health Sciences, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK
- Jennifer H Barrett
- NIHR Leeds Biomedical Research Centre, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- School of Medicine, University of Leeds, Leeds, UK
- Costantino Pitzalis
- Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Anne Barton
- Centre of Genetics & Genomics Versus Arthritis, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
- NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
- Heather J Cordell
- Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, Tyne and Wear, UK
138
Borchert C, Herman A, Roth M, Brooks AC, Friedenberg SG. RNA sequencing of whole blood in dogs with primary immune-mediated hemolytic anemia (IMHA) reveals novel insights into disease pathogenesis. PLoS One 2020; 15:e0240975. [PMID: 33091028 PMCID: PMC7580939 DOI: 10.1371/journal.pone.0240975] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/22/2020] [Accepted: 10/06/2020] [Indexed: 11/29/2022]
Abstract
Immune-mediated hemolytic anemia (IMHA) is a life-threatening autoimmune disorder characterized by a self-mediated attack on circulating red blood cells. The disease occurs naturally in both dogs and humans but is significantly more prevalent in dogs. Because of its shared features across species, dogs offer a naturally occurring model for studying IMHA in people. In this study, we used RNA sequencing of whole blood from treatment-naïve dogs to study transcriptome-wide changes in gene expression in newly diagnosed animals compared with healthy controls. We found many overexpressed genes in pathways related to neutrophil function, coagulation, and hematopoiesis. In particular, the most highly overexpressed gene in cases was a phospholipid scramblase, which mediates the externalization of phosphatidylserine from the inner to the outer leaflet of cell membranes. This family of genes has been shown to be critically important for programmed cell death of erythrocytes as well as for the initiation of the clotting cascade. Unexpectedly, we found marked underexpression of many genes related to lymphocyte function. We also identified groups of genes that are highly associated with the inflammatory response and red blood cell regeneration in affected dogs. We did not find any genes that distinguished dogs that lived from those that died within 30 days of diagnosis, nor did we find any relevant genomic signatures of microbial organisms in the blood of affected animals. Future studies are warranted to validate these findings and assess their implications for developing novel therapeutic approaches for dogs and humans with IMHA.
Affiliation(s)
- Corie Borchert
- Department of Veterinary Clinical Sciences, University of Minnesota College of Veterinary Medicine, St. Paul, Minnesota, United States of America
- Adam Herman
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, United States of America
- Megan Roth
- Department of Veterinary Clinical Sciences, University of Minnesota College of Veterinary Medicine, St. Paul, Minnesota, United States of America
- Aimee C. Brooks
- Department of Veterinary Clinical Sciences, Purdue University College of Veterinary Medicine, West Lafayette, Indiana, United States of America
- Steven G. Friedenberg
- Department of Veterinary Clinical Sciences, University of Minnesota College of Veterinary Medicine, St. Paul, Minnesota, United States of America
139
Fair BJ, Blake LE, Sarkar A, Pavlovic BJ, Cuevas C, Gilad Y. Gene expression variability in human and chimpanzee populations share common determinants. eLife 2020; 9:59929. [PMID: 33084571 PMCID: PMC7644215 DOI: 10.7554/elife.59929] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/12/2020] [Accepted: 10/20/2020] [Indexed: 12/20/2022]
Abstract
Inter-individual variation in gene expression has been shown to be heritable and is often associated with differences in disease susceptibility between individuals. Many studies have focused on mapping associations between genetic and gene regulatory variation, yet much less attention has been paid to the evolutionary processes that shape the observed differences in gene regulation between individuals in humans or any other primate. To begin addressing this gap, we performed a comparative analysis of gene expression variability and expression quantitative trait loci (eQTLs) in humans and chimpanzees, using gene expression data from primary heart samples. We found that expression variability in both species is often determined by non-genetic sources, such as cell-type heterogeneity. However, we also provide evidence that inter-individual variation in gene regulation can be genetically controlled, and that the degree of such variability is generally conserved between humans and chimpanzees. In particular, we found a significant overlap of orthologous genes associated with eQTLs in both species. We conclude that gene expression variability in humans and chimpanzees often evolves under similar evolutionary pressures.
Affiliation(s)
- Lauren E Blake
- Department of Human Genetics, University of Chicago, Chicago, United States
- Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, United States
- Bryan J Pavlovic
- Department of Neurology, University of California, San Francisco (UCSF), San Francisco, United States
- Claudia Cuevas
- Department of Human Genetics, University of Chicago, Chicago, United States
- Yoav Gilad
- Department of Medicine, University of Chicago, Chicago, United States
- Department of Human Genetics, University of Chicago, Chicago, United States
140
Sieberts SK, Perumal TM, Carrasquillo MM, Allen M, Reddy JS, Hoffman GE, Dang KK, Calley J, Ebert PJ, Eddy J, Wang X, Greenwood AK, Mostafavi S, Omberg L, Peters MA, Logsdon BA, De Jager PL, Ertekin-Taner N, Mangravite LM. Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions. Sci Data 2020; 7:340. [PMID: 33046718 PMCID: PMC7550587 DOI: 10.1038/s41597-020-00642-8] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 05/28/2019] [Accepted: 08/24/2020] [Indexed: 12/27/2022]
Abstract
The availability of high-quality RNA-sequencing and genotyping data from post-mortem brain collections of consortia such as the CommonMind Consortium (CMC) and the Accelerating Medicines Partnership for Alzheimer's Disease (AMP-AD) Consortium enables the generation of a large-scale brain cis-eQTL meta-analysis. Here we generate cerebral cortical eQTL from 1433 samples available from four cohorts (identifying >4.1 million significant eQTL for >18,000 genes), as well as cerebellar eQTL from 261 samples (identifying 874,836 significant eQTL for >10,000 genes). We find substantially improved power in the meta-analysis over individual cohort analyses, particularly in comparison with the Genotype-Tissue Expression (GTEx) Project eQTL. Additionally, we observe differences in eQTL patterns between cerebral and cerebellar brain regions. We provide these brain eQTL as a resource for use by the research community. As a proof of principle of their utility, we apply a colocalization analysis to identify genes underlying the GWAS association peaks for schizophrenia and identify a potentially novel gene colocalization with lncRNA RP11-677M14.2 (posterior probability of colocalization 0.975).
Affiliation(s)
- Mariet Allen
- Department of Neuroscience, Mayo Clinic Florida, Jacksonville, FL, 32224, USA
- Joseph S Reddy
- Department of Neuroscience, Mayo Clinic Florida, Jacksonville, FL, 32224, USA
- Gabriel E Hoffman
- Pamela Sklar Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Institute for Data Science and Genomic Technology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- John Calley
- Lilly Research Labs, Eli Lilly and Company, Indianapolis, IN, 46225, USA
- Philip J Ebert
- Lilly Research Labs, Eli Lilly and Company, Indianapolis, IN, 46225, USA
- James Eddy
- Sage Bionetworks, Seattle, WA, 98121, USA
- Xue Wang
- Department of Neuroscience, Mayo Clinic Florida, Jacksonville, FL, 32224, USA
- Sara Mostafavi
- Departments of Statistics and Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Centre for Molecular Medicine and Therapeutics, Vancouver, British Columbia, Canada
- Canadian Institute for Advanced Research, CIFAR Program in Child and Brain Development, Toronto, Ontario, Canada
- Philip L De Jager
- Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Medical Center, New York, NY, 10032, USA
- Nilüfer Ertekin-Taner
- Department of Neuroscience, Mayo Clinic Florida, Jacksonville, FL, 32224, USA
- Department of Neurology, Mayo Clinic Florida, Jacksonville, FL, 32224, USA
141
Chen TH, Chatterjee N, Landi MT, Shi J. A penalized regression framework for building polygenic risk models based on summary statistics from genome-wide association studies and incorporating external information. J Am Stat Assoc 2020; 116:133-143. [PMID: 34483403 DOI: 10.1080/01621459.2020.1764849] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/07/2023]
Abstract
Large-scale genome-wide association studies (GWAS) provide opportunities for developing genetic risk prediction models that have the potential to improve disease prevention, intervention or treatment. The key step is to develop polygenic risk score (PRS) models with high predictive performance for a given disease, which typically requires a large training data set for selecting truly associated single nucleotide polymorphisms (SNPs) and estimating their effect sizes accurately. Here, we develop a comprehensive penalized regression framework for fitting ℓ1-regularized regression models to GWAS summary statistics. We propose incorporating Pleiotropy and ANnotation information into PRS (PANPRS) development through suitable formulation of penalty functions and associated tuning parameters. Extensive simulations show that PANPRS performs as well as or better than existing PRS methods when no functional annotation or pleiotropy is incorporated. When functional annotation data and pleiotropy are informative, PANPRS substantially outperforms existing PRS methods in simulations. Finally, we applied our methods to build PRS for type 2 diabetes and melanoma and found that incorporating relevant functional annotations and GWAS of genetically related traits improved prediction of these two complex diseases.
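To illustrate how an ℓ1-penalized model can be fitted to summary statistics at all, here is a minimal sketch under stated assumptions. It assumes standardized genotypes, so that the vector of marginal SNP-trait correlations `r` and an LD correlation matrix `R` (for example, estimated from a reference panel) replace individual-level data. The lasso objective then becomes 0.5 * b' R b - b' r + lam * ||b||_1, which cyclic coordinate descent can minimize. This is a plain lasso, not the PANPRS penalty (which additionally encodes pleiotropy and annotation information), and `soft_threshold` and `summary_lasso` are hypothetical names chosen for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def summary_lasso(r, R, lam, n_iter=200, tol=1e-8):
    """Lasso fitted to summary statistics by cyclic coordinate descent.

    Minimises 0.5 * b' R b - b' r + lam * ||b||_1, where
    r: marginal SNP-trait correlations (length p), assuming standardized data;
    R: p x p LD correlation matrix (assumed to have a unit diagonal).
    """
    p = len(r)
    b = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of SNP j with the partial residual that
            # excludes SNP j's own current contribution
            u = r[j] - R[j] @ b + R[j, j] * b[j]
            new = soft_threshold(u, lam) / R[j, j]
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        # Stop once a full sweep changes no coefficient appreciably
        if max_change < tol:
            break
    return b
```

With `R` equal to the identity (no LD), each coefficient reduces to a soft-thresholded marginal correlation, so `summary_lasso(np.array([0.5, -0.05, 0.2]), np.eye(3), 0.1)` returns approximately `[0.4, 0.0, 0.1]`. With LD present, a SNP whose marginal signal is explained by a correlated neighbour is shrunk to zero. In practice, a reference-panel `R` is often regularized first, since an estimated LD matrix need not be positive definite.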
Affiliation(s)
- Ting-Huei Chen
- Department of Mathematics and Statistics, University of Laval, 1045 av. of Medicine, Suite 1056, Quebec G1V 0A6, Canada; regular member, Cervo Brain Research Centre
- Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 N Wolfe Street, Baltimore, MD 21205, United States of America
- Maria Teresa Landi
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, RM 7E106, Bethesda, MD 20892, United States of America
- Jianxin Shi
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, RM 7E122, Bethesda, MD 20892, United States of America
142
Moradifard S, Hoseinbeyki M, Emam MM, Parchiniparchin F, Ebrahimi-Rad M. Association of the Sp1 binding site and -1997 promoter variations in COL1A1 with osteoporosis risk: The application of meta-analysis and bioinformatics approaches offers a new perspective for future research. Mutat Res Rev Mutat Res 2020; 786:108339. [PMID: 33339581 DOI: 10.1016/j.mrrev.2020.108339] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/22/2019] [Revised: 08/11/2020] [Accepted: 10/06/2020] [Indexed: 12/21/2022]
Abstract
As a complex disease, osteoporosis is influenced by several genetic markers. Many studies have examined the link between the Sp1 binding site +1245 G > T (rs1800012) and -1997 G > T (rs1107946) variations in the COL1A1 gene and osteoporosis risk. However, the findings of these studies have been contradictory; therefore, we performed a meta-analysis to aggregate additional information and obtain increased statistical power to estimate this correlation more efficiently. The meta-analysis was conducted with studies published between 1991 and 2020 that were identified by a systematic electronic search of the Scopus and Clarivate Analytics databases. Studies with bone mineral density (BMD) data and complete genotypes of the single-nucleotide variations (SNVs) for the overall and postmenopausal female population were included in this meta-analysis and analyzed using the R metafor package. A relationship between rs1800012 and significantly decreased BMD values at the lumbar spine and femoral neck was found in individuals carrying the "ss" versus the "SS" genotype in the overall population according to a random effects model (p < 0.0001). Similar results were also found in the postmenopausal female population (p = 0.003 and 0.0002, respectively). These findings may indicate increased osteoporosis risk in individuals with the "ss" genotype in both studied groups. Although no association was identified between the -1997 G > T variant and low BMD in the overall population, individuals with the "GT" genotype showed a higher level of BMD than those with "GG" in the subgroup analysis (p = 0.007). To determine which transcription factors (TFs) might bind to the -1997 G > T site in COL1A1, 45 TFs were identified based on bioinformatics predictions. According to the GSE35958 microarray dataset, 16 of the 45 TFs showed differential expression profiles in osteoporotic human mesenchymal stem cells relative to normal samples from elderly donors. By identifying candidate TFs for the -1997 G > T site, our study offers a new perspective for future research.
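The random-effects pooling mentioned in this abstract can be illustrated with the classical DerSimonian-Laird estimator. This is a minimal sketch under stated assumptions: each study contributes one effect estimate (for example, a BMD mean difference between genotype groups) and its sampling variance, and `dersimonian_laird` is a hypothetical helper, not the authors' code. The R metafor package's `rma()` defaults to REML estimation of the between-study variance rather than DerSimonian-Laird, so its results may differ slightly.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random-effects model.

    Returns (pooled effect, standard error, tau^2, two-sided p-value).
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies to estimate tau^2")
    # Inverse-variance (fixed-effect) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each study's sampling variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = pooled / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, se, tau2, p
```

For two studies with effects 0 and 4 and unit variances, Q = 8 and tau^2 = 7, giving a pooled estimate of 2 with standard error 2 (z = 1, p ≈ 0.317). When the studies agree, tau^2 is truncated to zero and the result coincides with the fixed-effect estimate.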
Affiliation(s)
- Mohammad Mehdi Emam
- Rheumatology Ward, Loghman Hospital, Shahid Beheshti Medical University (SBMU), Tehran, Iran
143
Brandt M, Gokden A, Ziosi M, Lappalainen T. A polyclonal allelic expression assay for detecting regulatory effects of transcript variants. Genome Med 2020; 12:79. [PMID: 32912286 PMCID: PMC7488413 DOI: 10.1186/s13073-020-00777-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/20/2019] [Accepted: 08/19/2020] [Indexed: 12/12/2022]
Abstract
We present an assay to experimentally test the regulatory effects of genetic variants within transcripts using CRISPR/Cas9 followed by targeted sequencing. We applied the assay to 32 premature stop-gained variants across the genome and in two Mendelian disease genes, 33 putative causal variants of eQTLs, and 62 control variants in HEK293T cells, replicating a subset of variants in HeLa cells. We detected significant effects in the expected direction for 60% of variants, demonstrating the ability of the assay to capture regulatory effects of eQTL variants and nonsense-mediated decay triggered by premature stop-gained variants. The results suggest that the assay is useful for validating transcript-level effects of genetic variants.
Affiliation(s)
- Margot Brandt
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Tuuli Lappalainen
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
144
He Y, Chhetri SB, Arvanitis M, Srinivasan K, Aguet F, Ardlie KG, Barbeira AN, Bonazzola R, Im HK, Brown CD, Battle A. sn-spMF: matrix factorization informs tissue-specific genetic regulation of gene expression. Genome Biol 2020; 21:235. [PMID: 32912314 PMCID: PMC7488540 DOI: 10.1186/s13059-020-02129-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/30/2019] [Accepted: 08/04/2020] [Indexed: 01/09/2023]
Abstract
Genetic regulation of gene expression, revealed by expression quantitative trait loci (eQTLs), exhibits complex patterns of tissue-specific effects. Characterization of these patterns may allow us to better understand mechanisms of gene regulation and disease etiology. We develop a constrained matrix factorization model, sn-spMF, to learn patterns of tissue-sharing and apply it to 49 human tissues from the Genotype-Tissue Expression (GTEx) project. The learned factors reflect tissues with known biological similarity and identify transcription factors that may mediate tissue-specific effects. sn-spMF, available at https://github.com/heyuan7676/ts_eQTLs, can be applied to learn biologically interpretable patterns of eQTL tissue-specificity and generate testable mechanistic hypotheses.
Affiliation(s)
- Yuan He
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21218, MD, USA
- Surya B Chhetri
- HudsonAlpha Institute for Biotechnology, Huntsville, 35806, AL, USA
- Current Address: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21218, MD, USA
- Marios Arvanitis
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21218, MD, USA
- Department of Medicine, Division of Cardiology, Johns Hopkins University, Baltimore, 21287, MD, USA
- Kaushik Srinivasan
- Department of Computer Science, Johns Hopkins University, Baltimore, 21218, MD, USA
- François Aguet
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alvaro N Barbeira
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
- Rodrigo Bonazzola
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
- Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
- Christopher D Brown
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104, PA, USA
- Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21218, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, 21218, MD, USA
145
The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 2020; 369:1318-1330. [PMID: 32913098 PMCID: PMC7737656 DOI: 10.1126/science.aaz1776] [Citation(s) in RCA: 2697] [Impact Index Per Article: 539.4] [Received: 09/30/2019] [Accepted: 07/30/2020] [Indexed: 02/06/2023]
Abstract
The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.
146
Kolberg L, Kerimov N, Peterson H, Alasoo K. Co-expression analysis reveals interpretable gene modules controlled by trans-acting genetic variants. eLife 2020; 9:e58705. [PMID: 32880574 PMCID: PMC7470823 DOI: 10.7554/elife.58705] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 05/08/2020] [Accepted: 08/20/2020] [Indexed: 12/16/2022]
Abstract
Understanding the causal processes that contribute to disease onset and progression is essential for developing novel therapies. Although trans-acting expression quantitative trait loci (trans-eQTLs) can directly reveal cellular processes modulated by disease variants, detecting trans-eQTLs remains challenging due to their small effect sizes. Here, we analysed gene expression and genotype data from six blood cell types, with sample sizes ranging from 226 to 710 individuals. We used co-expression modules, inferred from the gene expression data with five different methods, as traits in trans-eQTL analysis to limit multiple testing and improve interpretability. In addition to replicating three established associations, we discovered a novel trans-eQTL near SLC39A8 regulating a module of metallothionein genes in LPS-stimulated monocytes. Interestingly, this effect was mediated by a transient cis-eQTL present only in early LPS response and lost before the trans effect appeared. Our analyses highlight how co-expression combined with functional enrichment analysis improves the identification and prioritisation of trans-eQTLs when applied to emerging cell-type-specific datasets.
Affiliation(s)
- Liis Kolberg
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Nurlan Kerimov
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
147
Liu X, Mefford JA, Dahl A, He Y, Subramaniam M, Battle A, Price AL, Zaitlen N. GBAT: a gene-based association test for robust detection of trans-gene regulation. Genome Biol 2020; 21:211. [PMID: 32831138 PMCID: PMC7444084 DOI: 10.1186/s13059-020-02120-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/21/2019] [Accepted: 07/27/2020] [Indexed: 02/07/2023]
Abstract
The observation that disease-associated genetic variants typically reside outside of exons has inspired widespread investigation into the genetic basis of transcriptional regulation. While associations between the mRNA abundance of a gene and its proximal SNPs (cis-eQTLs) are now readily identified, identification of high-quality distal associations (trans-eQTLs) has been limited by a heavy multiple testing burden and susceptibility to false-positive signals. To address these issues, we develop GBAT, a powerful gene-based pipeline that allows robust detection of high-quality trans-gene regulation signals.
Affiliation(s)
- Xuanyao Liu
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA USA
- Department of Human Genetics, The University of Chicago, Chicago, IL USA
- Joel A. Mefford
- Departments of Neurology and Computational Medicine, University of California Los Angeles, Los Angeles, CA USA
- Andrew Dahl
- Departments of Neurology and Computational Medicine, University of California Los Angeles, Los Angeles, CA USA
- Yuan He
- Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
- Meena Subramaniam
- Departments of Neurology and Computational Medicine, University of California Los Angeles, Los Angeles, CA USA
- Alexis Battle
- Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
- Alkes L. Price
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA USA
- Noah Zaitlen
- Departments of Neurology and Computational Medicine, University of California Los Angeles, Los Angeles, CA USA
148
Ho AMC, Coombes BJ, Nguyen TTL, Liu D, McElroy SL, Singh B, Nassan M, Colby CL, Larrabee BR, Weinshilboum RM, Frye MA, Biernacka JM. Mood-Stabilizing Antiepileptic Treatment Response in Bipolar Disorder: A Genome-Wide Association Study. Clin Pharmacol Ther 2020; 108:1233-1242. [PMID: 32627186 PMCID: PMC7669647 DOI: 10.1002/cpt.1982] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/15/2020] [Accepted: 06/15/2020] [Indexed: 12/31/2022]
Abstract
Several antiepileptic drugs (AEDs) have US Food and Drug Administration (FDA) approval for use as mood stabilizers in bipolar disorder (BD), but not all BD patients respond to these AED mood stabilizers (AED‐MSs). To identify genetic polymorphisms that contribute to the variability in AED‐MS response, we performed a discovery genome‐wide association study (GWAS) of 199 BD patients from the Mayo Clinic Bipolar Disorder Biobank. Most of these patients had been treated with the AED‐MS valproate/divalproex and/or lamotrigine. AED‐MS response was assessed using the Alda scale, which quantifies clinical improvement while accounting for potential confounding factors. We identified two genome‐wide significant single‐nucleotide polymorphism (SNP) signals that mapped to the THSD7A (rs78835388, P = 7.1E‐09) and SLC35F3 (rs114872993, P = 3.2E‐08) genes. We also identified two genes with statistically significant gene‐level associations: ABCC1 (P = 6.7E‐07; top SNP rs875740, P = 2.0E‐6), and DISP1 (P = 8.9E‐07; top SNP rs34701716, P = 8.9E‐07). THSD7A SNPs were previously found to be associated with risk for several psychiatric disorders, including BD. Both THSD7A and SLC35F3 are expressed in excitatory/glutamatergic and inhibitory/γ‐aminobutyric acidergic (GABAergic) neurons, which are targets of AED‐MSs. ABCC1 is involved in the transport of valproate and lamotrigine metabolites, and the SNPs in ABCC1 and DISP1 with the strongest evidence of association in our GWAS are strong splicing quantitative trait loci in the human gut, suggesting a possible influence on drug absorption. In conclusion, our pharmacogenomic study identified novel genetic loci that appear to contribute to AED‐MS treatment response, and may facilitate precision medicine in BD.
Affiliation(s)
- Ada Man-Choi Ho
- Department of Psychiatry and Psychology, Mayo Clinic, Rochester, Minnesota, USA
- Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, USA
- Brandon J Coombes
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Thanh Thanh L Nguyen
- Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, USA
- Duan Liu
- Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, USA
- Susan L McElroy
- Lindner Center of HOPE/University of Cincinnati, Cincinnati, Ohio, USA
- Balwinder Singh
- Department of Psychiatry and Psychology, Mayo Clinic, Rochester, Minnesota, USA
- Malik Nassan
- Department of Psychiatry and Psychology, Mayo Clinic, Rochester, Minnesota, USA
- Colin L Colby
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Beth R Larrabee
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Richard M Weinshilboum
- Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, USA
- Mark A Frye
- Department of Psychiatry and Psychology, Mayo Clinic, Rochester, Minnesota, USA
- Joanna M Biernacka
- Department of Psychiatry and Psychology, Mayo Clinic, Rochester, Minnesota, USA
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
149
Keys KL, Mak ACY, White MJ, Eckalbar WL, Dahl AW, Mefford J, Mikhaylova AV, Contreras MG, Elhawary JR, Eng C, Hu D, Huntsman S, Oh SS, Salazar S, Lenoir MA, Ye JC, Thornton TA, Zaitlen N, Burchard EG, Gignoux CR. On the cross-population generalizability of gene expression prediction models. PLoS Genet 2020; 16:e1008927. [PMID: 32797036 PMCID: PMC7449671 DOI: 10.1371/journal.pgen.1008927] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 11/19/2019] [Revised: 08/26/2020] [Accepted: 06/10/2020] [Indexed: 11/21/2022]
Abstract
The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.
Affiliation(s)
- Kevin L. Keys
- Department of Medicine, University of California, San Francisco, California, United States of America
- Berkeley Institute for Data Science, University of California, Berkeley, California, United States of America
- Angel C. Y. Mak
- Department of Medicine, University of California, San Francisco, California, United States of America
- Marquitta J. White
- Department of Medicine, University of California, San Francisco, California, United States of America
- Walter L. Eckalbar
- Department of Medicine, University of California, San Francisco, California, United States of America
- Andrew W. Dahl
- Department of Medicine, University of California, San Francisco, California, United States of America
- Joel Mefford
- Department of Medicine, University of California, San Francisco, California, United States of America
- Anna V. Mikhaylova
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- María G. Contreras
- Department of Medicine, University of California, San Francisco, California, United States of America
- San Francisco State University, San Francisco, California, United States of America
- Jennifer R. Elhawary
- Department of Medicine, University of California, San Francisco, California, United States of America
- Celeste Eng
- Department of Medicine, University of California, San Francisco, California, United States of America
- Donglei Hu
- Department of Medicine, University of California, San Francisco, California, United States of America
- Scott Huntsman
- Department of Medicine, University of California, San Francisco, California, United States of America
- Sam S. Oh
- Department of Medicine, University of California, San Francisco, California, United States of America
- Sandra Salazar
- Department of Medicine, University of California, San Francisco, California, United States of America
- Jimmie C. Ye
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Biosciences, University of California, San Francisco, California, United States of America
- Timothy A. Thornton
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Noah Zaitlen
- Department of Neurology, University of California, Los Angeles, California, United States of America
- Esteban G. Burchard
- Department of Medicine, University of California, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Biosciences, University of California, San Francisco, California, United States of America
- Christopher R. Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
150
Dong X, Su YR, Barfield R, Bien SA, He Q, Harrison TA, Huyghe JR, Keku TO, Lindor NM, Schafmayer C, Chan AT, Gruber SB, Jenkins MA, Kooperberg C, Peters U, Hsu L. A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study. PLoS Genet 2020; 16:e1008947. [PMID: 32833970 PMCID: PMC7470748 DOI: 10.1371/journal.pgen.1008947] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/18/2019] [Revised: 09/03/2020] [Accepted: 06/22/2020] [Indexed: 11/18/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with various phenotypes, but together these variants explain only a fraction of heritability, suggesting that many more have yet to be discovered. Recently it has been recognized that incorporating functional information on genetic variants can improve power for identifying novel loci. For example, S-PrediXcan and TWAS test the association of predicted gene expression with phenotypes based on GWAS summary statistics by leveraging information on the genetic regulation of gene expression, and have found many novel loci. However, as genetic variants may affect more than one gene and act through different mechanisms, these methods likely capture only part of the total effect of these variants. In this paper, we propose a summary statistics-based mixed effects score test (sMiST) that tests the total effect of a set of variants, combining the effect mediated through genetically predicted gene expression (imputed as in S-PrediXcan and TWAS) with the direct effects of individual variants. It allows for multiple functional annotations and multiple genetically predicted mediators. It can also perform conditional association analysis while adjusting for other genetic variants (e.g., known loci for the phenotype). Extensive simulation and real data analyses demonstrate that sMiST yields p-values that agree well with those obtained from individual-level data while being substantially faster to compute. Importantly, sMiST can be applied broadly to GWAS, as only summary statistics of genetic variant associations are required. We apply sMiST to a large-scale GWAS of colorectal cancer using summary statistics from ~120,000 study participants and gene expression data from the Genotype-Tissue Expression (GTEx) project. We identify several novel and secondary independent genetic loci.
Affiliation(s)
- Xinyuan Dong
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Yu-Ru Su
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Richard Barfield
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Stephanie A. Bien
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Qianchuan He
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Tabitha A. Harrison
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Jeroen R. Huyghe
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Temitope O. Keku
- Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, USA
- Noralane M. Lindor
- Department of Health Science Research, Mayo Clinic, Scottsdale, Arizona, USA
- Clemens Schafmayer
- Department of General Surgery, University Hospital Rostock, Rostock, Germany
- Andrew T. Chan
- Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, and Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Stephen B. Gruber
- City of Hope National Medical Center, Duarte, and Department of Preventive Medicine & USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Mark A. Jenkins
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Ulrike Peters
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Li Hsu
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA