51
Abstract
Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism, and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimates for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of three different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates, and integrated it into the apeglm package. The three methods were evaluated on both simulated and real data. Apeglm consistently performed better than ML according to a variety of criteria, including mean absolute error and concordance at the top. While ash had lower error and greater concordance than ML on the simulations, it also had a tendency to over-shrink large effects, and performed worse on the real data according to error and concordance. Furthermore, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
Affiliation(s)
- Joshua P Zitovsky
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
52
Abstract
Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimators for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use pseudocounts or Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of four different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, adding a pseudocount to each allele, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates and integrated it into the apeglm package. The four methods were evaluated on two simulations and one real data set. Apeglm consistently performed better than ML according to a variety of criteria, and generally outperformed use of pseudocounts as well. Ash also performed better than ML in one of the simulations, but in the other performance was more mixed. Finally, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster and more numerically reliable, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
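The contrast between the maximum likelihood estimate and the pseudocount estimate mentioned in the abstract can be illustrated with a minimal sketch. The function names and the default pseudocount of 1 per allele are assumptions made for illustration; this is not the apeglm or ash method, and it ignores the beta-binomial overdispersion that those packages model.

```python
def ml_proportion(alt_count, total_count):
    """Maximum likelihood estimate of the allelic proportion under a
    simple binomial model: the observed fraction of reads from one allele."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return alt_count / total_count


def pseudocount_proportion(alt_count, total_count, pseudo=1.0):
    """Estimate after adding `pseudo` reads to each of the two alleles
    (hypothetical default of 1 read per allele). This pulls the estimate
    toward 0.5, more strongly when the total count is small."""
    if total_count < 0:
        raise ValueError("total_count must be non-negative")
    return (alt_count + pseudo) / (total_count + 2.0 * pseudo)


if __name__ == "__main__":
    # Same observed fraction (all reads from one allele), different depth.
    for alt, total in [(3, 3), (300, 300)]:
        print(alt, total,
              ml_proportion(alt, total),
              pseudocount_proportion(alt, total))
```

With only 3 reads, the pseudocount estimate moves well away from the extreme value reported by maximum likelihood, whereas with 300 reads the two estimates nearly coincide. This is the variance-reduction effect at low counts that the abstract describes, shown here in its simplest form.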
Affiliation(s)
- Joshua P. Zitovsky
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
53
da Silva Francisco Junior R, Dos Santos Ferreira C, Santos E Silva JC, Terra Machado D, Côrtes Martins Y, Ramos V, Simões Carnivali G, Garcia AB, Medina-Acosta E. Pervasive Inter-Individual Variation in Allele-Specific Expression in Monozygotic Twins. Front Genet 2019; 10:1178. [PMID: 31850058 PMCID: PMC6887657 DOI: 10.3389/fgene.2019.01178] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/11/2019] [Accepted: 10/24/2019] [Indexed: 01/19/2023] Open
Abstract
Despite being developed from one zygote, heterokaryotypic monozygotic (MZ) co-twins exhibit discordant karyotypes. Epigenomic studies in biological samples from heterokaryotypic MZ co-twins are of the most significant value for assessing the effects on gene- and allele-specific expression of an extranumerary chromosomal copy or structural chromosomal disparities in otherwise nearly identical germline genetic contributions. Here, we use RNA-Seq data from existing repositories to establish within-pair correlations for the breadth and magnitude of allele-specific expression (ASE) in heterokaryotypic MZ co-twins discordant for trisomy 21 and maternal 21q inheritance, as well as homokaryotypic co-twins. We show that there is a genome-wide disparity at ASE sites between the heterokaryotypic MZ co-twins. Although most of the disparity corresponds to changes in the magnitude of biallelic imbalance, ASE sites switching from either strictly monoallelic to biallelic imbalance or the reverse occur in few genes that are known or predicted to be imprinted, subject to X-chromosome inactivation or A-to-I(G) RNA edited. We also uncovered comparable ASE differences between homokaryotypic MZ twins. The extent of ASE discordance in MZ twins (2.7%) was about 10-fold lower than that expected between pairs of unrelated, non-twin males or females. The results indicate that the observed within-pair dissimilarities in breadth and magnitude of ASE sites in the heterokaryotypic MZ co-twins could not solely be attributable to the aneuploidy and the missing allelic heritability at 21q.
Affiliation(s)
- Cristina Dos Santos Ferreira
- Laboratório de Biotecnologia, Núcleo de Diagnóstico e Investigação Molecular, Universidade Estadual do Norte Fluminense, Campos dos Goytacazes, Brazil
- Juan Carlo Santos E Silva
- Laboratório de Biotecnologia, Núcleo de Diagnóstico e Investigação Molecular, Universidade Estadual do Norte Fluminense, Campos dos Goytacazes, Brazil
- Douglas Terra Machado
- Laboratório de Biotecnologia, Núcleo de Diagnóstico e Investigação Molecular, Universidade Estadual do Norte Fluminense, Campos dos Goytacazes, Brazil
- Yasmmin Côrtes Martins
- Laboratório de Bioinformática, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
- Victor Ramos
- Department of Genetics, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, Brazil
- Gustavo Simões Carnivali
- Department of Computational Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Ana Beatriz Garcia
- Laboratório de Biotecnologia, Núcleo de Diagnóstico e Investigação Molecular, Universidade Estadual do Norte Fluminense, Campos dos Goytacazes, Brazil
- Enrique Medina-Acosta
- Laboratório de Biotecnologia, Núcleo de Diagnóstico e Investigação Molecular, Universidade Estadual do Norte Fluminense, Campos dos Goytacazes, Brazil
54
Choi K, Raghupathy N, Churchill GA. A Bayesian mixture model for the analysis of allelic expression in single cells. Nat Commun 2019; 10:5188. [PMID: 31729374 PMCID: PMC6858378 DOI: 10.1038/s41467-019-13099-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 10/16/2018] [Accepted: 10/09/2019] [Indexed: 11/09/2022] Open
Abstract
Allele-specific expression (ASE) at single-cell resolution is a critical tool for understanding the stochastic and dynamic features of gene expression. However, low read coverage and high biological variability present challenges for analyzing ASE. We demonstrate that discarding multi-mapping reads leads to higher variability in estimates of allelic proportions and an increased frequency of sampling zeros, and can lead to spurious findings of dynamic and monoallelic gene expression. Here, we report a method for ASE analysis from single-cell RNA-Seq data that accurately classifies allelic expression states and improves estimation of allelic proportions by pooling information across cells. We further demonstrate that combining information across cells using a hierarchical mixture model reduces sampling variability without sacrificing cell-to-cell heterogeneity. We applied our approach to re-evaluate the statistical independence of allelic bursting and track changes in the allele-specific expression patterns of cells sampled over a developmental time course.
Affiliation(s)
- Kwangbom Choi
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
- Gary A Churchill
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
55
Salavati M, Bush SJ, Palma-Vera S, McCulloch MEB, Hume DA, Clark EL. Elimination of Reference Mapping Bias Reveals Robust Immune Related Allele-Specific Expression in Crossbred Sheep. Front Genet 2019; 10:863. [PMID: 31608110 PMCID: PMC6761296 DOI: 10.3389/fgene.2019.00863] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 04/25/2019] [Accepted: 08/19/2019] [Indexed: 12/13/2022] Open
Abstract
Pervasive allelic variation at both gene and single nucleotide level (SNV) between individuals is commonly associated with complex traits in humans and animals. Allele-specific expression (ASE) analysis, using RNA-Seq, can provide a detailed annotation of allelic imbalance and infer the existence of cis-acting transcriptional regulation. However, variant detection in RNA-Seq data is compromised by biased mapping of reads to the reference DNA sequence. In this manuscript, we describe an unbiased standardized computational pipeline for allele-specific expression analysis using RNA-Seq data, which we have adapted and developed using tools available under open license. The analysis pipeline we present is designed to minimize reference bias while providing accurate profiling of allele-specific expression across tissues and cell types. Using this methodology, we were able to profile pervasive allelic imbalance across tissues and cell types, at both the gene and SNV level, in Texel×Scottish Blackface sheep, using the sheep gene expression atlas data set. ASE profiles were pervasive in each sheep and across all tissue types investigated. However, ASE profiles shared across tissues were limited, and instead, they tended to be highly tissue-specific. These tissue-specific ASE profiles may underlie the expression of economically important traits and could be utilized as weighted SNVs, for example, to improve the accuracy of genomic selection in breeding programs for sheep. An additional benefit of the pipeline is that it does not require parental genotypes and can therefore be applied to other RNA-Seq data sets for livestock, including those available on the Functional Annotation of Animal Genomes (FAANG) data portal. This study is the first global characterization of moderate to extreme ASE in tissues and cell types from sheep. 
We have applied a robust methodology for ASE profiling to provide both a novel analysis of the multi-dimensional sheep gene expression atlas data set and a foundation for identifying the regulatory and expressed elements of the genome that are driving complex traits in livestock.
Affiliation(s)
- Mazdak Salavati
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, United Kingdom
- Stephen J. Bush
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, United Kingdom
- Sergio Palma-Vera
- Leibniz Institute for Farm Animal Biology (FBN), Institute for Reproductive Biology, Dummerstorf, Germany
- Mary E. B. McCulloch
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, United Kingdom
- David A. Hume
- Mater Research Institute-University of Queensland, Translational Research Institute, Woolloongabba, QLD, Australia
- Emily L. Clark
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Edinburgh, United Kingdom
56
Slabaugh E, Desai JS, Sartor RC, Lawas LMF, Jagadish SVK, Doherty CJ. Analysis of differential gene expression and alternative splicing is significantly influenced by choice of reference genome. RNA (New York, N.Y.) 2019; 25:669-684. [PMID: 30872414 PMCID: PMC6521602 DOI: 10.1261/rna.070227.118] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/05/2019] [Accepted: 03/06/2019] [Indexed: 05/19/2023]
Abstract
RNA-seq analysis has enabled the evaluation of transcriptional changes in many species including nonmodel organisms. However, in most species only a single reference genome is available and RNA-seq reads from highly divergent varieties are typically aligned to this reference. Here, we quantify the impacts of the choice of mapping genome in rice where three high-quality reference genomes are available. We aligned RNA-seq data from a popular productive rice variety to three different reference genomes and found that the identification of differentially expressed genes differed depending on which reference genome was used for mapping. Furthermore, the ability to detect differentially used transcript isoforms was profoundly affected by the choice of reference genome: Only 30% of the differentially used splicing features were detected when reads were mapped to the more commonly used, but more distantly related reference genome. This demonstrated that gene expression and splicing analysis varies considerably depending on the mapping reference genome, and that analysis of individuals that are distantly related to an available reference genome may be improved by acquisition of new genomic reference material. We observed that these differences in transcriptome analysis are, in part, due to the presence of single nucleotide polymorphisms between the sequenced individual and each respective reference genome, as well as annotation differences between the reference genomes that exist even between syntenic orthologs. We conclude that even between two closely related genomes of similar quality, using the reference genome that is most closely related to the species being sampled significantly improves transcriptome analysis.
Affiliation(s)
- Erin Slabaugh
- Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695, USA
- Jigar S Desai
- Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695, USA
- Ryan C Sartor
- Crop and Soil Science Department, North Carolina State University, Raleigh, North Carolina 27695, USA
- Lovely Mae F Lawas
- International Rice Research Institute (IRRI), DAPO Box 7777, Metro Manila, Philippines
- Max Planck Institute of Molecular Plant Physiology, D-14476, Potsdam, Germany
- S V Krishna Jagadish
- International Rice Research Institute (IRRI), DAPO Box 7777, Metro Manila, Philippines
- Department of Agronomy, Kansas State University, Manhattan, Kansas 66506, USA
- Colleen J Doherty
- Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695, USA
57
Neuner SM, Heuer SE, Zhang JG, Philip VM, Kaczorowski CC. Identification of Pre-symptomatic Gene Signatures That Predict Resilience to Cognitive Decline in the Genetically Diverse AD-BXD Model. Front Genet 2019; 10:35. [PMID: 30787942 PMCID: PMC6372563 DOI: 10.3389/fgene.2019.00035] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 08/29/2018] [Accepted: 01/18/2019] [Indexed: 12/23/2022] Open
Abstract
Across the population, individuals exhibit a wide variation of susceptibility or resilience to developing Alzheimer’s disease (AD). Identifying specific factors that promote resilience would provide insight into disease mechanisms and nominate potential targets for therapeutic intervention. Here, we use transcriptome profiling to identify gene networks present in the pre-symptomatic AD mouse brain relating to neuroinflammation, brain vasculature, extracellular matrix organization, and synaptic signaling that predict cognitive performance at an advanced age. We highlight putative drivers of these observed relationships, including Itgb2, Fcgr2b, Slc6a14, and Gper1, which represent prime targets through which to promote resilience prior to overt symptom onset. In addition, we identify a genomic region on chromosome 2 containing variants that directly modulate resilience network expression. Overall, work here highlights new potential drivers of resilience to AD and contributes significantly to our understanding of early, potentially causal, disease mechanisms.
Affiliation(s)
- Sarah M Neuner
- University of Tennessee Health Science Center, Memphis, TN, United States; The Jackson Laboratory, Bar Harbor, ME, United States
- Sarah E Heuer
- The Jackson Laboratory, Bar Harbor, ME, United States; Tufts University Sackler School of Graduate Biomedical Sciences, Boston, MA, United States
- Ji-Gang Zhang
- The Jackson Laboratory, Bar Harbor, ME, United States
58
Neuner SM, Heuer SE, Huentelman MJ, O'Connell KMS, Kaczorowski CC. Harnessing Genetic Complexity to Enhance Translatability of Alzheimer's Disease Mouse Models: A Path toward Precision Medicine. Neuron 2018; 101:399-411.e5. [PMID: 30595332 DOI: 10.1016/j.neuron.2018.11.040] [Citation(s) in RCA: 144] [Impact Index Per Article: 20.6] [Received: 07/10/2018] [Revised: 10/02/2018] [Accepted: 11/20/2018] [Indexed: 01/15/2023]
Abstract
An individual's genetic makeup plays a large role in determining susceptibility to Alzheimer's disease (AD) but has largely been ignored in preclinical studies. To test the hypothesis that incorporating genetic diversity into mouse models of AD would improve translational potential, we combined a well-established mouse model of AD with a genetically diverse reference panel to generate mice that harbor identical high-risk human mutations but differ across the remainder of their genome. We first show that genetic variation profoundly modifies the impact of human AD mutations on both cognitive and pathological phenotypes. We then validate this complex AD model by demonstrating high degrees of genetic, transcriptomic, and phenotypic overlap with human AD. Overall, work here both introduces a novel AD mouse population as an innovative and reproducible resource for the study of mechanisms underlying AD and provides evidence that preclinical models incorporating genetic diversity may better translate to human disease.
Affiliation(s)
- Sarah M Neuner
- The Neuroscience Institute, University of Tennessee Health Science Center, Memphis, TN 38163, USA; The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Sarah E Heuer
- The Jackson Laboratory, Bar Harbor, ME 04609, USA; Sackler School of Graduate Biomedical Sciences, Tufts University, Boston, MA 02111, USA
- Matthew J Huentelman
- Neurogenomics Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA
59
Abstract
The majority of gene loci that have been associated with type 2 diabetes play a role in pancreatic islet function. To evaluate the role of islet gene expression in the etiology of diabetes, we sensitized a genetically diverse mouse population with a Western diet high in fat (45% kcal) and sucrose (34%) and carried out genome-wide association mapping of diabetes-related phenotypes. We quantified mRNA abundance in the islets and identified 18,820 expression QTL. We applied mediation analysis to identify candidate causal driver genes at loci that affect the abundance of numerous transcripts. These include two genes previously associated with monogenic diabetes (PDX1 and HNF4A), as well as three genes with nominal association with diabetes-related traits in humans (FAM83E, IL6ST, and SAT2). We grouped transcripts into gene modules and mapped regulatory loci for modules enriched with transcripts specific for α-cells, and another specific for δ-cells. However, no single module enriched for β-cell-specific transcripts, suggesting heterogeneity of gene expression patterns within the β-cell population. A module enriched in transcripts associated with branched-chain amino acid metabolism was the most strongly correlated with physiological traits that reflect insulin resistance. Although the mice in this study were not overtly diabetic, the analysis of pancreatic islet gene expression under dietary-induced stress enabled us to identify correlated variation in groups of genes that are functionally linked to diabetes-associated physiological traits. Our analysis suggests an expected degree of concordance between diabetes-associated loci in the mouse and those found in human populations, and demonstrates how the mouse can provide evidence to support nominal associations found in human genome-wide association mapping.