1
Buyan A, Meshcheryakov G, Safronov V, Abramov S, Boytsov A, Nozdrin V, Baulin EF, Kolmykov S, Vierstra J, Kolpakov F, Makeev VJ, Kulakovskiy IV. Statistical framework for calling allelic imbalance in high-throughput sequencing data. Nat Commun 2025; 16:1739. [PMID: 39966391 PMCID: PMC11836314 DOI: 10.1038/s41467-024-55513-2] [Received: 11/14/2023] [Accepted: 12/16/2024] [Indexed: 02/20/2025]
Abstract
High-throughput sequencing facilitates large-scale studies of gene regulation and allows tracing the associations of individual genomic variants with changes in gene regulation and expression. Compared to classic association studies, the assessment of an allelic imbalance at heterozygous variants captures functional variant effects with smaller sample sizes, higher sensitivity, and better resolution. Yet, identification of allele-specific variants from allelic read counts remains challenging due to data-dependent biases and overdispersion arising from technical and biological variability. We present MIXALIME, a novel computational framework for calling allele-specific variants in diverse omics data with a repertoire of statistical models accounting for read mapping bias and copy number variation. We benchmark MIXALIME with DNase-Seq, ATAC-Seq, and CAGE-Seq data, and we demonstrate that the allelic imbalance highlights causal variants in GWAS results. Finally, as a showcase of the large-scale practical application of MIXALIME, we present an atlas of variants exhibiting allele-specific chromatin accessibility, built from thousands of available datasets obtained from diverse cell types.
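As a point of reference for what "calling allelic imbalance from allelic read counts" means in its simplest form, here is a minimal baseline sketch in Python: an exact two-sided binomial test on the reference and alternative read counts at one heterozygous site. This is not the MIXALIME model. The abstract states that MIXALIME accounts for read mapping bias, copy number variation and overdispersion, all of which this baseline ignores. The function name `allelic_imbalance_pvalue` and the parameter `expected_ref` are hypothetical names chosen for illustration.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int, expected_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for one heterozygous site.

    Sums the probabilities of all outcomes that are no more likely than the
    observed reference count under the null allelic fraction `expected_ref`.
    Baseline sketch only: no overdispersion, mapping-bias or CNV modelling.
    """
    n = ref_reads + alt_reads
    # Binomial probability of every possible reference read count 0..n
    pmf = [comb(n, k) * expected_ref**k * (1.0 - expected_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so ties with the observed outcome are included
    tail = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, tail)


if __name__ == "__main__":
    for ref, alt in [(5, 5), (8, 2), (10, 0)]:
        print(ref, alt, allelic_imbalance_pvalue(ref, alt))
```

A balanced site such as 5 versus 5 reads gives a p-value of 1, while increasingly skewed counts give smaller values. On real data, biases and overdispersion make this plain binomial test overconfident, which is the problem the abstract describes.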
Affiliation(s)
- Andrey Buyan
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
  - Life Improvement by Future Technologies (LIFT) Center, Moscow, Russia
- Viacheslav Safronov
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Sergey Abramov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
  - Moscow Center for Advanced Studies, Moscow, Russia
- Alexandr Boytsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
  - Moscow Center for Advanced Studies, Moscow, Russia
- Vladimir Nozdrin
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Eugene F Baulin
  - Moscow Center for Advanced Studies, Moscow, Russia
  - International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Semyon Kolmykov
  - Department of Computational Biology, Sirius University of Science and Technology, Sirius, Krasnodar region, Russia
- Jeff Vierstra
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Fedor Kolpakov
  - Department of Computational Biology, Sirius University of Science and Technology, Sirius, Krasnodar region, Russia
  - Bioinformatics Laboratory, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia
- Vsevolod J Makeev
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - Moscow Center for Advanced Studies, Moscow, Russia
  - Institute of Biochemistry and Genetics, Ufa Federal Research Centre of the Russian Academy of Sciences, Ufa, Russia
  - Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, UK
- Ivan V Kulakovskiy
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
  - Life Improvement by Future Technologies (LIFT) Center, Moscow, Russia
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
2
Mishra P, Barrera TS, Grieshop K, Agrawal AF. Cis-regulatory Variation in Relation to Sex and Sexual Dimorphism in Drosophila melanogaster. Genome Biol Evol 2024; 16:evae234. [PMID: 39613311 DOI: 10.1093/gbe/evae234] [Accepted: 10/11/2024] [Indexed: 12/01/2024]
Abstract
Much of sexual dimorphism is likely due to sex-biased gene expression, which results from differential regulation of a genome that is largely shared between males and females. Here, we use allele-specific expression to explore cis-regulatory variation in Drosophila melanogaster in relation to sex. We develop a Bayesian framework to infer the transcriptome-wide joint distribution of cis-regulatory effects across the sexes. We also examine patterns of cis-regulatory variation with respect to two other levels of variation in sexual dimorphism: (i) across genes that vary in their degree of sex-biased expression and (ii) among tissues that vary in their degree of dimorphism (e.g. relatively low dimorphism in heads vs. high dimorphism in gonads). We uncover evidence of widespread cis-regulatory variation in all tissues examined, with female-biased genes being especially enriched for this variation. A sizeable proportion of cis-regulatory variation is inferred to have sex-specific effects, with sex-dependent cis effects being much more frequent in gonads than in heads. Finally, we find some genes where 1 allele contributes to more than 50% of a gene's expression in heterozygous males but <50% of its expression in heterozygous females. Such variants could provide a mechanism for sex-specific dominance reversals, a phenomenon important for sexually antagonistic balancing selection. However, tissue differences in allelic imbalance are approximately as frequent as sex differences, perhaps suggesting that sexual conflict may not be particularly unique in shaping patterns of expression variation.
Affiliation(s)
- Prashastha Mishra
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
- Tania S Barrera
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
- Karl Grieshop
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
  - Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm SE-10691, Sweden
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, UK
- Aneil F Agrawal
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada M5S 3B2
3
Dyer NA, Lucas ER, Nagi SC, McDermott DP, Brenas JH, Miles A, Clarkson CS, Mawejje HD, Wilding CS, Halfon MS, Asma H, Heinz E, Donnelly MJ. Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele-specific expression. Proc Biol Sci 2024; 291:20241142. [PMID: 39288798 PMCID: PMC11407855 DOI: 10.1098/rspb.2024.1142] [Received: 01/15/2024] [Revised: 07/05/2024] [Accepted: 07/24/2024] [Indexed: 09/19/2024]
Abstract
Malaria control relies on insecticides targeting the mosquito vector, but this is increasingly compromised by insecticide resistance, which can be achieved by elevated expression of detoxifying enzymes that metabolize the insecticide. In diploid organisms, gene expression is regulated both in cis, by regulatory sequences on the same chromosome, and by trans acting factors, affecting both alleles equally. Differing levels of transcription can be caused by mutations in cis-regulatory modules (CRM), but few of these have been identified in mosquitoes. We crossed bendiocarb-resistant and susceptible Anopheles gambiae strains to identify cis-regulated genes that might be responsible for the resistant phenotype using RNAseq, and CRM sequences controlling gene expression in insecticide resistance relevant tissues were predicted using machine learning. We found 115 genes showing allele-specific expression (ASE) in hybrids of insecticide susceptible and resistant strains, suggesting cis-regulation is an important mechanism of gene expression regulation in A. gambiae. The genes showing ASE included a higher proportion of Anopheles-specific genes on average younger than genes with balanced allelic expression.
Affiliation(s)
- Naomi A. Dyer
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
- Eric R. Lucas
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
- Sanjay C. Nagi
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
- Daniel P. McDermott
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
- Jon H. Brenas
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Alistair Miles
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Chris S. Clarkson
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Henry D. Mawejje
  - Infectious Diseases Research Collaboration (IDRC), Plot 2C Nakasero Hill Road, PO Box 7475, Kampala, Uganda
- Craig S. Wilding
  - School of Biological and Environmental Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK
- Marc S. Halfon
  - Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences, University at Buffalo-State University of New York, 955 Main Street, Buffalo, NY 14203, USA
- Hasiba Asma
  - Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences, University at Buffalo-State University of New York, 955 Main Street, Buffalo, NY 14203, USA
- Eva Heinz
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
  - Strathclyde Institute of Pharmacy & Biomedical Sciences, University of Strathclyde, Glasgow G4 0RE, UK
  - Department of Clinical Sciences, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
- Martin J. Donnelly
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK
4
Nanni AV, Martinez N, Graze R, Morse A, Newman JRB, Jain V, Vlaho S, Signor S, Nuzhdin SV, Renne R, McIntyre LM. Sex-Biased Expression Is Associated With Chromatin State in Drosophila melanogaster and Drosophila simulans. Mol Biol Evol 2023; 40:msad078. [PMID: 37116218 PMCID: PMC10162771 DOI: 10.1093/molbev/msad078] [Received: 06/27/2022] [Revised: 02/24/2023] [Accepted: 03/13/2023] [Indexed: 04/30/2023]
Abstract
In Drosophila melanogaster and D. simulans head tissue, 60% of orthologous genes show evidence of sex-biased expression in at least one species. Of these, ∼39% (2,192) are conserved in direction. We hypothesize enrichment of open chromatin in the sex where we see expression bias and closed chromatin in the opposite sex. Male-biased orthologs are significantly enriched for H3K4me3 marks in males of both species (∼89% of male-biased orthologs vs. ∼76% of unbiased orthologs). Similarly, female-biased orthologs are significantly enriched for H3K4me3 marks in females of both species (∼90% of female-biased orthologs vs. ∼73% of unbiased orthologs). The sex-bias ratio in female-biased orthologs was similar in magnitude between the two species, regardless of the closed chromatin (H3K27me2me3) marks in males. However, in male-biased orthologs, the presence of H3K27me2me3 in both species significantly reduced the correlation between D. melanogaster sex-bias ratio and the D. simulans sex-bias ratio. Male-biased orthologs are enriched for evidence of positive selection in the D. melanogaster group. There are more male-biased genes than female-biased genes in both species. For orthologs with gains/losses of sex-bias between the two species, there is an excess of male-bias compared to female-bias, but there is no consistent pattern in the relationship between H3K4me3 or H3K27me2me3 chromatin marks and expression. These data suggest chromatin state is a component of the maintenance of sex-biased expression and divergence of sex-bias between species is reflected in the complexity of the chromatin status.
Affiliation(s)
- Adalena V Nanni
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL
- Natalie Martinez
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
- Rita Graze
  - Department of Biological Sciences, Auburn University, Auburn, AL
- Alison Morse
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL
- Jeremy R B Newman
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL
- Vaibhav Jain
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
- Srna Vlaho
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA
- Sarah Signor
  - Department of Biological Sciences, North Dakota State University, Fargo, ND
- Sergey V Nuzhdin
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA
- Rolf Renne
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL
- Lauren M McIntyre
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL
5
Boatwright JL. A Robust Methodology for Assessing Homoeolog-Specific Expression. Methods Mol Biol 2023; 2545:251-258. [PMID: 36720817 DOI: 10.1007/978-1-0716-2561-3_13] [Indexed: 02/02/2023]
Abstract
Angiosperm evolution is marked by numerous, recurring polyploidization events. While hybridization and polyploidization have greatly increased the degree of genetic and phenotypic diversity in plants, the mechanisms underlying changes in the genotype-to-phenotype relationships remain unclear. As the field of natural sciences continues to expand during the post-genomic era, large datasets are becoming increasingly common. However, the development of tools and workflows available to robustly assess these changes has lagged behind data production. A robust homoeolog-specific expression analysis strongly depends upon proper homoeolog calling, the ability to account for reference sequence biases, flexible and accurate methods for dealing with residual bias, and a reproducible workflow. To that end, this chapter aims to provide a detailed description of the potential pitfalls encountered while estimating homoeolog-specific expression as well as provide a workflow that allows for robust inferences based on precise estimates of expression changes.
Affiliation(s)
- J Lucas Boatwright
  - Advanced Plant Technology, Clemson University, Clemson, SC, USA
  - Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, USA
6
Nanni AV, Martinez N, Graze R, Morse A, Newman JRB, Jain V, Vlaho S, Signor S, Nuzhdin SV, Renne R, McIntyre LM. Sex-biased expression is associated with chromatin state in D. melanogaster and D. simulans. bioRxiv 2023:2023.01.13.523946. [PMID: 36711631 PMCID: PMC9882225 DOI: 10.1101/2023.01.13.523946] [Indexed: 01/15/2023]
Abstract
We propose a new model for the association of chromatin state and sex-bias in expression. We hypothesize enrichment of open chromatin in the sex where we see expression bias (OS) and closed chromatin in the opposite sex (CO). In this study of D. melanogaster and D. simulans head tissue, sex-bias in expression is associated with H3K4me3 (open mark) in males for male-biased genes and in females for female-biased genes in both species. Sex-bias in expression is also largely conserved in direction and magnitude between the two species on the X and autosomes. In male-biased orthologs, the sex-bias ratio is more divergent between species if both species have H3K27me2me3 marks in females compared to when either or neither species has H3K27me2me3 in females. H3K27me2me3 marks in females are associated with male-bias in expression on the autosomes in both species, but on the X only in D. melanogaster. In female-biased orthologs the relationship between the species for the sex-bias ratio is similar regardless of the H3K27me2me3 marks in males. Female-biased orthologs are more similar in the ratio of sex-bias than male-biased orthologs and there is an excess of male-bias in expression in orthologs that gain/lose sex-bias. There is an excess of male-bias in sex-limited expression in both species suggesting excess male-bias is due to rapid evolution between the species. The X chromosome has an enrichment in male-limited H3K4me3 in both species and an enrichment of sex-bias in expression compared to the autosomes.
Affiliation(s)
- Adalena V Nanni
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA
- Natalie Martinez
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
- Rita Graze
  - Department of Biological Sciences, Auburn University, Auburn, AL, USA
- Alison Morse
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA
- Jeremy R B Newman
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA
- Vaibhav Jain
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
- Srna Vlaho
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Sarah Signor
  - Department of Biological Sciences, North Dakota State University, Fargo, ND, USA
- Sergey V Nuzhdin
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Rolf Renne
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA
- Lauren M McIntyre
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA
7
Genome-wide chromatin accessibility analysis unveils open chromatin convergent evolution during polyploidization in cotton. Proc Natl Acad Sci U S A 2022; 119:e2209743119. [PMID: 36279429 PMCID: PMC9636936 DOI: 10.1073/pnas.2209743119] [Indexed: 11/18/2022]
Abstract
Allopolyploidization, resulting in divergent genomes in the same cell, is believed to trigger a “genome shock”, leading to broad genetic and epigenetic changes. However, little is understood about chromatin and gene-expression dynamics as underlying driving forces during allopolyploidization. Here, we examined the genome-wide DNase I-hypersensitive site (DHS) and its variations in domesticated allotetraploid cotton (Gossypium hirsutum and Gossypium barbadense, AADD) and its extant AA (Gossypium arboreum) and DD (Gossypium raimondii) progenitors. We observed distinct DHS distributions between G. arboreum and G. raimondii. In contrast, the DHSs of the two subgenomes of G. hirsutum and G. barbadense showed a convergent distribution. This convergent distribution of DHS was also present in the wild allotetraploids Gossypium darwinii and G. hirsutum var. yucatanense, but absent from a resynthesized hybrid of G. arboreum and G. raimondii, suggesting that it may be a common feature in polyploids, and not a consequence of domestication after polyploidization. We revealed that putative cis-regulatory elements (CREs) derived from polyploidization-related DHSs were dominated by several families, including Dof, ERF48, and BPC1. Strikingly, 56.6% of polyploidization-related DHSs were derived from transposable elements (TEs). Moreover, we observed positive correlations between DHS accessibility and the histone marks H3K4me3, H3K27me3, H3K36me3, H3K27ac, and H3K9ac, indicating that coordinated interplay among histone modifications, TEs, and CREs drives the DHS landscape dynamics under polyploidization. Collectively, these findings advance our understanding of the regulatory architecture in plants and underscore the complexity of regulome evolution during polyploidization.
8
Sherbina K, León-Novelo LG, Nuzhdin SV, McIntyre LM, Marroni F. Power calculator for detecting allelic imbalance using hierarchical Bayesian model. BMC Res Notes 2021; 14:436. [PMID: 34838135 PMCID: PMC8626927 DOI: 10.1186/s13104-021-05851-x] [Received: 07/16/2021] [Accepted: 11/15/2021] [Indexed: 11/10/2022]
Abstract
OBJECTIVE: Allelic imbalance (AI) is the differential expression of the two alleles in a diploid. AI can vary between tissues, treatments, and environments. Methods for testing AI exist, but methods are needed to estimate type I error and power for detecting AI and difference of AI between conditions. As the costs of the technology plummet, what is more important: reads or replicates? RESULTS: We find that a minimum of 2400, 480, and 240 allele-specific reads divided equally among 12, 5, and 3 replicates is needed to detect a 10, 20, and 30% deviation from allelic balance, respectively, in a condition with power > 80%. A minimum of 960 and 240 allele-specific reads divided equally among 8 replicates is needed to detect a 20 or 30% difference in AI between conditions with comparable power. Higher numbers of replicates increase power more than adding coverage without affecting type I error. We provide a Python package that enables simulation of AI scenarios and enables individuals to estimate type I error and power in detecting AI and differences in AI between conditions.
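To illustrate how type I error and power can be estimated by simulation, the following Python sketch draws allele-specific reads for a site and counts how often an exact binomial test rejects allelic balance. It is a deliberately simplified stand-in and not the hierarchical Bayesian model or the Python package described in the abstract. It pools reads across replicates and assumes no between-replicate variability, so it cannot reproduce the reads-versus-replicates trade-off reported above. The names `rejection_region` and `simulate_power`, and the parameter `theta` (the true fraction of reads from allele 1), are hypothetical.

```python
import random
from math import comb


def rejection_region(n: int, alpha: float) -> list:
    """For each pooled allele-1 count 0..n, does an exact two-sided
    binomial test of allelic balance (fraction 0.5) reject at level alpha?"""
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]
    region = []
    for k in range(n + 1):
        # p-value: total probability of outcomes no more likely than k
        p_value = sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9))
        region.append(p_value < alpha)
    return region


def simulate_power(theta: float, reads_per_rep: int, n_reps: int,
                   alpha: float = 0.05, n_sim: int = 500, seed: int = 0) -> float:
    """Monte Carlo estimate of the rejection rate when the true allele-1
    fraction is `theta`. With theta = 0.5 this estimates type I error."""
    rng = random.Random(seed)
    n = reads_per_rep * n_reps  # replicates are pooled (simplifying assumption)
    reject = rejection_region(n, alpha)
    hits = 0
    for _ in range(n_sim):
        allele1 = sum(rng.random() < theta for _ in range(n))
        hits += reject[allele1]
    return hits / n_sim


if __name__ == "__main__":
    print("type I error:", simulate_power(0.5, reads_per_rep=80, n_reps=3))
    print("power:", simulate_power(0.65, reads_per_rep=80, n_reps=3))
```

A more faithful simulation would let the allelic fraction vary between replicates, which is the situation in which adding replicates can help more than adding coverage.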
Affiliation(s)
- Katrina Sherbina
  - Quantitative and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
- Luis G León-Novelo
  - Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston-School of Public Health, Houston, TX, 77030, USA
- Sergey V Nuzhdin
  - Molecular and Computational Biology Section, University of Southern California, Los Angeles, CA, 90046, USA
- Lauren M McIntyre
  - Genetics Institute and Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32603, USA
- Fabio Marroni
  - Dipartimento di Scienze Agroalimentari, Ambientali e Animali, Università di Udine, 33100, Udine, Italy
9
Boatwright JL, Yeh CT, Hu HC, Susanna A, Soltis DE, Soltis PS, Schnable PS, Barbazuk WB. Trajectories of Homoeolog-Specific Expression in Allotetraploid Tragopogon castellanus Populations of Independent Origins. Front Plant Sci 2021; 12:679047. [PMID: 34249049 PMCID: PMC8261302 DOI: 10.3389/fpls.2021.679047] [Received: 03/10/2021] [Accepted: 05/20/2021] [Indexed: 06/13/2023]
Abstract
Polyploidization can have a significant ecological and evolutionary impact by providing substantially more genetic material that may result in novel phenotypes upon which selection may act. While the effects of polyploidization are broadly reviewed across the plant tree of life, the reproducibility of these effects within naturally occurring, independently formed polyploids is poorly characterized. The flowering plant genus Tragopogon (Asteraceae) offers a rare glimpse into the intricacies of repeated allopolyploid formation with both nascent (< 90 years old) and more ancient (mesopolyploids) formations. Neo- and mesopolyploids in Tragopogon have formed repeatedly and have extant diploid progenitors that facilitate the comparison of genome evolution after polyploidization across a broad span of evolutionary time. Here, we examine four independently formed lineages of the mesopolyploid Tragopogon castellanus for homoeolog expression changes and fractionation after polyploidization. We show that expression changes are remarkably similar among these independently formed polyploid populations with large convergence among expressed loci, moderate convergence among loci lost, and stochastic silencing. We further compare and contrast these results for T. castellanus with two nascent Tragopogon allopolyploids. While homoeolog expression bias was balanced in both nascent polyploids and T. castellanus, the degree of additive expression was significantly different, with the mesopolyploid populations demonstrating more non-additive expression. We suggest that gene dosage and expression noise minimization may play a prominent role in regulating gene expression patterns immediately after allopolyploidization as well as deeper into time, and these patterns are conserved across independent polyploid lineages.
Affiliation(s)
- J. Lucas Boatwright
  - Advanced Plant Technology Program, Clemson University, Clemson, SC, United States
- Cheng-Ting Yeh
  - Department of Agronomy, Iowa State University, Ames, IA, United States
- Heng-Cheng Hu
  - Department of Agronomy, Iowa State University, Ames, IA, United States
  - Covance Inc., Indianapolis, IN, United States
- Alfonso Susanna
  - Botanic Institute of Barcelona, Consejo Superior de Investigaciones Científicas, ICUB, Barcelona, Spain
- Douglas E. Soltis
  - Department of Biology, University of Florida, Gainesville, FL, United States
  - Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States
  - Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
  - Genetics Institute, University of Florida, Gainesville, FL, United States
  - Biodiversity Institute, University of Florida, Gainesville, FL, United States
- Pamela S. Soltis
  - Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States
  - Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
  - Genetics Institute, University of Florida, Gainesville, FL, United States
  - Biodiversity Institute, University of Florida, Gainesville, FL, United States
- William B. Barbazuk
  - Department of Biology, University of Florida, Gainesville, FL, United States
10
Miller BR, Morse AM, Borgert JE, Liu Z, Sinclair K, Gamble G, Zou F, Newman JRB, León-Novelo LG, Marroni F, McIntyre LM. Testcrosses are an efficient strategy for identifying cis-regulatory variation: Bayesian analysis of allele-specific expression (BayesASE). G3 (Bethesda) 2021; 11:jkab096. [PMID: 33772539 PMCID: PMC8104932 DOI: 10.1093/g3journal/jkab096] [Received: 01/19/2021] [Accepted: 03/10/2021] [Indexed: 12/30/2022]
Abstract
Allelic imbalance (AI) occurs when alleles in a diploid individual are differentially expressed and indicates cis-acting regulatory variation. What is the distribution of allelic effects in a natural population? Are all alleles the same? Are all alleles distinct? The approach described applies to any technology generating allele-specific sequence counts, for example, for chromatin accessibility, and can be applied generally, including to comparisons between tissues or environments for the same genotype. Tests of allelic effect are generally performed by crossing individuals and comparing expression between alleles directly in the F1. However, a crossing scheme that compares alleles pairwise is a prohibitive cost for more than a handful of alleles as the number of crosses is at least (n² − n)/2, where n is the number of alleles. We show here that a testcross design followed by a hypothesis test of AI between testcrosses can be used to infer differences between nontester alleles, allowing n alleles to be compared with n crosses. Using a mouse data set where both testcrosses and direct comparisons have been performed, we show that the predicted differences between nontester alleles are validated at levels of over 90% when a parent-of-origin effect is present and of 60%-80% overall. Power considerations for a testcross are similar to those in a reciprocal cross. In all applications, the testing for AI involves several complex bioinformatics steps. BayesASE is a complete bioinformatics pipeline that incorporates state-of-the-art error reduction techniques and a flexible Bayesian approach to estimating AI and formally comparing levels of AI between conditions. The modular structure of BayesASE has been packaged in Galaxy, made available in Nextflow and as a collection of scripts for the SLURM workload manager on GitHub (https://github.com/McIntyre-Lab/BayesASE).
Affiliation(s)
- Brecca R Miller
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- NYU Langone Health, New York University, New York, NY 10013, USA
| | - Alison M Morse
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
| | - Jacqueline E Borgert
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
| | - Zihao Liu
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
| | - Kelsey Sinclair
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
| | - Gavin Gamble
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
| | - Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27515, USA
| | - Jeremy R B Newman
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Pathology, University of Florida, Gainesville, FL 32608, USA
| | - Luis G León-Novelo
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston-University of Texas School of Public Health, Houston, TX 7703, USA
| | - Fabio Marroni
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Udine, 33100, Italy
| | - Lauren M McIntyre
- Genetics Institute, University of Florida, Gainesville, FL 32608, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32608, USA
|
11
|
Deitz KC, Takken W, Slotman MA. The Genetic Architecture of Post-Zygotic Reproductive Isolation Between Anopheles coluzzii and An. quadriannulatus. Front Genet 2020; 11:925. [PMID: 33005168 PMCID: PMC7480394 DOI: 10.3389/fgene.2020.00925] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/19/2020] [Accepted: 07/24/2020] [Indexed: 11/15/2022] Open
Abstract
The Anopheles gambiae complex comprises eight morphologically indistinguishable species and has emerged as a model system for the study of speciation genetics due to the rapid radiation of its member species over the past two million years. Male hybrids between most An. gambiae complex species pairs are sterile, and some genotype combinations in hybrid males cause inviability. We investigated the genetic basis of hybrid male inviability and sterility between An. coluzzii and An. quadriannulatus by measuring segregation distortion and performing a QTL analysis of sterility in a backcross population. Hybrid males were inviable if they inherited the An. coluzzii X chromosome and were homozygous at one or more loci in an 18.9 Mb region of chromosome 3. The An. coluzzii X chromosome has a disproportionately large effect on hybrid sterility when introgressed into an An. quadriannulatus genetic background. Additionally, an epistatic interaction between the An. coluzzii X and a 1.12 Mb pericentric region of the An. quadriannulatus 3L chromosome arm has a statistically significant contribution to the hybrid sterility phenotype. This same epistatic interaction occurs when the An. coluzzii X is introgressed into the genetic background of An. arabiensis, the sister species of An. quadriannulatus, suggesting that this may represent one of the first Dobzhansky-Muller incompatibilities to evolve early in the radiation of the Anopheles gambiae species complex. We describe the additive effects of each sterility QTL, epistatic interactions between them, and genes within QTL with protein functions related to mating behavior, reproduction, spermatogenesis, and microtubule morphogenesis, whose divergence may contribute to post-zygotic reproductive isolation between An. coluzzii and An. quadriannulatus.
Affiliation(s)
- Kevin C. Deitz
- Department of Entomology, Texas A&M University, College Station, TX, United States
| | - Willem Takken
- Laboratory of Entomology, Wageningen University and Research, Wageningen, Netherlands
| | - Michel A. Slotman
- Department of Entomology, Texas A&M University, College Station, TX, United States
|
12
|
Haas M, Himmelbach A, Mascher M. The contribution of cis- and trans-acting variants to gene regulation in wild and domesticated barley under cold stress and control conditions. J Exp Bot 2020; 71:2573-2584. [PMID: 31989179 PMCID: PMC7210754 DOI: 10.1093/jxb/eraa036] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/13/2019] [Accepted: 01/27/2020] [Indexed: 05/16/2023]
Abstract
Barley, like other crops, has experienced a series of genetic changes that have impacted its architecture and growth habit to suit the needs of humans, termed the domestication syndrome. Domestication also resulted in a concomitant bottleneck that reduced sequence diversity in genes and regulatory regions. Little is known about regulatory changes resulting from domestication in barley. We used RNA sequencing to examine allele-specific expression in hybrids between wild and domesticated barley. Our results show that most genes have conserved regulation. In contrast to studies of allele-specific expression in interspecific hybrids, we find almost a complete absence of trans effects. We also find that cis regulation is largely stable in response to short-term cold stress. Our study has practical implications for crop improvement using wild relatives. Genes regulated in cis are more likely to be expressed in a new genetic background at the same level as in their native background.
Affiliation(s)
- Matthew Haas
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, D-06466 Seeland, Germany
- Present address: University of Minnesota, Department of Agronomy and Plant Genetics, Saint Paul, MN 55108, USA
| | - Axel Himmelbach
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, D-06466 Seeland, Germany
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, D-06466 Seeland, Germany
- German Center for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, D-04103 Leipzig, Germany
- Present address: University of Minnesota, Department of Agronomy and Plant Genetics, Saint Paul, MN 55108, USA
|
13
|
Hovhannisyan H, Saus E, Ksiezopolska E, Hinks Roberts AJ, Louis EJ, Gabaldón T. Integrative Omics Analysis Reveals a Limited Transcriptional Shock After Yeast Interspecies Hybridization. Front Genet 2020; 11:404. [PMID: 32457798 PMCID: PMC7221068 DOI: 10.3389/fgene.2020.00404] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/05/2020] [Accepted: 03/30/2020] [Indexed: 12/30/2022] Open
Abstract
The formation of interspecific hybrids results in the coexistence of two diverged genomes within the same nucleus. It has been hypothesized that negative epistatic interactions and regulatory interferences between the two sub-genomes may elicit a so-called genomic shock involving, among other alterations, broad transcriptional changes. To assess the magnitude of this shock in hybrid yeasts, we investigated the transcriptomic differences between a newly formed Saccharomyces cerevisiae × Saccharomyces uvarum diploid hybrid and its diploid parentals, which diverged ∼20 mya. RNA sequencing (RNA-Seq) based allele-specific expression (ASE) analysis indicated that gene expression changes in the hybrid genome are limited, with only ∼1-2% of genes significantly altering their expression with respect to a non-hybrid context. In comparison, a thermal shock altered six times more genes. Furthermore, differences in the expression between orthologous genes in the two parental species tended to be diminished for the corresponding homeologous genes in the hybrid. Finally, and consistent with the RNA-Seq results, we show a limited impact of hybridization on chromatin accessibility patterns, as assessed with assay for transposase-accessible chromatin using sequencing (ATAC-Seq). Overall, our results suggest a limited genomic shock in a newly formed yeast hybrid, which may explain the high frequency of successful hybridization in these organisms.
Affiliation(s)
- Hrant Hovhannisyan
- Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Health and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Ester Saus
- Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Health and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Ewa Ksiezopolska
- Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Health and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Alex J. Hinks Roberts
- Centre for Genetic Architecture of Complex Traits, University of Leicester, Leicester, United Kingdom
| | - Edward J. Louis
- Centre for Genetic Architecture of Complex Traits, University of Leicester, Leicester, United Kingdom
| | - Toni Gabaldón
- Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Health and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
|
14
|
Buchberger E, Reis M, Lu TH, Posnien N. Cloudy with a Chance of Insights: Context Dependent Gene Regulation and Implications for Evolutionary Studies. Genes (Basel) 2019; 10:E492. [PMID: 31261769 PMCID: PMC6678813 DOI: 10.3390/genes10070492] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/24/2019] [Revised: 06/20/2019] [Accepted: 06/26/2019] [Indexed: 12/20/2022] Open
Abstract
Research in various fields of evolutionary biology has shown that divergence in gene expression is a key driver for phenotypic evolution. Cis-regulatory divergence has been found to make an exceptional contribution to morphological diversification. In the light of these findings, the analysis of genome-wide expression data has become one of the central tools to link genotype and phenotype information on a more mechanistic level. However, in many studies, especially if general conclusions are drawn from such data, a key feature of gene regulation is often neglected. With our article, we want to raise awareness that gene regulation, and thus gene expression, is highly context dependent. Genes show tissue- and stage-specific expression. We argue that the regulatory context must be considered in comparative expression studies.
Affiliation(s)
- Elisa Buchberger
- University Göttingen, Göttingen Center for Molecular Biosciences (GZMB), Dpt. of Developmental Biology, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
| | - Micael Reis
- University Göttingen, Göttingen Center for Molecular Biosciences (GZMB), Dpt. of Developmental Biology, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
| | - Ting-Hsuan Lu
- University Göttingen, Göttingen Center for Molecular Biosciences (GZMB), Dpt. of Developmental Biology, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
- International Max Planck Research School for Genome Science, Am Fassberg 11, 37077 Göttingen, Germany.
| | - Nico Posnien
- University Göttingen, Göttingen Center for Molecular Biosciences (GZMB), Dpt. of Developmental Biology, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
|
15
|
Graze RM, Tzeng RY, Howard TS, Arbeitman MN. Perturbation of IIS/TOR signaling alters the landscape of sex-differential gene expression in Drosophila. BMC Genomics 2018; 19:893. [PMID: 30526477 PMCID: PMC6288939 DOI: 10.1186/s12864-018-5308-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 04/05/2018] [Accepted: 11/23/2018] [Indexed: 12/15/2022] Open
Abstract
Background: The core functions of the insulin/insulin-like signaling and target of rapamycin (IIS/TOR) pathway are nutrient sensing, energy homeostasis, growth, and regulation of stress responses. This pathway is also known to interact directly and indirectly with the sex determination regulatory hierarchy. The IIS/TOR pathway plays a role in directing sexually dimorphic traits, including dimorphism of growth, metabolism, stress and behavior. Previous studies of sexually dimorphic gene expression in the adult head, which includes both nervous system and endocrine tissues, have revealed variation in sex-differential expression, depending in part on genotype and environment. To understand the degree to which the environmentally responsive insulin signaling pathway contributes to sexual dimorphism of gene expression, we examined the effect of perturbation of the pathway on gene expression in male and female Drosophila heads. Results: Our data reveal a large effect of insulin signaling on gene expression, with greater than 50% of genes examined changing expression. Males and females have a shared gene expression response to knock-down of InR function, with significant enrichment for pathways involved in metabolism. Perturbation of insulin signaling has a greater impact on gene expression in males, with more genes changing expression and with gene expression differences of larger magnitude. Primarily as a consequence of the response in males, we find that reduced insulin signaling results in a striking increase in sex-differential expression. This includes sex-differences in expression of immune, defense and stress response genes, genes involved in modulating reproductive behavior, genes linking insulin signaling and ageing, and in the insulin signaling pathway itself. Conclusions: Our results demonstrate that perturbation of insulin signaling results in thousands of genes displaying sex differences in expression that are not differentially expressed in control conditions. Thus, insulin signaling may play a role in variability of somatic, sex-differential expression. The finding that perturbation of the IIS/TOR pathway results in an altered landscape of sex-differential expression suggests a role of insulin signaling in the physiological underpinnings of trade-offs, sexual conflict and sex differences in expression variability.
Affiliation(s)
- Rita M Graze
- Department of Biological Sciences, Auburn University, 101 Rouse Life Sciences building, Auburn, AL, 36849-5407, USA.
| | - Ruei-Ying Tzeng
- Biomedical Sciences Department, Florida State University, College of Medicine, 1115 West Call Street, Tallahassee, FL, 32306, USA
| | - Tiffany S Howard
- Department of Biological Sciences, Auburn University, 101 Rouse Life Sciences building, Auburn, AL, 36849-5407, USA
| | - Michelle N Arbeitman
- Biomedical Sciences Department, Florida State University, College of Medicine, 1115 West Call Street, Tallahassee, FL, 32306, USA.
|
16
|
Liu Z, Dong X, Li Y. A Genome-Wide Study of Allele-Specific Expression in Colorectal Cancer. Front Genet 2018; 9:570. [PMID: 30538721 PMCID: PMC6277598 DOI: 10.3389/fgene.2018.00570] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/19/2018] [Accepted: 11/06/2018] [Indexed: 12/30/2022] Open
Abstract
Accumulating evidence from small-scale studies has suggested that allele-specific expression (ASE) plays an important role in tumor initiation and progression. However, little is known about genome-wide ASE in tumors. In this study, we conducted a comprehensive analysis of ASE in individuals with colorectal cancer (CRC) on a genome-wide scale. We identified 5.4 thousand genome-wide ASEs of single nucleotide variations (SNVs) from tumor and normal tissues of 59 individuals with CRC. We observed an increased ASE level in tumor samples, and the ASEs were enriched as hotspots on the genome. Around 63% of the genes located there were previously reported to contain complex regulatory elements, e.g., human leukocyte antigen (HLA), or were implicated in tumor progression. Focussing on the allelic expression of somatic mutations, we found that 37.5% of them exhibited ASE, and genes harboring such somatic mutations were enriched in important pathways implicated in cancers. In addition, by comparing the expected and observed ASE events in tumor samples, we identified 50 tumor-specific ASEs which possibly contributed to the somatic events in the regulatory regions of the genes and significantly enriched known cancer driver genes. By analyzing CRC ASEs from several perspectives, we provided a systematic understanding of how ASE is implicated in both tumor and normal tissues, which will be of critical value in guiding ASE studies in cancer.
Affiliation(s)
- Zhi Liu
- Department of Epidemiology and Biostatistics, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Xiao Dong
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, United States
| | - Yixue Li
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Bioinformation Technology, Shanghai Industrial Technology Institute, Shanghai, China; Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
|
17
|
A Robust Methodology for Assessing Differential Homeolog Contributions to the Transcriptomes of Allopolyploids. Genetics 2018; 210:883-894. [PMID: 30213855 DOI: 10.1534/genetics.118.301564] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 05/01/2018] [Accepted: 09/07/2018] [Indexed: 12/18/2022] Open
Abstract
Polyploidy has played a pivotal and recurring role in angiosperm evolution. Allotetraploids arise from hybridization between species and possess duplicated gene copies (homeologs) that serve redundant roles immediately after polyploidization. Although polyploidization is a major contributor to plant evolution, it remains poorly understood. We describe an analytical approach for assessing homeolog-specific expression that begins with de novo assembly of parental transcriptomes and effectively (i) reduces redundancy in de novo assemblies, (ii) identifies putative orthologs, (iii) isolates common regions between orthologs, and (iv) assesses homeolog-specific expression using a robust Bayesian Poisson-Gamma model to account for sequence bias when mapping polyploid reads back to parental references. Using this novel methodology, we examine differential homeolog contributions to the transcriptome in the recently formed allopolyploids Tragopogon mirus and T. miscellus (Compositae). Notably, we assess a larger Tragopogon gene set than previous studies of this system. Using carefully identified orthologous regions and filtering biased orthologs, we find in both allopolyploids largely balanced expression with no strong parental bias. These new methods can be used to examine homeolog expression in any tetrapolyploid system without requiring a reference genome.
|
18
|
Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data. G3 (Bethesda) 2018; 8:2923-2940. [PMID: 30021829 PMCID: PMC6118309 DOI: 10.1534/g3.118.200373] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023]
Abstract
Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the construction of transcripts that do not exist, and the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, where we project transcripts onto the genome and identify overlapping/unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on simple measures: the proportion of the events detected, and the coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than mapping in a splice-aware manner. We identify 99.8% of true transcripts while iReckon identifies 82% of the true transcripts and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons while only 43% are detected by STAR. EA further detects ∼5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the number of reads on average supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in a more stable estimate of isoform abundance, with improved correlation between replicates. This was particularly evident when EA is applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared to the estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantitated and analyzed directly or used to identify the probable set of expressed transcripts. Simple rules based on detected events and coverage used in filtering result in a dramatic improvement in isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies.
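The filtering rule described in this abstract (keep a transcript only if enough of its events are detected and its coverage is adequate) might be sketched as follows. This is an illustration under stated assumptions, not the Event Analysis implementation: the `Transcript` record, the function names, and both threshold values are hypothetical, since the abstract does not give the cut-offs used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    name: str
    events_total: int     # annotated events (exonic regions and junctions)
    events_detected: int  # events with read support in the sample
    mean_coverage: float  # average read depth supporting the transcript


def passes_filter(t: Transcript,
                  min_proportion: float = 0.75,  # hypothetical threshold
                  min_coverage: float = 5.0      # hypothetical threshold
                  ) -> bool:
    """Keep a transcript if enough of its events are detected and coverage is adequate."""
    if t.events_total == 0:
        return False
    proportion = t.events_detected / t.events_total
    return proportion >= min_proportion and t.mean_coverage >= min_coverage


def filter_reference(transcripts: List[Transcript], **thresholds) -> List[Transcript]:
    """Return the reduced reference to use for quantitation."""
    return [t for t in transcripts if passes_filter(t, **thresholds)]


if __name__ == "__main__":
    reference = [
        Transcript("tx_A", events_total=10, events_detected=9, mean_coverage=12.0),
        Transcript("tx_B", events_total=10, events_detected=4, mean_coverage=30.0),
        Transcript("tx_C", events_total=8, events_detected=8, mean_coverage=1.5),
    ]
    print([t.name for t in filter_reference(reference)])
```

Requiring both criteria means that a transcript with deep coverage over only a few of its events is still dropped, which matches the abstract's description of filtering on the proportion of events detected as well as on coverage.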
|
19
|
Wang M, Uebbing S, Pawitan Y, Scofield DG. RPASE: Individual-based allele-specific expression detection without prior knowledge of haplotype phase. Mol Ecol Resour 2018; 18:1247-1262. [PMID: 29858523 DOI: 10.1111/1755-0998.12909] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/21/2017] [Revised: 05/09/2018] [Accepted: 05/21/2018] [Indexed: 01/04/2023]
Abstract
Variation in gene expression is believed to make a significant contribution to phenotypic diversity and divergence. The analysis of allele-specific expression (ASE) can reveal important insights into gene expression regulation. We developed a novel method called RPASE (Read-backed Phasing-based ASE detection) to test for genes that show ASE. With mapped RNA-seq data from a single individual and a list of SNPs from the same individual as the only input, RPASE is capable of aggregating information across multiple dependent SNPs and producing individual-based gene-level tests for ASE. RPASE performs well in simulations and comparisons. We applied RPASE to multiple bird species and found a potentially rich landscape of ASE.
Affiliation(s)
- Mi Wang
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Severin Uebbing
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Douglas G Scofield
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
|
20
|
Direct Testing for Allele-Specific Expression Differences Between Conditions. G3 (Bethesda) 2018; 8:447-460. [PMID: 29167272 PMCID: PMC5919738 DOI: 10.1534/g3.117.300139] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/25/2022]
Abstract
Allelic imbalance (AI) indicates the presence of functional variation in cis regulatory regions. Detecting cis regulatory differences using AI is widespread, yet there is no formal statistical methodology that tests whether AI differs between conditions. Here, we present a novel model and formally test differences in AI across conditions using Bayesian credible intervals. The approach tests AI by environment (G×E) interactions, and can be used to test AI between environments, genotypes, sex, and any other condition. We incorporate bias into the modeling process. Bias is allowed to vary between conditions, making the formulation of the model general. As gene expression affects power for detection of AI, and, as expression may vary between conditions, the model explicitly takes coverage into account. The proposed model has low type I and II error under several scenarios, and is robust to large differences in coverage between conditions. We reanalyze RNA-seq data from a Drosophila melanogaster population panel, with F1 genotypes, to compare levels of AI between mated and virgin female flies, and we show that AI × genotype interactions can also be tested. To demonstrate the use of the model to test genetic differences and interactions, a formal test between two F1s was performed, showing the expected 20% difference in AI. The proposed model allows a formal test of G×E and G×G, and reaffirms a previous finding that cis regulation is robust between environments.
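The idea of testing whether AI differs between two conditions with a Bayesian credible interval can be illustrated with a deliberately simplified sketch. It is not the model from this paper, which also accounts for mapping bias and coverage; here each condition's allelic proportion simply gets a Beta posterior under an assumed uniform Beta(1, 1) prior, and the interval for the difference is obtained by Monte Carlo sampling. The function name, prior, draw count, and example read counts are all illustrative assumptions.

```python
import random
from typing import Tuple


def ai_difference_interval(ref1: int, alt1: int, ref2: int, alt2: int,
                           draws: int = 20000, level: float = 0.95,
                           seed: int = 0) -> Tuple[float, float]:
    """Equal-tailed credible interval for the difference in reference-allele
    proportion between condition 1 and condition 2.

    Assumes independent binomial counts and a uniform Beta(1, 1) prior, so each
    posterior is Beta(ref + 1, alt + 1). Bias and overdispersion are ignored.
    """
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(ref1 + 1, alt1 + 1) - rng.betavariate(ref2 + 1, alt2 + 1)
        for _ in range(draws)
    )
    lower = diffs[int((1 - level) / 2 * draws)]
    upper = diffs[int((1 + level) / 2 * draws) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical read counts: 70/30 in one condition, 50/50 in the other.
    lower, upper = ai_difference_interval(70, 30, 50, 50)
    differs = lower > 0 or upper < 0  # interval excludes zero
    print(f"95% interval: ({lower:.3f}, {upper:.3f}); AI differs: {differs}")
```

An interval that excludes zero is read as evidence that AI differs between the two conditions; an interval spanning zero is not.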
|
21
|
Tu YH, Cooper AJ, Teng B, Chang RB, Artiga DJ, Turner HN, Mulhall EM, Ye W, Smith AD, Liman ER. An evolutionarily conserved gene family encodes proton-selective ion channels. Science 2018; 359:1047-1050. [PMID: 29371428 DOI: 10.1126/science.aao3264] [Citation(s) in RCA: 165] [Impact Index Per Article: 23.6] [Received: 07/13/2017] [Accepted: 01/08/2018] [Indexed: 12/20/2022]
Abstract
Ion channels form the basis for cellular electrical signaling. Despite the scores of genetically identified ion channels selective for other monatomic ions, only one type of proton-selective ion channel has been found in eukaryotic cells. By comparative transcriptome analysis of mouse taste receptor cells, we identified Otopetrin1 (OTOP1), a protein required for development of gravity-sensing otoconia in the vestibular system, as forming a proton-selective ion channel. We found that murine OTOP1 is enriched in acid-detecting taste receptor cells and is required for their zinc-sensitive proton conductance. Two related murine genes, Otop2 and Otop3, and a Drosophila ortholog also encode proton channels. Evolutionary conservation of the gene family and its widespread tissue distribution suggest a broad role for proton channels in physiology and pathophysiology.
Affiliation(s)
- Yu-Hsiang Tu
- Department of Biological Sciences, Section of Neurobiology, University of Southern California, Los Angeles, CA 90089, USA
| | - Alexander J Cooper
- Department of Biological Sciences, Section of Neurobiology, University of Southern California, Los Angeles, CA 90089, USA
| | - Bochuan Teng
- Department of Biological Sciences, Section of Neurobiology, University of Southern California, Los Angeles, CA 90089, USA
| | - Rui B Chang
- Department of Biological Sciences, Section of Neurobiology, University of Southern California, Los Angeles, CA 90089, USA
| | - Daniel J Artiga
- Department of Biological Sciences, Section of Neurobiology, University of Southern California, Los Angeles, CA 90089, USA
| | - Heather N Turner
- Department of Biological Sciences, Section of Neurobiology, University of Southern California, Los Angeles, CA 90089, USA
| | - Eric M Mulhall
- Department of Biological Sciences, Section of Neurobiology, University of Southern California, Los Angeles, CA 90089, USA
| | - Wenlei Ye
- Department of Biological Sciences, Section of Neurobiology, University of Southern California, Los Angeles, CA 90089, USA.
| | - Andrew D Smith
- Department of Biological Sciences, Section of Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Emily R Liman
- Department of Biological Sciences, Section of Neurobiology, University of Southern California, Los Angeles, CA 90089, USA; Bridge Institute, University of Southern California, Los Angeles, CA 90089, USA
|
22
|
Newman JRB, Conesa A, Mika M, New FN, Onengut-Gumuscu S, Atkinson MA, Rich SS, McIntyre LM, Concannon P. Disease-specific biases in alternative splicing and tissue-specific dysregulation revealed by multitissue profiling of lymphocyte gene expression in type 1 diabetes. Genome Res 2017; 27:1807-1815. [PMID: 29025893 PMCID: PMC5668939 DOI: 10.1101/gr.217984.116] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 11/01/2016] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Genome-wide association studies (GWAS) have identified multiple, shared allelic associations with many autoimmune diseases. However, the pathogenic contributions of variants residing in risk loci remain unresolved. The location of the majority of shared disease-associated variants in noncoding regions suggests they contribute to risk of autoimmunity through effects on gene expression in the immune system. In the current study, we test this hypothesis by applying RNA sequencing to CD4+, CD8+, and CD19+ lymphocyte populations isolated from 81 subjects with type 1 diabetes (T1D). We characterize and compare the expression patterns across these cell types for three gene sets: all genes, the set of genes implicated in autoimmune disease risk by GWAS, and the subset of these genes specifically implicated in T1D. We performed RNA sequencing and aligned the reads to both the human reference genome and a catalog of all possible splicing events developed from the genome, thereby providing a comprehensive evaluation of the roles of gene expression and alternative splicing (AS) in autoimmunity. Autoimmune candidate genes displayed greater expression specificity in the three lymphocyte populations relative to other genes, with significantly increased levels of splicing events, particularly those predicted to have substantial effects on protein isoform structure and function (e.g., intron retention, exon skipping). The majority of single-nucleotide polymorphisms within T1D-associated loci were also associated with one or more cis-expression quantitative trait loci (cis-eQTLs) and/or splicing eQTLs. Our findings highlight a substantial, and previously underrecognized, role for AS in the pathogenesis of autoimmune disorders and particularly for T1D.
Affiliation(s)
- Jeremy R B Newman
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, Florida 32610, USA
| | - Ana Conesa
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, Florida 32610, USA
- Genetics Institute, University of Florida, Gainesville, Florida 32610, USA
| | - Matthew Mika
- Center for Public Health Genomics and Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia 22908, USA
| | - Felicia N New
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, Florida 32610, USA
| | - Suna Onengut-Gumuscu
- Center for Public Health Genomics and Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia 22908, USA
| | - Mark A Atkinson
- Diabetes Institute, University of Florida, Gainesville, Florida 32610, USA
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, Florida 32610, USA
| | - Stephen S Rich
- Center for Public Health Genomics and Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia 22908, USA
| | - Lauren M McIntyre
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, Florida 32610, USA
- Genetics Institute, University of Florida, Gainesville, Florida 32610, USA
| | - Patrick Concannon
- Genetics Institute, University of Florida, Gainesville, Florida 32610, USA
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, Florida 32610, USA
|
23
|
Wang M, Uebbing S, Ellegren H. Bayesian Inference of Allele-Specific Gene Expression Indicates Abundant Cis-Regulatory Variation in Natural Flycatcher Populations. Genome Biol Evol 2017; 9:1266-1279. [PMID: 28453623 PMCID: PMC5434935 DOI: 10.1093/gbe/evx080] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Accepted: 04/25/2017] [Indexed: 12/13/2022] Open
Abstract
Polymorphism in cis-regulatory sequences can lead to different levels of expression for the two alleles of a gene, providing a starting point for the evolution of gene expression. Little is known about the genome-wide abundance of genetic variation in gene regulation in natural populations but analysis of allele-specific expression (ASE) provides a means for investigating such variation. We performed RNA-seq of multiple tissues from population samples of two closely related flycatcher species and developed a Bayesian algorithm that maximizes data usage by borrowing information from the whole data set and combines several SNPs per transcript to detect ASE. Of 2,576 transcripts analyzed in collared flycatcher, ASE was detected in 185 (7.2%) and a similar frequency was seen in the pied flycatcher. Transcripts with statistically significant ASE commonly showed the major allele in >90% of the reads, reflecting that power was highest when expression was heavily biased toward one of the alleles. This would suggest that the observed frequencies of ASE likely are underestimates. The proportion of ASE transcripts varied among tissues, being lowest in testis and highest in muscle. Individuals often showed ASE of particular transcripts in more than one tissue (73.4%), consistent with a genetic basis for regulation of gene expression. The results suggest that genetic variation in regulatory sequences commonly affects gene expression in natural populations and that it provides a seedbed for phenotypic evolution via divergence in gene expression.
Affiliation(s)
- Mi Wang
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Sweden
| | - Severin Uebbing
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Sweden
| | - Hans Ellegren
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Sweden
|
24
|
León-Novelo L, Fuentes C, Emerson S. Marginal likelihood estimation of negative binomial parameters with applications to RNA-seq data. Biostatistics 2017; 18:637-650. [PMID: 28369228 DOI: 10.1093/biostatistics/kxx006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 12/01/2015] [Accepted: 02/01/2017] [Indexed: 11/12/2022] Open
Abstract
RNA-Seq data characteristically exhibits large variances, which need to be appropriately accounted for in any proposed model. We first explore the effects of this variability on the maximum likelihood estimator (MLE) of the dispersion parameter of the negative binomial distribution, and propose instead to use an estimator obtained via maximization of the marginal likelihood in a conjugate Bayesian framework. We show, via simulation studies, that the marginal MLE can better control this variation and produce a more stable and reliable estimator. We then formulate a conjugate Bayesian hierarchical model, and use this new estimator to propose a Bayesian hypothesis test to detect differentially expressed genes in RNA-Seq data. We use numerical studies to show that our much simpler approach is competitive with other negative binomial based procedures, and we use a real data set to illustrate the implementation and flexibility of the procedure.
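As a rough illustration of the quantity this abstract discusses, the sketch below fits the ordinary maximum likelihood estimate of the negative binomial size (inverse dispersion) parameter by grid search. It is not the marginal-likelihood estimator the authors propose; the function names, the toy counts, and the grid are all assumptions made for the example.

```python
import math

def nb_loglik(counts, mu, size):
    """Log-likelihood of counts under a negative binomial with mean mu
    and size (inverse dispersion) parameter."""
    p = size / (size + mu)
    total = 0.0
    for k in counts:
        total += (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
                  + size * math.log(p) + k * math.log(1.0 - p))
    return total

def fit_size_grid(counts, grid):
    """Plain MLE of the size parameter over a grid (hypothetical helper).
    The mean is fixed at the sample mean, which is its MLE."""
    mu = sum(counts) / len(counts)
    return max(grid, key=lambda s: nb_loglik(counts, mu, s))

# Toy, made-up counts with variance well above the mean (overdispersed).
counts = [3, 12, 0, 7, 25, 1, 9, 4]
grid = [0.1 * i for i in range(1, 501)]  # candidate sizes 0.1 .. 50.0
size_hat = fit_size_grid(counts, grid)
print("estimated size:", size_hat)
```

With so few observations the estimate can shift noticeably when a single count changes, which is the kind of instability the abstract attributes to the plain MLE.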
Affiliation(s)
- Luis León-Novelo
- Department of Biostatistics, University of Texas Health Science Center at Houston - School of Public Health, Houston, TX 77030, USA
| | - Claudio Fuentes
- Department of Statistics, Oregon State University, Corvallis, OR 97331, USA
| | - Sarah Emerson
- Department of Statistics, Oregon State University, Corvallis, OR 97331, USA
|
25
|
Abstract
Over recent years, multiple groups have shown that a large number of structural variants, repeats, or problems with the underlying genome assembly have dramatic effects on the mapping, calling, and overall reliability of single nucleotide polymorphism calls. This project endeavored to develop an easy-to-use track for looking at structural variant and repeat regions. This track, DangerTrack, can be displayed alongside the existing Genome Reference Consortium assembly tracks to warn clinicians and biologists when variants of interest may be incorrectly called, of dubious quality, or on an insertion or copy number expansion. While mapping and variant calling can be automated, it is our opinion that when these regions are of interest to a particular clinical or research group, they warrant a careful examination, potentially involving localized reassembly. DangerTrack is available at https://github.com/DCGenomics/DangerTrack.
Affiliation(s)
- Igor Dolgalev
- New York University School of Medicine, New York, NY, 10016, USA
| | - Fritz Sedlazeck
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21202, USA
| | - Ben Busby
- National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
|
26
|
Neurons That Underlie Drosophila melanogaster Reproductive Behaviors: Detection of a Large Male-Bias in Gene Expression in fruitless-Expressing Neurons. G3-GENES GENOMES GENETICS 2016; 6:2455-65. [PMID: 27247289 PMCID: PMC4978899 DOI: 10.1534/g3.115.019265] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
Abstract
Male and female reproductive behaviors in Drosophila melanogaster are vastly different, but neurons that express sex-specifically spliced fruitless transcripts (fru P1) underlie these behaviors in both sexes. How this set of neurons can generate such different behaviors between the two sexes is an unresolved question. A particular challenge is that fru P1-expressing neurons comprise only 2-5% of the adult nervous system, and so studies of adult head tissue or whole brain may not reveal crucial differences. Translating Ribosome Affinity Purification (TRAP) identifies the actively translated pool of mRNAs from fru P1-expressing neurons, allowing a sensitive, cell-type-specific assay. We find four times more male-biased than female-biased genes in TRAP mRNAs from fru P1-expressing neurons. This suggests a potential mechanism to generate dimorphism in behavior. The male-biased genes may direct male behaviors by establishing cell fate in a similar context of gene expression observed in females. These results suggest a possible global mechanism for how distinct behaviors can arise from a shared set of neurons.
|
27
|
Sex Differences in Drosophila Somatic Gene Expression: Variation and Regulation by doublesex. G3-GENES GENOMES GENETICS 2016; 6:1799-808. [PMID: 27172187 PMCID: PMC4938635 DOI: 10.1534/g3.116.027961] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 12/27/2022]
Abstract
Sex differences in gene expression have been widely studied in Drosophila melanogaster. Sex differences vary across strains, but many molecular studies focus on only a single strain, or on genes that show sexually dimorphic expression in many strains. How extensive variability is and whether this variability occurs among genes regulated by sex determination hierarchy terminal transcription factors is unknown. To address these questions, we examine differences in sexually dimorphic gene expression between two strains in Drosophila adult head tissues. We also examine gene expression in doublesex (dsx) mutant strains to determine which sex-differentially expressed genes are regulated by DSX, and the mode by which DSX regulates expression. We find substantial variation in sex-differential expression. The sets of genes with sexually dimorphic expression in each strain show little overlap. The prevalence of different DSX regulatory modes also varies between the two strains. Neither the patterns of DSX DNA occupancy, nor mode of DSX regulation explain why some genes show consistent sex-differential expression across strains. We find that the genes identified as regulated by DSX in this study are enriched with known sites of DSX DNA occupancy. Finally, we find that sex-differentially expressed genes and genes regulated by DSX are highly enriched on the fourth chromosome. These results provide insights into a more complete pool of potential DSX targets, as well as revealing the molecular flexibility of DSX regulation.
|
28
|
Torres-Oliva M, Almudi I, McGregor AP, Posnien N. A robust (re-)annotation approach to generate unbiased mapping references for RNA-seq-based analyses of differential expression across closely related species. BMC Genomics 2016; 17:392. [PMID: 27220689 PMCID: PMC4877740 DOI: 10.1186/s12864-016-2646-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 10/23/2015] [Accepted: 04/22/2016] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND RNA-seq based on short reads generated by next generation sequencing technologies has become the main approach to study differential gene expression. Until now, the main applications of this technique have been to study the variation of gene expression in a whole organism, tissue or cell type under different conditions or at different developmental stages. However, RNA-seq also has great potential for use in evolutionary studies to investigate gene expression divergence in closely related species. RESULTS We show that the published genomes and annotations of the three closely related Drosophila species D. melanogaster, D. simulans and D. mauritiana have limitations for inter-specific gene expression studies. This is due to missing gene models in at least one of the genome annotations, unclear orthology assignments and significant gene length differences in the different species. A comprehensive evaluation of four statistical frameworks (DESeq2, DESeq2 with length correction, RPKM-limma and RPKM-voom-limma) shows that none of these methods sufficiently accounts for inter-specific gene length differences, which inevitably results in false positive candidate genes. We propose that published reference genomes should be re-annotated before using them as references for RNA-seq experiments to include as many genes as possible and to account for a potential length bias. We present a straightforward reciprocal re-annotation pipeline that allows reliable comparison of expression for nearly all genes annotated in D. melanogaster. CONCLUSIONS We conclude that our reciprocal re-annotation of previously published genomes facilitates the analysis of significantly more genes in an inter-specific differential gene expression study. We propose that the established pipeline can easily be applied to re-annotate other genomes of closely related animals and plants to improve comparative expression analyses.
Affiliation(s)
- Montserrat Torres-Oliva
- Georg-August-Universität Göttingen, Johann-Friedrich-Blumenbach-Institut für Zoologie und Anthropologie, Abteilung für Entwicklungsbiologie, GZMB Ernst-Caspari-Haus, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany
- Göttingen Center for Molecular Biosciences (GZMB), GZMB Ernst-Caspari-Haus, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany
| | - Isabel Almudi
- Department of Biological and Medical Sciences, Oxford Brookes University, Gipsy Lane, Oxford, OX3 0BP UK
- Andalusian Centre of Developmental Biology, carretera de Utrera, km.1, 41013 Seville, Spain
| | - Alistair P. McGregor
- Department of Biological and Medical Sciences, Oxford Brookes University, Gipsy Lane, Oxford, OX3 0BP UK
| | - Nico Posnien
- Georg-August-Universität Göttingen, Johann-Friedrich-Blumenbach-Institut für Zoologie und Anthropologie, Abteilung für Entwicklungsbiologie, GZMB Ernst-Caspari-Haus, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany
- Göttingen Center for Molecular Biosciences (GZMB), GZMB Ernst-Caspari-Haus, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany
|
29
|
Buffering of Genetic Regulatory Networks in Drosophila melanogaster. Genetics 2016; 203:1177-90. [PMID: 27194752 DOI: 10.1534/genetics.116.188797] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 03/02/2016] [Accepted: 05/17/2016] [Indexed: 01/01/2023] Open
Abstract
Regulatory variation in gene expression can be described by cis- and trans-genetic components. Here we used RNA-seq data from a population panel of Drosophila melanogaster test crosses to compare allelic imbalance (AI) in female head tissue between mated and virgin flies, an environmental change known to affect transcription. Indeed, 3048 exons (1610 genes) are differentially expressed in this study. A Bayesian model for AI, with an intersection test, controls type I error. There are ∼200 genes with AI exclusively in mated or virgin flies, indicating an environmental component of expression regulation. On average 34% of genes within a cross and 54% of all genes show evidence for genetic regulation of transcription. Nearly all differentially regulated genes are affected in cis, with an average of 63% of expression variation explained by the cis-effects. Trans-effects explain 8% of the variance in AI on average and the interaction between cis and trans explains an average of 11% of the total variance in AI. In both environments cis- and trans-effects are compensatory in their overall effect, with a negative association between cis- and trans-effects in 85% of the exons examined. We hypothesize that the gene expression level perturbed by cis-regulatory mutations is compensated through trans-regulatory mechanisms, e.g., trans and cis by trans-factors buffering cis-mutations. In addition, when AI is detected in both environments, cis-mated, cis-virgin, and trans-mated-trans-virgin estimates are highly concordant with 99% of all exons positively correlated with a median correlation of 0.83 for cis and 0.95 for trans. We conclude that the gene regulatory networks (GRNs) are robust and that trans-buffering explains robustness.
|
30
|
RNA-Seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations. Genetics 2015; 198:59-73. [PMID: 25236449 PMCID: PMC4174954 DOI: 10.1534/genetics.114.165886] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Indexed: 12/30/2022] Open
Abstract
Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations.
|
31
|
Kurmangaliyev YZ, Favorov AV, Osman NM, Lehmann KV, Campo D, Salomon MP, Tower J, Gelfand MS, Nuzhdin SV. Natural variation of gene models in Drosophila melanogaster. BMC Genomics 2015; 16:198. [PMID: 25888292 PMCID: PMC4373058 DOI: 10.1186/s12864-015-1415-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/12/2014] [Accepted: 02/28/2015] [Indexed: 01/10/2025] Open
Abstract
Background Variation within splicing regulatory sequences often leads to differences in gene models among individuals within a species. Two alleles of the same gene may express transcripts with different exon/intron structures and consequently produce functionally different proteins. Matching genomic and transcriptomic data allows us to identify putative regulatory variants associated with changes in splicing patterns. Results Here we analyzed natural variation of splicing patterns in the transcriptomes of 81 natural strains of Drosophila melanogaster with known genotypes. We identified dozens of genotype-specific splicing patterns associated with putative cis-splicing quantitative trait loci (sQTL). The majority of changes can be explained by mutations in splice sites. Allelic imbalance in splicing patterns confirmed that the majority are regulated mainly by cis-genetic effects. Remarkably, allele-specific splicing changes often lead to qualitative changes in gene models, yielding many isoforms not previously annotated. The observed alterations are typically outside protein-coding regions or affect only very short protein segments. Conclusions Overall, the sets of gene models appear to be flexible within D. melanogaster populations. The observed variation in splicing patterns is predicted to have limited effects on the encoded protein sequences. To our knowledge, this is the first sQTL mapping study in Drosophila.
Affiliation(s)
- Yerbol Z Kurmangaliyev
- University of Southern California, Los Angeles, CA, USA; Institute for Information Transmission Problems (Kharkevich Institute), Moscow, Russia.
| | - Alexander V Favorov
- Johns Hopkins University School of Medicine, Baltimore, MD, USA; Vavilov Institute of General Genetics, Moscow, Russia; Research Institute of Genetics and Selection of Industrial Microorganisms, Moscow, Russia.
| | - Noha M Osman
- University of Southern California, Los Angeles, CA, USA; National Research Center, Dokki, Giza, Egypt.
| | - Kjong-Van Lehmann
- Memorial Sloan Kettering Cancer Center, Zuckerman Research Center, New York, NY, USA.
| | - Daniel Campo
- University of Southern California, Los Angeles, CA, USA.
| | - John Tower
- University of Southern California, Los Angeles, CA, USA.
| | - Mikhail S Gelfand
- Institute for Information Transmission Problems (Kharkevich Institute), Moscow, Russia; Lomonosov Moscow State University, Moscow, Russia.
| | - Sergey V Nuzhdin
- University of Southern California, Los Angeles, CA, USA; Saint Petersburg Polytechnical University, St Petersburg, Russia.
|
32
|
Graze RM, McIntyre LM, Morse AM, Boyd BM, Nuzhdin SV, Wayne ML. What the X has to do with it: differences in regulatory variability between the sexes in Drosophila simulans. Genome Biol Evol 2015; 6:818-29. [PMID: 24696400 PMCID: PMC4007535 DOI: 10.1093/gbe/evu060] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 01/20/2023] Open
Abstract
The mechanistic basis of regulatory variation and the prevailing evolutionary forces shaping that variation are known to differ between sexes and between chromosomes. Regulatory variation of gene expression can be due to functional changes within a gene itself (cis) or in other genes elsewhere in the genome (trans). The evolutionary properties of cis mutations are expected to differ from mutations affecting gene expression in trans. We analyze allele-specific expression across a set of X substitution lines in intact adult Drosophila simulans to evaluate whether regulatory variation differs for cis and trans, for males and females, and for X-linked and autosomal genes. Regulatory variation is common (56% of genes), and patterns of variation within D. simulans are consistent with previous observations in Drosophila that there is more cis than trans variation within species (47% vs. 25%, respectively). The relationship between sex-bias and sex-limited variation is remarkably consistent across sexes. However, there are differences between cis and trans effects: cis variants show evidence of purifying selection in the sex toward which expression is biased, while trans variants do not. For female-biased genes, the X is depleted for trans variation in a manner consistent with a female-dominated selection regime on the X. Surprisingly, there is no evidence for depletion of trans variation for male-biased genes on the X. This is evidence for regulatory feminization of the X: trans-acting factors controlling male-biased genes are more likely to be found on the autosomes than those controlling female-biased genes.
Affiliation(s)
- Rita M. Graze
- Department of Molecular Genetics and Microbiology, University of Florida
- Department of Biological Sciences, Auburn University
| | - Lauren M. McIntyre
- Department of Molecular Genetics and Microbiology, University of Florida
- Department of Statistics, University of Florida
| | - Alison M. Morse
- Department of Molecular Genetics and Microbiology, University of Florida
| | - Bret M. Boyd
- Florida Museum of Natural History, University of Florida
| | - Sergey V. Nuzhdin
- Section of Molecular and Computational Biology, Department of Biological Sciences, University of Southern California
|
33
|
Soderlund CA, Nelson WM, Goff SA. Allele Workbench: transcriptome pipeline and interactive graphics for allele-specific expression. PLoS One 2014; 9:e115740. [PMID: 25541944 PMCID: PMC4277417 DOI: 10.1371/journal.pone.0115740] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/12/2014] [Accepted: 10/19/2014] [Indexed: 12/30/2022] Open
Abstract
Sequencing the transcriptome can answer various questions such as determining the transcripts expressed in a given species for a specific tissue or condition, evaluating differential expression, discovering variants, and evaluating allele-specific expression. Differential expression evaluates the expression differences between different strains, tissues, and conditions. Allele-specific expression evaluates expression differences between parental alleles. Both differential expression and allele-specific expression have been studied for heterosis (hybrid vigor), where the hybrid has improved performance over the parents for one or more traits. The Allele Workbench software was developed for a heterosis study that evaluated allele-specific expression for a mouse F1 hybrid using libraries from multiple tissues with biological replicates. This software has been made into a distributable package, which includes a pipeline, a Java interface to build the database, and a Java interface for query and display of the results. The required input is a reference genome, annotation file, and one or more RNA-Seq libraries with optional replicates. It evaluates allelic imbalance at the SNP and transcript level and flags transcripts with significant opposite directional allele-specific expression. The Java interface allows the user to view data from libraries, replicates, genes, transcripts, exons, and variants, including queries on allele imbalance for selected libraries. To determine the impact of allele-specific SNPs on protein folding, variants are annotated with their effect (e.g., missense), and the parental protein sequences may be exported for protein folding analysis. The Allele Workbench processing results in transcript files and read counts that can be used as input to the previously published Transcriptome Computational Workbench, which has a new algorithm for determining a trimmed set of gene ontology terms. 
The software with demo files is available from https://code.google.com/p/allele-workbench. Additionally, all software is ready for immediate use from an Atmosphere Virtual Machine Image available from the iPlant Collaborative (www.iplantcollaborative.org).
Affiliation(s)
- Carol A. Soderlund
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
| | - William M. Nelson
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
| | - Stephen A. Goff
- iPlant Collaborative, University of Arizona, Tucson, Arizona, United States of America
|
34
|
León-Novelo LG, McIntyre LM, Fear JM, Graze RM. A flexible Bayesian method for detecting allelic imbalance in RNA-seq data. BMC Genomics 2014; 15:920. [PMID: 25339465 PMCID: PMC4230747 DOI: 10.1186/1471-2164-15-920] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 05/30/2014] [Accepted: 10/09/2014] [Indexed: 01/01/2023] Open
Abstract
Background One method of identifying cis regulatory differences is to analyze allele-specific expression (ASE) and identify cases of allelic imbalance (AI). RNA-seq is the most common way to measure ASE and a binomial test is often applied to determine statistical significance of AI. This implicitly assumes that there is no bias in estimation of AI. However, bias has been found to result from multiple factors including: genome ambiguity, reference quality, the mapping algorithm, and biases in the sequencing process. Two alternative approaches have been developed to handle bias: adjusting for bias using a statistical model and filtering regions of the genome suspected of harboring bias. Existing statistical models which account for bias rely on information from DNA controls, which can be cost prohibitive for large intraspecific studies. In contrast, data filtering is inexpensive and straightforward, but necessarily involves sacrificing a portion of the data. Results Here we propose a flexible Bayesian model for analysis of AI, which accounts for bias and can be implemented without DNA controls. In lieu of DNA controls, this Poisson-Gamma (PG) model uses an estimate of bias from simulations. The proposed model always has a lower type I error rate compared to the binomial test. Consistent with prior studies, bias dramatically affects the type I error rate. All of the tested models are sensitive to misspecification of bias. The closer the estimate of bias is to the true underlying bias, the lower the type I error rate. Correct estimates of bias result in a level alpha test. Conclusions To improve the assessment of AI, some forms of systematic error (e.g., map bias) can be identified using simulation. The resulting estimates of bias can be used to correct for bias in the PG model, without data filtering. Other sources of bias (e.g., unidentified variant calls) can be easily captured by DNA controls, but are missed by common filtering approaches. 
Consequently, as variant identification improves, the need for DNA controls will be reduced. Filtering does not significantly improve performance and is not recommended, as information is sacrificed without a measurable gain. The PG model developed here performs well when bias is known, or slightly misspecified. The model is flexible and can accommodate differences in experimental design and bias estimation.
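The binomial test that this abstract takes as its baseline can be sketched in a few lines. The example below is only an illustration of that baseline and of why the assumed null proportion matters; it is not the Poisson-Gamma model itself. The read counts and the bias value of 0.55 are invented for the example, and the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed one under the null rate p."""
    observed = binom_pmf(k, n, p)
    tail = sum(
        pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
        if pmf <= observed * (1.0 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, tail)

# Made-up allelic read counts at one heterozygous site.
ref_reads, alt_reads = 60, 40
total = ref_reads + alt_reads

# Naive null: both alleles equally likely to be observed.
p_naive = binom_two_sided_p(ref_reads, total, 0.5)

# Bias-aware null: assume (hypothetically) that mapping favors the
# reference allele so that 55% of reads are expected to be reference.
p_bias_aware = binom_two_sided_p(ref_reads, total, 0.55)

print("p-value, null 0.50:", p_naive)
print("p-value, null 0.55:", p_bias_aware)
```

When the assumed bias moves the null proportion toward the observed ratio, the same counts give weaker evidence for imbalance, which is consistent with the abstract's point that the type I error rate depends on how well the bias is specified.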
Affiliation(s)
- Rita M Graze
- Department of Biological Sciences, Auburn University, 101 Rouse Life Science Building, 36849 Auburn, AL, USA.
|
35
|
Liu Z, Yang J, Xu H, Li C, Wang Z, Li Y, Dong X, Li Y. Comparing computational methods for identification of allele-specific expression based on next generation sequencing data. Genet Epidemiol 2014; 38:591-8. [PMID: 25183311 DOI: 10.1002/gepi.21846] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 12/16/2013] [Revised: 05/15/2014] [Accepted: 06/16/2014] [Indexed: 11/07/2022]
Abstract
Allele-specific expression (ASE) studies have wide-ranging implications for genome biology and medicine. Whole transcriptome RNA sequencing (RNA-Seq) has emerged as a genome-wide tool for identifying ASE, but suffers from mapping bias favoring reference alleles. Two categories of methods are currently used to reduce the effect of mapping bias on ASE identification: normalizing the RNA allelic ratio with the parallel genomic allelic ratio (pDNAar), and modifying the reference genome to give reads carrying either allele the same chance of being mapped (mREF). We compared the sensitivity and specificity of both methods with simulated data, and demonstrated that pDNAar, though ideally practical, was lower in sensitivity because of its lower mapping rate of reads carrying nonreference (alternative) alleles, whereas mREF achieved higher sensitivity and specificity owing to its efficiency in mapping reads carrying both alleles. Application of these two methods to real sequencing data showed that mREF was able to identify more ASE loci because of its higher mapping efficiency, and was able to correct some seemingly incorrect ASE loci identified by pDNAar owing to pDNAar's inefficiency in mapping reads carrying alternative alleles. Our study provides useful information for RNA sequencing data processing in the identification of ASE.
Affiliation(s)
- Zhi Liu
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P. R. China; University of Chinese Academy of Sciences, Beijing, P. R. China
|
36
|
Hybrid incompatibility arises in a sequence-based bioenergetic model of transcription factor binding. Genetics 2014; 198:1155-66. [PMID: 25173845 DOI: 10.1534/genetics.114.168112] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022] Open
Abstract
Postzygotic isolation between incipient species results from the accumulation of incompatibilities that arise as a consequence of genetic divergence. When phenotypes are determined by regulatory interactions, hybrid incompatibility can evolve even as a consequence of parallel adaptation in parental populations because interacting genes can produce the same phenotype through incompatible allelic combinations. We explore the evolutionary conditions that promote and constrain hybrid incompatibility in regulatory networks using a bioenergetic model (combining thermodynamics and kinetics) of transcriptional regulation, considering the bioenergetic basis of molecular interactions between transcription factors (TFs) and their binding sites. The bioenergetic parameters consider the free energy of formation of the bond between the TF and its binding site and the availability of TFs in the intracellular environment. Together these determine fractional occupancy of the TF on the promoter site, the degree of subsequent gene expression and, in diploids, the degree of dominance among allelic interactions. This results in a sigmoid genotype-phenotype map and fitness landscape, with the details of the shape determining the degree of bioenergetic evolutionary constraint on hybrid incompatibility. Using individual-based simulations, we subjected two allopatric populations to parallel directional or stabilizing selection. Misregulation of hybrid gene expression occurred under either type of selection, although it evolved faster under directional selection. Under directional selection, the extent of hybrid incompatibility increased with the slope of the genotype-phenotype map near the derived parental expression level. Under stabilizing selection, hybrid incompatibility arose from compensatory mutations and was greater when the bioenergetic properties of the interaction caused the space of nearly neutral genotypes around the stable expression level to be wide.
F2's showed higher hybrid incompatibility than F1's to the extent that the bioenergetic properties favored dominant regulatory interactions. The present model is a mechanistically explicit case of the Bateson-Dobzhansky-Muller model, connecting environmental selective pressure to hybrid incompatibility through the molecular mechanism of regulatory divergence. The bioenergetic parameters that determine expression represent measurable properties of transcriptional regulation, providing a predictive framework for empirical studies of how phenotypic evolution results in epistatic incompatibility at the molecular level in hybrids.
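To illustrate how binding free energy and TF availability can jointly set fractional occupancy, here is a sketch using a standard two-state binding form, θ = N / (N + exp(ΔG/kT)). This form, the parameter names, and the units are assumptions made for illustration; the abstract does not give the expression used in the paper, which may differ.

```python
import math


def fractional_occupancy(n_tf: float, delta_g: float, kt: float = 1.0) -> float:
    """Fraction of time a binding site is occupied (assumed two-state model).

    n_tf    -- effective TF availability (arbitrary units, assumed)
    delta_g -- binding free energy; more negative means tighter binding
    kt      -- thermal energy in the same units as delta_g
    """
    return n_tf / (n_tf + math.exp(delta_g / kt))


if __name__ == "__main__":
    # Occupancy traces a sigmoid as binding energy weakens.
    for dg in (-4.0, -2.0, 0.0, 2.0, 4.0):
        print(f"dG = {dg:+.1f} kT  occupancy = {fractional_occupancy(1.0, dg):.3f}")
```

Under this assumed form, occupancy is a logistic function of ΔG, so mutations near the midpoint of the curve change expression far more than those on the flat tails, which is one way to read the abstract's point about the slope of the genotype-phenotype map.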
|
37
|
Wei KHC, Clark AG, Barbash DA. Limited gene misregulation is exacerbated by allele-specific upregulation in lethal hybrids between Drosophila melanogaster and Drosophila simulans. Mol Biol Evol 2014; 31:1767-78. [PMID: 24723419 PMCID: PMC4069615 DOI: 10.1093/molbev/msu127] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022] Open
Abstract
Misregulation of gene expression is often observed in interspecific hybrids and is generally attributed to regulatory incompatibilities caused by divergence between the two genomes. However, it has been challenging to distinguish effects of regulatory divergence from secondary effects including developmental and physiological defects common to hybrids. Here, we use RNA-Seq to profile gene expression in F1 hybrid male larvae from crosses of Drosophila melanogaster to its sibling species D. simulans. We analyze lethal and viable hybrid males, the latter produced using a mutation in the X-linked D. melanogaster Hybrid male rescue (Hmr) gene and compare them with their parental species and to public data sets of gene expression across development. We find that Hmr has drastically different effects on the parental and hybrid genomes, demonstrating that hybrid incompatibility genes can exhibit novel properties in the hybrid genetic background. Additionally, we find that D. melanogaster alleles are preferentially affected between lethal and viable hybrids. We further determine that many of the differences between the hybrids result from developmental delay in the Hmr(+) hybrids. Finally, we find surprisingly modest expression differences in hybrids when compared with the parents, with only 9% and 4% of genes deviating from additivity or expressed outside of the parental range, respectively. Most of these differences can be attributed to developmental delay and differences in tissue types. Overall, our study suggests that hybrid gene misexpression is prone to overestimation and that even between species separated by approximately 2.5 Ma, regulatory incompatibilities are not widespread in hybrids.
Affiliation(s)
- Kevin H-C Wei
- Department of Molecular Biology and Genetics, Cornell University
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University
- Daniel A Barbash
- Department of Molecular Biology and Genetics, Cornell University
|
38
|
Quinn A, Juneja P, Jiggins FM. Estimates of allele-specific expression in Drosophila with a single genome sequence and RNA-seq data. Bioinformatics 2014; 30:2603-10. [PMID: 24845654 DOI: 10.1093/bioinformatics/btu342] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 01/12/2023]
Abstract
MOTIVATION Genetic variation in cis-regulatory elements is an important cause of variation in gene expression. Cis-regulatory variation can be detected by using high-throughput RNA sequencing (RNA-seq) to identify differences in the expression of the two alleles of a gene. This requires that reads from the two alleles are equally likely to map to a reference genome(s), and that single-nucleotide polymorphisms (SNPs) are accurately called, so that reads derived from the different alleles can be identified. Both of these prerequisites can be achieved by sequencing the genomes of the parents of the individual being studied, but this is often prohibitively costly. RESULTS In Drosophila, we demonstrate that biases during read mapping can be avoided by mapping reads to two alternative genomes that incorporate SNPs called from the RNA-seq data. The SNPs can be reliably called from the RNA-seq data itself, provided any variants not found in high-quality SNP databases are filtered out. Finally, we suggest a way of measuring allele-specific expression (ASE) by crossing the line of interest to a reference line with a high-quality genome sequence. Combined with our bioinformatic methods, this approach minimizes mapping biases, allows poor-quality data to be identified and removed, and aids in the biological interpretation of the data as the parent of origin of each allele is known. In conclusion, our results suggest that accurate estimates of ASE do not require the parental genomes of the individual being studied to be sequenced. AVAILABILITY AND IMPLEMENTATION Scripts used to perform our analysis are available at https://github.com/d-quinn/bio_quinn2013.
Affiliation(s)
- Andrew Quinn
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Punita Juneja
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Francis M Jiggins
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
|
39
|
McManus CJ, May GE, Spealman P, Shteyman A. Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast. Genome Res 2014; 24:422-30. [PMID: 24318730 PMCID: PMC3941107 DOI: 10.1101/gr.164996.113] [Citation(s) in RCA: 156] [Impact Index Per Article: 14.2] [Received: 08/09/2013] [Accepted: 12/05/2013] [Indexed: 01/14/2023]
Abstract
Understanding the patterns and causes of phenotypic divergence is a central goal in evolutionary biology. Much work has shown that mRNA abundance is highly variable between closely related species. However, the extent and mechanisms of post-transcriptional gene regulatory evolution are largely unknown. Here we used ribosome profiling to compare transcript abundance and translation efficiency in two closely related yeast species (S. cerevisiae and S. paradoxus). By comparing translation regulatory divergence to interspecies differences in mRNA sequence features, we show that differences in transcript leaders and codon bias substantially contribute to divergent translation. Globally, we find that translation regulatory divergence often buffers species differences in mRNA abundance, such that ribosome occupancy is more conserved than transcript abundance. We used allele-specific ribosome profiling in interspecies hybrids to compare the relative contributions of cis- and trans-regulatory divergence to species differences in mRNA abundance and translation efficiency. The mode of gene regulatory divergence differs for these processes, as trans-regulatory changes play a greater role in divergent mRNA abundance than in divergent translation efficiency. Strikingly, most genes with aberrant transcript abundance in F1 hybrids (either over- or underexpressed compared to both parent species) did not exhibit aberrant ribosome occupancy. Our results show that interspecies differences in translation contribute substantially to the evolution of gene expression. Compensatory differences in transcript abundance and translation efficiency may increase the robustness of gene regulation.
Affiliation(s)
- C. Joel McManus
- Carnegie Mellon University, Department of Biological Sciences, Pittsburgh, Pennsylvania 15213, USA
- Gemma E. May
- Carnegie Mellon University, Department of Biological Sciences, Pittsburgh, Pennsylvania 15213, USA
- Pieter Spealman
- Carnegie Mellon University, Department of Biological Sciences, Pittsburgh, Pennsylvania 15213, USA
- Alan Shteyman
- Carnegie Mellon University, Department of Biological Sciences, Pittsburgh, Pennsylvania 15213, USA
|
40
|
McManus CJ, Coolon JD, Eipper-Mains J, Wittkopp PJ, Graveley BR. Evolution of splicing regulatory networks in Drosophila. Genome Res 2014; 24:786-96. [PMID: 24515119 PMCID: PMC4009608 DOI: 10.1101/gr.161521.113] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 11/25/2022]
Abstract
The proteome expanding effects of alternative pre-mRNA splicing have had a profound impact on eukaryotic evolution. The events that create this diversity can be placed into four major classes: exon skipping, intron retention, alternative 5′ splice sites, and alternative 3′ splice sites. Although the regulatory mechanisms and evolutionary pressures among alternative splicing classes clearly differ, how these differences affect the evolution of splicing regulation remains poorly characterized. We used RNA-seq to investigate splicing differences in D. simulans, D. sechellia, and three strains of D. melanogaster. Regulation of exon skipping and tandem alternative 3′ splice sites (NAGNAGs) were more divergent than other splicing classes. Splicing regulation was most divergent in frame-preserving events and events in noncoding regions. We further determined the contributions of cis- and trans-acting changes in splicing regulatory networks by comparing allele-specific splicing in F1 interspecific hybrids, because differences in allele-specific splicing reflect changes in cis-regulatory element activity. We find that species-specific differences in intron retention and alternative splice site usage are primarily attributable to changes in cis-regulatory elements (median ∼80% cis), whereas species-specific exon skipping differences are driven by both cis- and trans-regulatory divergence (median ∼50% cis). These results help define the mechanisms and constraints that influence splicing regulatory evolution and show that networks regulating the four major classes of alternative splicing diverge through different genetic mechanisms. We propose a model in which differences in regulatory network architecture among classes of alternative splicing affect the evolution of splicing regulation.
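The "% cis" figures quoted in this abstract can be illustrated with a commonly used partition of regulatory divergence, sketched here under stated assumptions: the allelic log-ratio measured in F1 hybrids is taken as the cis component, and the remainder of the between-species log-ratio is attributed to trans. Whether the authors used exactly this definition is not stated in the abstract; the function name and inputs are hypothetical.

```python
def percent_cis(parental_log2_ratio: float, hybrid_log2_ratio: float) -> float:
    """Share of regulatory divergence attributed to cis changes, in percent.

    parental_log2_ratio -- log2(species A / species B) measured in the parents
    hybrid_log2_ratio   -- log2(allele A / allele B) measured in F1 hybrids
    """
    cis = hybrid_log2_ratio  # both alleles share one trans environment in the hybrid
    trans = parental_log2_ratio - hybrid_log2_ratio  # what cis does not explain
    total = abs(cis) + abs(trans)
    if total == 0:
        raise ValueError("no divergence to partition")
    return 100.0 * abs(cis) / total
```

For example, a parental log2 ratio of 2.0 with a hybrid allelic log2 ratio of 1.6 yields about 80% cis under this convention.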
Affiliation(s)
- C Joel McManus
- Department of Genetics and Developmental Biology, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
|
41
|
Zhao L, Saelao P, Jones CD, Begun DJ. Origin and spread of de novo genes in Drosophila melanogaster populations. Science 2014; 343:769-72. [PMID: 24457212 DOI: 10.1126/science.1248286] [Citation(s) in RCA: 182] [Impact Index Per Article: 16.5] [Indexed: 11/02/2022]
Abstract
Comparative genomic analyses have revealed that genes may arise from ancestrally nongenic sequence. However, the origin and spread of these de novo genes within populations remain obscure. We identified 142 segregating and 106 fixed testis-expressed de novo genes in a population sample of Drosophila melanogaster. These genes appear to derive primarily from ancestral intergenic, unexpressed open reading frames, with natural selection playing a significant role in their spread. These results reveal a heretofore unappreciated dynamism of gene content.
Affiliation(s)
- Li Zhao
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
|
42
|
Dalton JE, Fear JM, Knott S, Baker BS, McIntyre LM, Arbeitman MN. Male-specific Fruitless isoforms have different regulatory roles conferred by distinct zinc finger DNA binding domains. BMC Genomics 2013; 14:659. [PMID: 24074028 PMCID: PMC3852243 DOI: 10.1186/1471-2164-14-659] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 06/20/2013] [Accepted: 09/20/2013] [Indexed: 11/25/2022] Open
Abstract
Background Drosophila melanogaster adult males perform an elaborate courtship ritual to entice females to mate. fruitless (fru), a gene that is one of the key regulators of male courtship behavior, encodes multiple male-specific isoforms (FruM). These isoforms vary in their carboxy-terminal zinc finger domains, which are predicted to facilitate DNA binding. Results By over-expressing individual FruM isoforms in fru-expressing neurons in either males or females and assaying the global transcriptional response by RNA-sequencing, we show that three FruM isoforms have different regulatory activities that depend on the sex of the fly. We identified several sets of genes regulated downstream of FruM isoforms, including many annotated with neuronal functions. By determining the binding sites of individual FruM isoforms using SELEX, we demonstrate that the distinct zinc finger domain of each FruM isoform confers different DNA binding specificities. A genome-wide search for these binding site sequences finds that the gene sets identified as induced by over-expression of FruM isoforms in males are enriched for genes that contain the binding sites. An analysis of the chromosomal distribution of genes downstream of FruM shows that those that are induced and repressed in males are highly enriched and depleted on the X chromosome, respectively. Conclusions This study elucidates the different regulatory and DNA binding activities of three FruM isoforms on a genome-wide scale and identifies genes regulated by these isoforms. These results add to our understanding of sex chromosome biology and further support the hypothesis that in some cell-types genes with male-biased expression are enriched on the X chromosome.
Affiliation(s)
- Justin E Dalton
- Biomedical Sciences Department and Program in Neuroscience, Florida State University, College of Medicine, Tallahassee, FL 32303, USA.
|
43
|
Stevenson KR, Coolon JD, Wittkopp PJ. Sources of bias in measures of allele-specific expression derived from RNA-sequence data aligned to a single reference genome. BMC Genomics 2013; 14:536. [PMID: 23919664 PMCID: PMC3751238 DOI: 10.1186/1471-2164-14-536] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Received: 02/14/2013] [Accepted: 08/05/2013] [Indexed: 11/23/2022] Open
Abstract
Background RNA-seq can be used to measure allele-specific expression (ASE) by assigning sequence reads to individual alleles; however, relative ASE is systematically biased when sequence reads are aligned to a single reference genome. Aligning sequence reads to both parental genomes can eliminate this bias, but this approach is not always practical, especially for non-model organisms. To improve accuracy of ASE measured using a single reference genome, we identified properties of differentiating sites responsible for biased measures of relative ASE. Results We found that clusters of differentiating sites prevented sequence reads from an alternate allele from aligning to the reference genome, causing a bias in relative ASE favoring the reference allele. This bias increased with greater sequence divergence between alleles. Increasing the number of mismatches allowed when aligning sequence reads to the reference genome and restricting analysis to genomic regions with fewer differentiating sites than the number of mismatches allowed almost completely eliminated this systematic bias. Accuracy of allelic abundance was increased further by excluding differentiating sites within sequence reads that could not be aligned uniquely within the genome (imperfect mappability) and reads that overlapped one or more insertions or deletions (indels) between alleles. Conclusions After aligning sequence reads to a single reference genome, excluding differentiating sites with at least as many neighboring differentiating sites as the number of mismatches allowed, imperfect mappability, and/or an indel(s) nearby resulted in measures of allelic abundance comparable to those derived from aligning sequence reads to both parental genomes.
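The neighbor-count filter summarized in the Conclusions can be sketched as follows. This is only an illustration: it assumes that "neighboring" means another differentiating site that could fall on the same read, i.e. within `read_length - 1` bp, and it covers neither the mappability nor the indel filters. The function and parameter names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def filter_sites(positions, read_length, max_mismatches):
    """Keep differentiating sites with fewer neighbors than allowed mismatches.

    positions      -- unique coordinates of differentiating sites on one chromosome
    read_length    -- sequencing read length in bp (defines the neighborhood)
    max_mismatches -- number of mismatches the aligner was allowed
    """
    pos = sorted(positions)
    span = read_length - 1  # farthest distance at which two sites can share a read
    kept = []
    for p in pos:
        lo = bisect_left(pos, p - span)
        hi = bisect_right(pos, p + span)
        neighbors = hi - lo - 1  # sites in the window, excluding the site itself
        if neighbors < max_mismatches:
            kept.append(p)
    return kept
```

With sites at 100, 105, 110 and 500, 50 bp reads and two allowed mismatches, only the isolated site at 500 survives; raising the allowance to three mismatches keeps all four, mirroring the abstract's observation that permitting more mismatches rescues clustered sites.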
Affiliation(s)
- Kraig R Stevenson
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
|
44
|
Pandey RV, Franssen SU, Futschik A, Schlötterer C. Allelic imbalance metre (Allim), a new tool for measuring allele-specific gene expression with RNA-seq data. Mol Ecol Resour 2013; 13:740-5. [PMID: 23615333 PMCID: PMC3739924 DOI: 10.1111/1755-0998.12110] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 01/22/2013] [Revised: 03/18/2013] [Accepted: 03/22/2013] [Indexed: 11/29/2022]
Abstract
Estimating differences in gene expression among alleles is of high interest for many areas in biology and medicine. Here, we present a user-friendly software tool, Allim, to estimate allele-specific gene expression. Because mapping bias is a major problem for reliable estimates of allele-specific gene expression using RNA-seq, Allim combines two different strategies to account for the mapping biases. In order to reduce the mapping bias, Allim first generates a polymorphism-aware reference genome that accounts for the sequence variation between the alleles. Then, a sequence-specific simulation tool estimates the residual mapping bias. Statistical tests for allelic imbalance are provided that can be used with the bias corrected RNA-seq data.
Affiliation(s)
- Ram Vinay Pandey
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
|
45
|
Gaur U, Li K, Mei S, Liu G. Research progress in allele-specific expression and its regulatory mechanisms. J Appl Genet 2013; 54:271-83. [PMID: 23609142 DOI: 10.1007/s13353-013-0148-y] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 01/07/2013] [Revised: 03/22/2013] [Accepted: 04/03/2013] [Indexed: 12/12/2022]
Abstract
Although the majority of genes are expressed equally from both alleles, some genes are differentially expressed. Preferential expression of a particular allele under the influence of regulatory factors is termed allele-specific expression (ASE). It is one of the important genetic factors that lead to phenotypic variation and can be used to identify variation in gene regulation factors. ASE can reflect mechanisms such as DNA methylation, histone modifications, and non-coding RNA function. Here, we broadly survey progress in ASE studies, what this simple yet very effective approach can offer in functional genomics, and its possible implications for a better understanding of the mechanisms underlying complex traits.
Affiliation(s)
- Uma Gaur
- Institute of Animal Science and Veterinary Medicine, Hubei Academy of Agricultural Sciences, Yaoyuan No. 1, Nanhu, Hongshan District, Wuhan, 430064, People's Republic of China
|
46
|
Abstract
Rising atmospheric carbon dioxide (CO2) conditions are driving unprecedented changes in seawater chemistry, resulting in reduced pH and carbonate ion concentrations in the Earth's oceans. This ocean acidification has negative but variable impacts on individual performance in many marine species. However, little is known about the adaptive capacity of species to respond to an acidified ocean, and, as a result, predictions regarding future ecosystem responses remain incomplete. Here we demonstrate that ocean acidification generates striking patterns of genome-wide selection in purple sea urchins (Strongylocentrotus purpuratus) cultured under different CO2 levels. We examined genetic change at 19,493 loci in larvae from seven adult populations cultured under realistic future CO2 levels. Although larval development and morphology showed little response to elevated CO2, we found substantial allelic change in 40 functional classes of proteins involving hundreds of loci. Pronounced genetic changes, including excess amino acid replacements, were detected in all populations and occurred in genes for biomineralization, lipid metabolism, and ion homeostasis--gene classes that build skeletons and interact in pH regulation. Such genetic change represents a neglected and important impact of ocean acidification that may influence populations that show few outward signs of response to acidification. Our results demonstrate the capacity for rapid evolution in the face of ocean acidification and show that standing genetic variation could be a reservoir of resilience to climate change in this coastal upwelling ecosystem. However, effective response to strong natural selection demands large population sizes and may be limited in species impacted by other environmental stressors.
|
47
|
Innocenti P, Chenoweth SF. Interspecific divergence of transcription networks along lines of genetic variance in Drosophila: dimensionality, evolvability, and constraint. Mol Biol Evol 2013; 30:1358-67. [PMID: 23519314 DOI: 10.1093/molbev/mst047] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/27/2022] Open
Abstract
Change in gene expression is a major facilitator of phenotypic evolution. Understanding the evolutionary potential of gene expression requires taking into account complex systems of regulatory networks, the structure of which could potentially bias evolutionary trajectories. We analyzed the evolutionary potential and divergence of multigene expression in three well-characterized signaling pathways in Drosophila, the mitogen-activated protein kinase (MapK), the Toll, and the insulin receptor/Foxo (InR/Foxo or InR/TOR) pathways in a multivariate quantitative genetic framework. Gene expression data from a natural population of D. melanogaster were used to estimate the genetic variance-covariance matrices (G) for each network. Although most genes within each pathway exhibited significant genetic variance, the number of independent dimensions of multivariate genetic variance was fewer than the number of genes analyzed. However, for expression, the reduction in dimensionality was not as large as seen for other trait types such as morphology. We then tested whether gene expression divergence between D. melanogaster and an additional six species of the Drosophila genus was biased along the major axes of standing variation observed in D. melanogaster. In many cases, divergence was restricted to directions of phenotypic space harboring above average levels of genetic variance in D. melanogaster, indicating that genetic covariances between genes within pathways have biased interspecific divergence. We tested whether co-expression of genes in both sexes has also biased the pattern of divergence. Including cross-sex genetic covariances increased the degree to which divergence was biased along major axes of genetic variance, suggesting that the co-expression of genes in males and females can generate further constraints on divergence across the Drosophila phylogeny. 
In contrast to patterns seen for morphological traits in vertebrates, transcriptional constraints do not appear to break down as divergence time between species increases; instead, they persist over tens of millions of years of divergence.
Affiliation(s)
- Paolo Innocenti
- Department of Ecology and Genetics, Evolutionary Biology Center, Uppsala University, Uppsala, Sweden
|
48
|
Jensen K, Sanchez-Garcia J, Williams C, Khare S, Mathur K, Graze RM, Hahn DA, McIntyre LM, Rincon-Limas DE, Fernandez-Funez P. Purification of transcripts and metabolites from Drosophila heads. J Vis Exp 2013:e50245. [PMID: 23524378 PMCID: PMC3639516 DOI: 10.3791/50245] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/24/2023] Open
Abstract
For the last decade, we have tried to understand the molecular and cellular mechanisms of neuronal degeneration using Drosophila as a model organism. Although fruit flies provide obvious experimental advantages, research on neurodegenerative diseases has mostly relied on traditional techniques, including genetic interaction, histology, immunofluorescence, and protein biochemistry. These techniques are effective for mechanistic, hypothesis-driven studies, which lead to a detailed understanding of the role of single genes in well-defined biological problems. However, neurodegenerative diseases are highly complex and affect multiple cellular organelles and processes over time. The advent of new technologies and the omics age provides a unique opportunity to understand the global cellular perturbations underlying complex diseases. Flexible model organisms such as Drosophila are ideal for adapting these new technologies because of their strong annotation and high tractability. One challenge with these small animals, though, is the purification of enough informational molecules (DNA, mRNA, protein, metabolites) from highly relevant tissues such as fly brains. Other challenges consist of collecting large numbers of flies for experimental replicates (critical for statistical robustness) and developing consistent procedures for the purification of high-quality biological material. Here, we describe the procedures for collecting thousands of fly heads and the extraction of transcripts and metabolites to understand how global changes in gene expression and metabolism contribute to neurodegenerative diseases. These procedures are easily scalable and can be applied to the study of proteomic and epigenomic contributions to disease.
Affiliation(s)
- Kurt Jensen
- Department of Neurology, McKnight Brain Institute, University of Florida, USA
|
49
|
Wei Y, Li X, Wang QF, Ji H. iASeq: integrative analysis of allele-specificity of protein-DNA interactions in multiple ChIP-seq datasets. BMC Genomics 2012. [PMID: 23194258 PMCID: PMC3576346 DOI: 10.1186/1471-2164-13-681] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND ChIP-seq provides new opportunities to study allele-specific protein-DNA binding (ASB). However, detecting allelic imbalance from a single ChIP-seq dataset often has low statistical power since only sequence reads mapped to heterozygous SNPs are informative for discriminating two alleles. RESULTS We develop a new method, iASeq, to address this issue by jointly analyzing multiple ChIP-seq datasets. iASeq uses a Bayesian hierarchical mixture model to learn correlation patterns of allele-specificity among multiple proteins. Using the discovered correlation patterns, the model allows one to borrow information across datasets to improve detection of allelic imbalance. Application of iASeq to 77 ChIP-seq samples from 40 ENCODE datasets and 1 genomic DNA sample in GM12878 cells reveals that allele-specificity of multiple proteins is highly correlated, and demonstrates the ability of iASeq to improve allelic inference compared to analyzing each individual dataset separately. CONCLUSIONS iASeq illustrates the value of integrating multiple datasets in the allele-specificity inference and offers a new tool to better analyze ASB.
Affiliation(s)
- Yingying Wei
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, Maryland 21205, USA
|
50
|
Genomic imprinting absent in Drosophila melanogaster adult females. Cell Rep 2012; 2:69-75. [PMID: 22840398 DOI: 10.1016/j.celrep.2012.06.013] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 02/16/2012] [Revised: 04/27/2012] [Accepted: 06/12/2012] [Indexed: 12/15/2022] Open
Abstract
Genomic imprinting occurs when expression of an allele differs based on the sex of the parent that transmitted the allele. In D. melanogaster, imprinting can occur, but its impact on allelic expression genome-wide is unclear. Here, we search for imprinted genes in D. melanogaster using RNA-seq to compare allele-specific expression between pools of 7- to 10-day-old adult female progeny from reciprocal crosses. We identified 119 genes with allelic expression consistent with imprinting, and these genes showed significant clustering within the genome. Surprisingly, additional analysis of several of these genes showed that either genomic heterogeneity or high levels of intrinsic noise caused imprinting-like allelic expression. Consequently, our data provide no convincing evidence of imprinting for D. melanogaster genes in their native genomic context. Elucidating sources of false-positive signals for imprinting in allele-specific RNA-seq data, as done here, is critical given the growing popularity of this method for identifying imprinted genes.
|