1
|
Early immune factors associated with the development of post-acute sequelae of SARS-CoV-2 infection in hospitalized and non-hospitalized individuals. Front Immunol 2024; 15:1348041. [PMID: 38318183 PMCID: PMC10838987 DOI: 10.3389/fimmu.2024.1348041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 01/02/2024] [Indexed: 02/07/2024] Open
Abstract
Background Infection by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) can lead to post-acute sequelae of SARS-CoV-2 (PASC) that can persist for weeks to years following initial viral infection. Clinical manifestations of PASC are heterogeneous and often involve multiple organs. While many hypotheses have been proposed for the mechanisms of PASC and its associated symptoms, the acute biological drivers of PASC are still unknown. Methods We enrolled 494 patients with COVID-19 at their initial presentation to a hospital or clinic and followed them longitudinally to determine their development of PASC. From 341 patients, we conducted multi-omic profiling on peripheral blood samples collected shortly after study enrollment to investigate early immune signatures associated with the development of PASC. Results During the first week of COVID-19, we observed a large number of differences in the immune profile of individuals who were hospitalized for COVID-19 compared to those individuals with COVID-19 who were not hospitalized. Differences between individuals who did or did not later develop PASC were, in comparison, more limited, but included significant differences in autoantibodies and in epigenetic and transcriptional signatures, particularly in double-negative 1 B cells. Conclusions We found that early immune indicators of incident PASC were nuanced, with significant molecular signals manifesting predominantly in double-negative B cells, compared with the robust differences associated with hospitalization during acute COVID-19. The emerging acute differences in B cell phenotypes, especially in double-negative 1 B cells, in PASC patients highlight a potentially important role of these cells in the development of PASC.
|
2
|
Multi-omic profiling reveals early immunological indicators for identifying COVID-19 Progressors. Clin Immunol 2023; 256:109808. [PMID: 37852344 DOI: 10.1016/j.clim.2023.109808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/25/2023] [Accepted: 10/11/2023] [Indexed: 10/20/2023]
Abstract
We sought to better understand the immune response during the immediate post-diagnosis phase of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by identifying molecular associations with longitudinal disease outcomes. Multi-omic analyses identified differences in immune cell composition, cytokine levels, and cell subset-specific transcriptomic and epigenomic signatures between individuals on a more serious disease trajectory (Progressors) as compared to those on a milder course (Non-progressors). Higher levels of multiple cytokines were observed in Progressors, with IL-6 showing the largest difference. Blood monocyte cell subsets were also skewed, showing a comparative decrease in non-classical CD14-CD16+ and intermediate CD14+CD16+ monocytes. In lymphocytes, the CD8+ T effector memory cells displayed a gene expression signature consistent with stronger T cell activation in Progressors. These early stage observations could serve as the basis for the development of prognostic biomarkers of disease risk and interventional strategies to improve the management of severe COVID-19.
BACKGROUND: Much of the literature on the immune response following SARS-CoV-2 infection has focused on the acute and post-acute phases of infection.
TRANSLATIONAL SIGNIFICANCE: We found differences at early time points of infection in approximately 160 participants. We compared multi-omic signatures in immune cells between individuals who progressed to needing more significant medical intervention (Progressors) and Non-progressors. We observed widespread evidence of a state of increased inflammation associated with progression, supported by a range of epigenomic, transcriptomic, and proteomic signatures. The signatures we identified support other findings at later time points and could serve as the basis for prognostic biomarker development or inform interventional strategies.
|
3
|
Multi-omic Profiling Reveals Early Immunological Indicators for Identifying COVID-19 Progressors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.25.542297. [PMID: 37292797 PMCID: PMC10246026 DOI: 10.1101/2023.05.25.542297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to a rapid response by the scientific community to further understand and combat its associated pathologic etiology. A focal point has been on the immune responses mounted during the acute and post-acute phases of infection, but the immediate post-diagnosis phase remains relatively understudied. We sought to better understand the immediate post-diagnosis phase by collecting blood from study participants soon after a positive test and identifying molecular associations with longitudinal disease outcomes. Multi-omic analyses identified differences in immune cell composition, cytokine levels, and cell subset-specific transcriptomic and epigenomic signatures between individuals on a more serious disease trajectory (Progressors) as compared to those on a milder course (Non-progressors). Higher levels of multiple cytokines were observed in Progressors, with IL-6 showing the largest difference. Blood monocyte cell subsets were also skewed, showing a comparative decrease in non-classical CD14-CD16+ and intermediate CD14+CD16+ monocytes. Additionally, in the lymphocyte compartment, CD8+ T effector memory cells displayed a gene expression signature consistent with stronger T cell activation in Progressors. Importantly, the identification of these cellular and molecular immune changes occurred at the early stages of COVID-19 disease. These observations could serve as the basis for the development of prognostic biomarkers of disease risk and interventional strategies to improve the management of severe COVID-19.
|
4
|
Dissection of multiple sclerosis genetics identifies B and CD4+ T cells as driver cell subsets. Genome Biol 2022; 23:127. [PMID: 35672799 PMCID: PMC9175345 DOI: 10.1186/s13059-022-02694-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 05/16/2022] [Indexed: 11/10/2022] Open
Abstract
Background Multiple sclerosis (MS) is an autoimmune condition of the central nervous system with a well-characterized genetic background. Prior analyses of MS genetics have identified broad enrichments across peripheral immune cells, yet the driver immune subsets are unclear. Results We utilize chromatin accessibility data across hematopoietic cells to identify cell type-specific enrichments of MS genetic signals. We find that CD4 T and B cells are independently enriched for MS genetics and further refine the driver subsets to Th17 and memory B cells, respectively. We replicate our findings in data from untreated and treated MS patients and find that immunomodulatory treatments suppress chromatin accessibility at driver cell types. Integration of statistical fine-mapping and chromatin interactions nominates numerous putative causal genes, illustrating complex interplay between shared and cell-specific genes. Conclusions Overall, our study finds that open chromatin regions in CD4 T cells and B cells independently drive MS genetic signals. Our study highlights how careful integration of genetics and epigenetics can provide fine-scale insights into causal cell types and nominate new genes and pathways for disease. Supplementary Information The online version contains supplementary material available at 10.1186/s13059-022-02694-y.
|
5
|
OP0100 MOLECULAR PROFILING OF PERIPHERAL IMMUNE CELL SUBSETS IN PATIENTS WITH RHEUMATOID ARTHRITIS. Ann Rheum Dis 2020. [DOI: 10.1136/annrheumdis-2020-eular.3967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Background: Rheumatoid arthritis (RA) is a chronic systemic autoimmune disease that affects 1% of the world’s population. Several key biological functions are dysregulated in RA, manifesting clinically as pain, fatigue, and synovitis, with articular destruction, organ-based comorbidities, and functional decline. Defining immune dysregulation in the peripheral blood of patients (pts) with RA will help inform future work to assess the extent to which immune homeostasis can be therapeutically achieved for these pts.
Objectives: To identify baseline molecular characteristics of the peripheral immune system, at the level of individual immune cell subsets, in pts with RA recruited to clinical trials of the oral, selective Janus kinase 1 (JAK1) inhibitor, filgotinib.
Methods: Peripheral blood mononuclear cells (PBMC) were collected from 324 pts with moderate to severely active RA, who had an inadequate response to methotrexate ([MTX], FINCH-1; NCT02889796; n=109) or who were MTX-naïve (FINCH-3; NCT02886728; n=215). PBMC were also collected from 50 demographically matched healthy volunteers (HV). The Immune Profiler platform was used to sort PBMC into 24 immune cell subsets, then quantify their gene expression and chromatin accessibility using RNA-seq and the assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq), respectively. Differentially expressed genes (DEGs) and differentially accessible regions (DARs) were identified among immune cell subsets from pts with RA versus HV. Gene set signature scores of Molecular Signatures Database hallmark pathways were calculated using single sample gene set enrichment analysis (ssGSEA) to examine differences in pathway activity between groups.
Results: A total of 14,500 sequencing datasets were generated from the pt and HV immune cell subsets. Among these, over 26,000 DEGs and 220,000 DARs were identified in RA versus HV (false discovery rate <0.05) across the 24 immune cell subsets. DEGs were identified in all immune cell subsets tested and were most pronounced in natural killer (NK) subsets; most DARs were detected in myeloid and NK subsets. ssGSEA revealed differential pathway signaling in RA versus HV across multiple functions at the immune cell subset level. Myeloid subsets from pts with RA often showed elevated pathway activities versus HV whereas B, T and NK subsets showed a general decrease. In particular, monocyte populations from pts with RA versus HV had elevated pathway activities involved in inflammatory response and interleukin-6/Janus kinase/signal transducer and activator of transcription 3 signaling. The B, T and NK subsets showed a general decrease in tumor necrosis factor-α signaling; conversely, monocyte subsets showed an increase. Prior MTX exposure did not have a notable impact on the detected molecular profile.
Conclusion: Differences in gene expression, hallmark pathway activity, and chromatin accessibility were identified in RA versus HV at the immune cell subset level. Significant contributions to differences in chromatin accessibility identified in the myeloid and NK cell populations suggest that there are more active regulatory sequences in these cell types that are associated with RA. Further investigations based on these findings may increase understanding of the immune regulatory paradigm in the context of RA.
Acknowledgments: This study was funded by Gilead Sciences, Inc. Editorial support was provided by Fishawack Communications Inc and funded by Gilead Sciences, Inc.
Disclosure of Interests: Peter C. Taylor Grant/research support from: Celgene, Eli Lilly and Company, Galapagos, and Gilead, Consultant of: AbbVie, Biogen, Eli Lilly and Company, Fresenius, Galapagos, Gilead, GlaxoSmithKline, Janssen, Nordic Pharma, Pfizer, Roche, and UCB; Jinfeng Liu Shareholder of: Gilead Sciences Inc., Roche, Employee of: Gilead Sciences Inc.; Luting Zhuo Employee of: Gilead Sciences Inc.; Yuan Tian Employee of: Gilead Sciences Inc.; Thomas Snyder Employee of: Verily Life Sciences; Charlie Kim Employee of: Verily Life Sciences; Pouya Kheradpour Employee of: Verily Life Sciences; Kat Drake Employee of: Verily Life Sciences; Sam Kim Shareholder of: Gilead Sciences Inc., Employee of: Gilead Sciences Inc.; Rachael E. Hawtin Shareholder of: Gilead Sciences Inc., Employee of: Gilead Sciences Inc.
|
6
|
Evidence of reduced recombination rate in human regulatory domains. Genome Biol 2017; 18:193. [PMID: 29058599 PMCID: PMC5651596 DOI: 10.1186/s13059-017-1308-x] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 04/04/2017] [Accepted: 08/25/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Recombination rate is non-uniformly distributed across the human genome. The variation of recombination rate at both fine and large scales cannot be fully explained by DNA sequences alone. Epigenetic factors, particularly DNA methylation, have recently been proposed to influence the variation in recombination rate. RESULTS We study the relationship between recombination rate and gene regulatory domains, defined by a gene and its linked control elements. We define these links using expression quantitative trait loci (eQTLs), methylation quantitative trait loci (meQTLs), chromatin conformation from publicly available datasets (Hi-C and ChIA-PET), and correlated activity links that we infer across cell types. Each link type shows a "recombination rate valley" of significantly reduced recombination rate compared to matched control regions. This recombination rate valley is most pronounced for gene regulatory domains of early embryonic development genes, housekeeping genes, and constitutive regulatory elements, which are known to show increased evolutionary constraint across species. Recombination rate valleys show increased DNA methylation, reduced double-stranded break initiation, and increased repair efficiency, specifically in the lineage leading to the germ line. Moreover, by using only the overlap of functional links and DNA methylation in germ cells, we are able to predict the recombination rate with high accuracy. CONCLUSIONS Our results suggest the existence of a recombination rate valley at regulatory domains and provide a potential molecular mechanism to interpret the interplay between genetic and epigenetic variations.
|
7
|
Abstract
Annotation of regulatory elements and identification of the transcription-related factors (TRFs) targeting these elements are key steps in understanding how cells interpret their genetic blueprint and their environment during development, and how that process goes awry in the case of disease. One goal of the modENCODE (model organism ENCyclopedia of DNA Elements) Project is to survey a diverse sampling of TRFs, both DNA-binding and non-DNA-binding factors, to provide a framework for the subsequent study of the mechanisms by which transcriptional regulators target the genome. Here we provide an updated map of the Drosophila melanogaster regulatory genome based on the location of 84 TRFs at various stages of development. This regulatory map reveals a variety of genomic targeting patterns, including factors with strong preferences toward proximal promoter binding, factors that target intergenic and intronic DNA, and factors with distinct chromatin state preferences. The data also highlight the stringency of the Polycomb regulatory network, and show association of the Trithorax-like (Trl) protein with hotspots of DNA binding throughout development. Furthermore, the data identify more than 5800 instances in which TRFs target DNA regions with demonstrated enhancer activity. Regions of high TRF co-occupancy are more likely to be associated with open enhancers used across cell types, while lower TRF occupancy regions are associated with complex enhancers that are also regulated at the epigenetic level. Together these data serve as a resource for the research community in the continued effort to dissect transcriptional regulatory mechanisms directing Drosophila development.
|
8
|
Abstract
The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.
|
9
|
Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res 2013; 42:2976-87. [PMID: 24335146 PMCID: PMC3950668 DOI: 10.1093/nar/gkt1249] [Citation(s) in RCA: 304] [Impact Index Per Article: 27.6] [Indexed: 01/17/2023] Open
Abstract
Recent advances in technology have led to a dramatic increase in the number of available transcription factor ChIP-seq and ChIP-chip data sets. Understanding the motif content of these data sets is an important step in understanding the underlying mechanisms of regulation. Here we provide a systematic motif analysis for 427 human ChIP-seq data sets using motifs curated from the literature and also discovered de novo using five established motif discovery tools. We use a systematic pipeline for calculating motif enrichment in each data set, providing a principled way for choosing between motif variants found in the literature and for flagging potentially problematic data sets. Our analysis confirms the known specificity of 41 of the 56 analyzed factor groups and reveals motifs of potential cofactors. We also use cell type-specific binding to find factors active in specific conditions. The resource we provide is accessible both for browsing a small number of factors and for performing large-scale systematic analyses. We provide motif matrices, instances and enrichments in each of the ENCODE data sets. The motifs discovered here have been used in parallel studies to validate the specificity of antibodies, understand cooperativity between data sets and measure the variation of motif binding across individuals and species.
|
10
|
Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res 2013; 23:800-11. [PMID: 23512712 PMCID: PMC3638136 DOI: 10.1101/gr.144899.112] [Citation(s) in RCA: 228] [Impact Index Per Article: 20.7] [Received: 06/26/2012] [Accepted: 03/14/2013] [Indexed: 01/06/2023]
Abstract
Genome-wide chromatin annotations have permitted the mapping of putative regulatory elements across multiple human cell types. However, their experimental dissection by directed regulatory motif disruption has remained unfeasible at the genome scale. Here, we use a massively parallel reporter assay (MPRA) to measure the transcriptional levels induced by 145-bp DNA segments centered on evolutionarily conserved regulatory motif instances within enhancer chromatin states. We select five predicted activators (HNF1, HNF4, FOXA, GATA, NFE2L2) and two predicted repressors (GFI1, ZFP161) and measure reporter expression in erythroleukemia (K562) and liver carcinoma (HepG2) cell lines. We test 2104 wild-type sequences and 3314 engineered enhancer variants containing targeted motif disruptions, each using 10 barcode tags and two replicates. The resulting data strongly confirm the enhancer activity and cell-type specificity of enhancer chromatin states, the ability of 145-bp segments to recapitulate both, the necessary role of regulatory motifs in enhancer function, and the complementary roles of activator and repressor motifs. We find statistically robust evidence that (1) disrupting the predicted activator motifs abolishes enhancer function, while silent or motif-improving changes maintain enhancer activity; (2) evolutionary conservation, nucleosome exclusion, binding of other factors, and strength of the motif match are predictive of enhancer activity; (3) scrambling repressor motifs leads to aberrant reporter expression in cell lines where the enhancers are usually inactive. Our results suggest a general strategy for deciphering cis-regulatory elements by systematic large-scale manipulation and provide quantitative enhancer activity measurements across thousands of constructs that can be mined to develop predictive models of gene expression.
|
11
|
Abstract
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
|
12
|
Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome Biol 2012; 13:R49. [PMID: 22950968 PMCID: PMC3491393 DOI: 10.1186/gb-2012-13-9-r49] [Citation(s) in RCA: 77] [Impact Index Per Article: 6.4] [Received: 03/28/2012] [Revised: 05/23/2012] [Accepted: 06/08/2012] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Advances in sequencing technology have boosted population genomics and made it possible to map the positions of transcription factor binding sites (TFBSs) with high precision. Here we investigate TFBS variability by combining transcription factor binding maps generated by ENCODE, modENCODE, our previously published data and other sources with genomic variation data for human individuals and Drosophila isogenic lines. RESULTS We introduce a metric of TFBS variability that takes into account changes in motif match associated with mutation and makes it possible to investigate TFBS functional constraints instance-by-instance as well as in sets that share common biological properties. We also take advantage of the emerging per-individual transcription factor binding data to show evidence that TFBS mutations, particularly at evolutionarily conserved sites, can be efficiently buffered to ensure coherent levels of transcription factor binding. CONCLUSIONS Our analyses provide insights into the relationship between individual and interspecies variation and show evidence for the functional buffering of TFBS mutations in both humans and flies. In a broad perspective, these results demonstrate the potential of combining functional genomics and population genetics approaches for understanding gene regulation.
|
13
|
Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res 2011; 21:1916-28. [PMID: 21994248 DOI: 10.1101/gr.108753.110] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Indexed: 02/05/2023]
Abstract
The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes--especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.
|
14
|
An epigenetic signature for monoallelic olfactory receptor expression. Cell 2011; 145:555-70. [PMID: 21529909 PMCID: PMC3094500 DOI: 10.1016/j.cell.2011.03.040] [Citation(s) in RCA: 206] [Impact Index Per Article: 15.8] [Received: 09/28/2010] [Revised: 03/10/2011] [Accepted: 03/17/2011] [Indexed: 12/29/2022]
Abstract
Constitutive heterochromatin is traditionally viewed as the static form of heterochromatin that silences pericentromeric and telomeric repeats in a cell cycle- and differentiation-independent manner. Here, we show that, in the mouse olfactory epithelium, olfactory receptor (OR) genes are marked in a highly dynamic fashion with the molecular hallmarks of constitutive heterochromatin, H3K9me3 and H4K20me3. The cell type and developmentally dependent deposition of these marks along the OR clusters are, most likely, reversed during the process of OR choice to allow for monogenic and monoallelic OR expression. In contrast to the current view of OR choice, our data suggest that OR silencing takes place before OR expression, indicating that it is not the product of an OR-elicited feedback signal. Our findings suggest that chromatin-mediated silencing lays a molecular foundation upon which singular and stochastic selection for gene expression can be applied.
|
15
|
Abstract
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
|
16
|
The Tasmanian devil transcriptome reveals Schwann cell origins of a clonally transmissible cancer. Science 2010; 327:84-7. [PMID: 20044575 DOI: 10.1126/science.1180616] [Citation(s) in RCA: 191] [Impact Index Per Article: 13.6] [Indexed: 01/20/2023]
Abstract
The Tasmanian devil, a marsupial carnivore, is endangered because of the emergence of a transmissible cancer known as devil facial tumor disease (DFTD). This fatal cancer is clonally derived and is an allograft transmitted between devils by biting. We performed a large-scale genetic analysis of DFTD with microsatellite genotyping, a mitochondrial genome analysis, and deep sequencing of the DFTD transcriptome and microRNAs. These studies confirm that DFTD is a monophyletic clonally transmissible tumor and suggest that the disease is of Schwann cell origin. On the basis of these results, we have generated a diagnostic marker for DFTD and identify a suite of genes relevant to DFTD pathology and transmission. We provide a genomic data set for the Tasmanian devil that is applicable to cancer diagnosis, disease evolution, and conservation biology.
|
17
|
A comprehensive map of insulator elements for the Drosophila genome. PLoS Genet 2010; 6:e1000814. [PMID: 20084099 PMCID: PMC2797089 DOI: 10.1371/journal.pgen.1000814] [Citation(s) in RCA: 257] [Impact Index Per Article: 18.4] [Received: 07/16/2009] [Accepted: 12/14/2009] [Indexed: 01/31/2023] Open
Abstract
Insulators are DNA sequences that control the interactions among genomic regulatory elements and act as chromatin boundaries. A thorough understanding of their location and function is necessary to address the complexities of metazoan gene regulation. We studied by ChIP–chip the genome-wide binding sites of 6 insulator-associated proteins—dCTCF, CP190, BEAF-32, Su(Hw), Mod(mdg4), and GAF—to obtain the first comprehensive map of insulator elements in Drosophila embryos. We identify over 14,000 putative insulators, including all classically defined insulators. We find two major classes of insulators defined by dCTCF/CP190/BEAF-32 and Su(Hw), respectively. Distributional analyses of insulators revealed that particular sub-classes of insulator elements are excluded between cis-regulatory elements and their target promoters; divide differentially expressed, alternative, and divergent promoters; act as chromatin boundaries; are associated with chromosomal breakpoints among species; and are embedded within active chromatin domains. Together, these results provide a map demarcating the boundaries of gene regulatory units and a framework for understanding insulator function during the development and evolution of Drosophila.
The spatiotemporal specificity of gene expression is controlled by interactions among regulatory proteins, cis-regulatory elements, chromatin modifications, and genes. These interactions can occur over large distances, and the mechanisms by which they are controlled are poorly understood. Insulators are DNA sequences that can block both the interaction between regulatory elements and genes and the spread of regions of modified chromatin. To date, relatively few insulators have been identified in developing Drosophila embryos. We here present the genome-wide identification of over 14,000 binding sites for 6 insulator-associated proteins. We demonstrate the existence of two broad classes of insulators. Insulators of both classes are enriched at the boundaries of a particular chromatin modification. However, only insulators bound by BEAF-32, CP190, and dCTCF are enriched in regions of open chromatin or demarcate gene boundaries, with a particular enrichment between differentially expressed promoters. Furthermore, insulators of this class are enriched at points of chromosomal rearrangement among the 12 species of sequenced Drosophila, suggesting that insulator-defined regulatory boundaries are evolutionarily conserved.
|
18
|
Genome analysis of the platypus reveals unique signatures of evolution. Nature 2008; 453:175-83. [PMID: 18464734 PMCID: PMC2803040 DOI: 10.1038/nature06936] [Citation(s) in RCA: 475] [Impact Index Per Article: 29.7] [Received: 09/14/2007] [Accepted: 03/25/2008] [Indexed: 12/18/2022]
Abstract
We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
|
19
|
Abstract
Small RNA pathways play evolutionarily conserved roles in gene regulation and defense from parasitic nucleic acids. The character and expression patterns of small RNAs show conservation throughout animal lineages, but specific animal clades also show variations on these recurring themes, including species-specific small RNAs. The monotremes, with only platypus and four species of echidna as extant members, represent the basal branch of the mammalian lineage. Here, we examine the small RNA pathways of monotremes by deep sequencing of six platypus and echidna tissues. We find that highly conserved microRNA species display their signature tissue-specific expression patterns. In addition, we find a large rapidly evolving cluster of microRNAs on platypus chromosome X1, which is unique to monotremes. Platypus and echidna testes contain a robust Piwi-interacting (piRNA) system, which appears to be participating in ongoing transposon defense.
|
20
|
A single Hox locus in Drosophila produces functional microRNAs from opposite DNA strands. Genes Dev 2008; 22:8-13. [PMID: 18172160 DOI: 10.1101/gad.1613108] [Citation(s) in RCA: 195] [Impact Index Per Article: 12.2] [Indexed: 01/24/2023]
Abstract
MicroRNAs (miRNAs) are approximately 22-nucleotide RNAs that are processed from characteristic precursor hairpins and pair to sites in messages of protein-coding genes to direct post-transcriptional repression. Here, we report that the miRNA iab-4 locus in the Drosophila Hox cluster is transcribed convergently from both DNA strands, giving rise to two distinct functional miRNAs. Both sense and antisense miRNA products target neighboring Hox genes via highly conserved sites, leading to homeotic transformations when ectopically expressed. We also report sense/antisense miRNAs in mouse and find antisense transcripts close to many miRNAs in both flies and mammals, suggesting that additional sense/antisense pairs exist.
|
21
|
Reliable prediction of regulator targets using 12 Drosophila genomes. Genome Res 2007; 17:1919-31. [PMID: 17989251 PMCID: PMC2099599 DOI: 10.1101/gr.7090407] [Citation(s) in RCA: 139] [Impact Index Per Article: 8.2] [Received: 08/29/2007] [Accepted: 10/10/2007] [Indexed: 12/24/2022]
Abstract
Gene expression is regulated pre- and post-transcriptionally via cis-regulatory DNA and RNA motifs. Identification of individual functional instances of such motifs in genome sequences is a major goal for inferring regulatory networks yet has been hampered due to the motifs' short lengths that lead to many chance matches and poor signal-to-noise ratios. In this paper, we develop a general methodology for the comparative identification of functional motif instances across many related species, using a phylogenetic framework that accounts for the evolutionary relationships between species, allows for motif movements, and is robust against missing data due to artifacts in sequencing, assembly, or alignment. We also provide a robust statistical framework for evaluating motif confidence, which enables us to translate evolutionary conservation into a confidence measure for each motif instance, correcting for varying motif length, composition, and background conservation of the target regions. We predict targets of fly transcription factors and miRNAs in alignments of 12 recently sequenced Drosophila species. When compared to extensive genome-wide experimental data, predicted targets are of high quality, matching and surpassing ChIP-chip microarrays and recovering miRNA targets with high sensitivity. The resulting regulatory network suggests significant redundancy between pre- and post-transcriptional regulation of gene expression.
|
22
|
Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res 2007; 17:1865-79. [PMID: 17989255 DOI: 10.1101/gr.6593807] [Citation(s) in RCA: 173] [Impact Index Per Article: 10.2] [Indexed: 12/19/2022]
Abstract
MicroRNAs (miRNAs) are short regulatory RNAs that inhibit target genes by complementary binding in 3' untranslated regions (3' UTRs). They are one of the most abundant classes of regulators, targeting a large fraction of all genes, making their comprehensive study a requirement for understanding regulation and development. Here we use 12 Drosophila genomes to define structural and evolutionary signatures of miRNA hairpins, which we use for their de novo discovery. We predict >41 novel miRNA genes, which encompass many unique families, and 28 of which are validated experimentally. We also define signals for the precise start position of mature miRNAs, which suggest corrections of previously known miRNAs, often leading to drastic changes in their predicted target spectrum. We show that miRNA discovery power scales with the number and divergence of species compared, suggesting that such approaches can be successful in human as dozens of mammalian genomes become available. Interestingly, for some miRNAs sense and anti-sense hairpins score highly and mature miRNAs from both strands can indeed be found in vivo. Similarly, miRNAs with weak 5' end predictions show increased in vivo processing of multiple alternate 5' ends and have fewer predicted targets. Lastly, we show that several miRNA star sequences score highly and are likely functional. For mir-10 in particular, both arms show abundant processing, and both show highly conserved target sites in Hox genes, suggesting a possible cooperation of the two arms, and their role as a master Hox regulator.
|
23
|
Abstract
MOTIVATION Methods that focus on secondary structures, such as Position Specific Scoring Matrices and Hidden Markov Models, have proved useful for assigning proteins to families. However, for assigning proteins to an attribute class within a family these methods may introduce more free parameters than are needed. There are fewer members and there is less variability among sequences within a family. We describe a method for organizing proteins in a family that exhibits up to an order of magnitude reduction in the number of parameters. The basis is the log odds ratio commonly used to measure similarity. We adapt this to characterize the sequence dissimilarities that give rise to attribute differentiation. This leads to the definition of Class Attribute Substitution Matrices (CLASSUM), a dual of the BLOSUM. RESULTS The method was applied to classify sequences hierarchically in the lambda and kappa subgroups of the immunoglobulin superfamily. Positions conferring class were identified based on the degree of amino acid variability at a position. The CLASSUM computed for these positions classified better than 90% of test data correctly compared with 35-50% for BLOSUM-62. The expected value for a random matrix is 14%. The results suggest that family-specific data-derived substitution matrices can improve the resolution of automated methods that use generic substitution matrices for searching for and classifying proteins.
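The log odds ratio mentioned in the abstract can be illustrated with a minimal sketch. This is the generic BLOSUM-style score, s(a,b) = log2(q_ab / e_ab), computed from aligned columns; it is not the CLASSUM construction itself, which the abstract does not specify. The function name `log_odds_matrix` and the toy input are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(columns):
    """Generic log-odds substitution scores from aligned columns.

    `columns` is a list of strings, one per alignment column (hypothetical
    input format). Returns {(a, b): log2(q_ab / e_ab)} with a <= b.
    """
    # Count every unordered residue pair observed within each column.
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    # Observed pair frequencies q_ab.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Marginal residue frequencies implied by the pairs:
    # p_a = q_aa + sum over b != a of q_ab / 2.
    p = Counter()
    for (a, b), f in q.items():
        p[a] += f / 2
        p[b] += f / 2

    scores = {}
    for (a, b), f in q.items():
        # Expected frequency if residues paired independently.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = math.log2(f / expected)
    return scores


if __name__ == "__main__":
    # Toy alignment: two perfectly conserved columns (illustrative only).
    print(log_odds_matrix(["AAAA", "BBBB"]))
```

A positive score means a pair occurs more often than chance would predict; a class-discriminating matrix of the kind the abstract describes would instead be derived from the differences between classes, which this sketch does not attempt.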
|