1. Genomic context sensitizes regulatory elements to genetic disruption. Mol Cell 2024; 84:1842-1854.e7. [PMID: 38759624 PMCID: PMC11104518 DOI: 10.1016/j.molcel.2024.04.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 03/11/2024] [Accepted: 04/18/2024] [Indexed: 05/19/2024]
Abstract
Genomic context critically modulates regulatory function but is difficult to manipulate systematically. The murine insulin-like growth factor 2 (Igf2)/H19 locus is a paradigmatic model of enhancer selectivity, whereby CTCF occupancy at an imprinting control region directs downstream enhancers to activate either H19 or Igf2. We used synthetic regulatory genomics to repeatedly replace the native locus with 157-kb payloads, and we systematically dissected its architecture. Enhancer deletion and ectopic delivery revealed previously uncharacterized long-range regulatory dependencies at the native locus. Exchanging the H19 enhancer cluster with the Sox2 locus control region (LCR) showed that the H19 enhancers relied on their native surroundings while the Sox2 LCR functioned autonomously. Analysis of regulatory DNA actuation across cell types revealed that these enhancer clusters typify broader classes of context sensitivity genome wide. These results show that unexpected dependencies influence even well-studied loci, and our approach permits large-scale manipulation of complete loci to investigate the relationship between regulatory architecture and function.
2.
Abstract
Pervasive transcriptional activity is observed across diverse species. The genomes of extant organisms have undergone billions of years of evolution, making it unclear whether these genomic activities represent effects of selection or 'noise'[1-4]. Characterizing default genome states could help determine whether pervasive transcriptional activity has biological meaning. Here we addressed this question by introducing a synthetic 101-kb locus into the genomes of Saccharomyces cerevisiae and Mus musculus and characterizing genomic activity. The locus was designed by reversing but not complementing human HPRT1, including its flanking regions, thus retaining basic features of the natural sequence but ablating evolved coding or regulatory information. We observed widespread activity of both reversed and native HPRT1 loci in yeast, despite the lack of evolved yeast promoters. By contrast, the reversed locus displayed no activity at all in mouse embryonic stem cells, and instead exhibited repressive chromatin signatures. The repressive signature was alleviated in a locus variant lacking CpG dinucleotides; nevertheless, this variant was also transcriptionally inactive. These results show that synthetic genomic sequences that lack coding information are active in yeast, but inactive in mouse embryonic stem cells, consistent with a major difference in 'default genomic states' between these two divergent eukaryotic cell types, with implications for understanding pervasive transcription, horizontal transfer of genetic information and the birth of new genes.
3. Genomic context sensitizes regulatory elements to genetic disruption. bioRxiv: The Preprint Server for Biology 2024:2023.07.02.547201. [PMID: 37781588 PMCID: PMC10541140 DOI: 10.1101/2023.07.02.547201] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/03/2023]
Abstract
Enhancer function is frequently investigated piecemeal using truncated reporter assays or single deletion analysis. Thus it remains unclear to what extent enhancer function at native loci relies on surrounding genomic context. Using the Big-IN technology for targeted integration of large DNAs, we analyzed the regulatory architecture of the murine Igf2/H19 locus, a paradigmatic model of enhancer selectivity. We assembled payloads containing a 157-kb functional Igf2/H19 locus and engineered mutations to genetically direct CTCF occupancy at the imprinting control region (ICR) that switches the target gene of the H19 enhancer cluster. Contrasting activity of payloads delivered at the endogenous Igf2/H19 locus or ectopically at Hprt revealed that the Igf2/H19 locus includes additional, previously unknown long-range regulatory elements. Exchanging components of the Igf2/H19 locus with the well-studied Sox2 locus showed that the H19 enhancer cluster functioned poorly out of context, and required its native surroundings to activate Sox2 expression. Conversely, the Sox2 locus control region (LCR) could activate both Igf2 and H19 outside its native context, but its activity was only partially modulated by CTCF occupancy at the ICR. Analysis of regulatory DNA actuation across different cell types revealed that, while the H19 enhancers are tightly coordinated within their native locus, the Sox2 LCR acts more independently. We show that these enhancer clusters typify broader classes of loci genome-wide. Our results show that unexpected dependencies may influence even the most studied functional elements, and our synthetic regulatory genomics approach permits large-scale manipulation of complete loci to investigate the relationship between locus architecture and function.
4.
Abstract
The loss of the tail is among the most notable anatomical changes to have occurred along the evolutionary lineage leading to humans and to the 'anthropomorphous apes'[1-3], with a proposed role in contributing to human bipedalism[4-6]. Yet, the genetic mechanism that facilitated tail-loss evolution in hominoids remains unknown. Here we present evidence that an individual insertion of an Alu element in the genome of the hominoid ancestor may have contributed to tail-loss evolution. We demonstrate that this Alu element, inserted into an intron of the TBXT gene[7-9], pairs with a neighbouring ancestral Alu element encoded in the reverse genomic orientation and leads to a hominoid-specific alternative splicing event. To study the effect of this splicing event, we generated multiple mouse models that express both full-length and exon-skipped isoforms of Tbxt, mimicking the expression pattern of its hominoid orthologue TBXT. Mice expressing both Tbxt isoforms exhibit a complete absence of the tail or a shortened tail depending on the relative abundance of Tbxt isoforms expressed at the embryonic tail bud. These results support the notion that the exon-skipped transcript is sufficient to induce a tail-loss phenotype. Moreover, mice expressing the exon-skipped Tbxt isoform develop neural tube defects, a condition that affects approximately 1 in 1,000 neonates in humans[10]. Thus, tail-loss evolution may have been associated with an adaptive cost of the potential for neural tube defects, which continue to affect human health today.
5. Mouse genome rewriting and tailoring of three important disease loci. Nature 2023; 623:423-431. [PMID: 37914927 PMCID: PMC10632133 DOI: 10.1038/s41586-023-06675-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/28/2022] [Accepted: 09/25/2023] [Indexed: 11/03/2023]
Abstract
Genetically engineered mouse models (GEMMs) help us to understand human pathologies and develop new therapies, yet faithfully recapitulating human diseases in mice is challenging. Advances in genomics have highlighted the importance of non-coding regulatory genome sequences, which control spatiotemporal gene expression patterns and splicing in many human diseases[1,2]. Including extensive regulatory genomic regions, which requires large-scale genome engineering, should enhance the quality of disease modelling. Existing methods set limits on the size and efficiency of DNA delivery, hampering the routine creation of highly informative models that we call genomically rewritten and tailored GEMMs (GREAT-GEMMs). Here we describe 'mammalian switching antibiotic resistance markers progressively for integration' (mSwAP-In), a method for efficient genome rewriting in mouse embryonic stem cells. We demonstrate the use of mSwAP-In for iterative genome rewriting of up to 115 kb of a tailored Trp53 locus, as well as for humanization of mice using 116 kb and 180 kb human ACE2 loci. The ACE2 model recapitulated human ACE2 expression patterns and splicing, and notably, presented milder symptoms when challenged with SARS-CoV-2 compared with the existing K18-hACE2 model, thus representing a more human-like model of infection. Finally, we demonstrated serial genome writing by humanizing mouse Tmprss2 biallelically in the ACE2 GREAT-GEMM, highlighting the versatility of mSwAP-In in genome writing.
6. Synthetic regulatory genomics uncovers enhancer context dependence at the Sox2 locus. Mol Cell 2023; 83:1140-1152.e7. [PMID: 36931273 PMCID: PMC10081970 DOI: 10.1016/j.molcel.2023.02.027] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/03/2022] [Revised: 01/20/2023] [Accepted: 02/23/2023] [Indexed: 03/18/2023]
Abstract
Sox2 expression in mouse embryonic stem cells (mESCs) depends on a distal cluster of DNase I hypersensitive sites (DHSs), but their individual contributions and degree of interdependence remain a mystery. We analyzed the endogenous Sox2 locus using Big-IN to scarlessly integrate large DNA payloads incorporating deletions, rearrangements, and inversions affecting single or multiple DHSs, as well as surgical alterations to transcription factor (TF) recognition sequences. Multiple mESC clones were derived for each payload, sequence-verified, and analyzed for Sox2 expression. We found that two DHSs comprising a handful of key TF recognition sequences were each sufficient for long-range activation of Sox2 expression. By contrast, three nearby DHSs were entirely context dependent, showing no activity alone but dramatically augmenting the activity of the autonomous DHSs. Our results highlight the role of context in modulating genomic regulatory element function, and our synthetic regulatory genomics approach provides a roadmap for the dissection of other genomic loci.
7. The transcription factor DDIT3 is a potential driver of dyserythropoiesis in myelodysplastic syndromes. Nat Commun 2022; 13:7619. [PMID: 36494342 PMCID: PMC9734135 DOI: 10.1038/s41467-022-35192-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/01/2021] [Accepted: 11/21/2022] [Indexed: 12/13/2022]
Abstract
Myelodysplastic syndromes (MDS) are hematopoietic stem cell (HSC) malignancies characterized by ineffective hematopoiesis, with increased incidence in older individuals. Here we analyze the transcriptome of human HSCs purified from young and older healthy adults, as well as MDS patients, identifying transcriptional alterations following different patterns of expression. While aging-associated lesions seem to predispose HSCs to myeloid transformation, disease-specific alterations may trigger MDS development. Among MDS-specific lesions, we detect the upregulation of the transcription factor DNA Damage Inducible Transcript 3 (DDIT3). Overexpression of DDIT3 in healthy human HSCs induces an MDS-like transcriptional state and dyserythropoiesis, an effect associated with a failure in the activation of transcriptional programs required for normal erythroid differentiation. Moreover, DDIT3 knockdown in CD34+ cells from MDS patients with anemia is able to restore erythropoiesis. These results identify DDIT3 as a driver of dyserythropoiesis, and a potential therapeutic target to restore the inefficient erythroid differentiation characterizing MDS patients.
8.
Abstract
Precise Hox gene expression is crucial for embryonic patterning. Intra-Hox transcription factor binding and distal enhancer elements have emerged as the major regulatory modules controlling Hox gene expression. However, quantifying their relative contributions has remained elusive. Here, we introduce "synthetic regulatory reconstitution," a conceptual framework for studying gene regulation, and apply it to the HoxA cluster. We synthesized and delivered variant rat HoxA clusters (130 to 170 kilobases) to an ectopic location in the mouse genome. We found that a minimal HoxA cluster recapitulated correct patterns of chromatin remodeling and transcription in response to patterning signals, whereas the addition of distal enhancers was needed for full transcriptional output. Synthetic regulatory reconstitution could provide a generalizable strategy for deciphering the regulatory logic of gene expression in complex genomes.
9. A conditional counterselectable Piga knockout in mouse embryonic stem cells for advanced genome writing applications. iScience 2022; 25:104438. [PMID: 35692632 PMCID: PMC9184564 DOI: 10.1016/j.isci.2022.104438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2021] [Revised: 03/18/2022] [Accepted: 05/17/2022] [Indexed: 11/16/2022]
Abstract
Overwriting counterselectable markers is an efficient strategy for removing wild-type DNA or replacing it with payload DNA of interest. Currently, one bottleneck of efficient genome engineering in mammals is the shortage of counterselectable (negative selection) markers that work robustly without affecting organismal developmental potential. Here, we report a conditional Piga knockout strategy that enables efficient proaerolysin-based counterselection in mouse embryonic stem cells. The conditional Piga knockout cells show proaerolysin resistance similar to that of full (non-conditional) Piga deletion cells, which enables the use of a PIGA transgene as a counterselectable marker for genome engineering purposes. Native Piga function is readily restored in conditional Piga knockout cells to facilitate subsequent mouse development. We also demonstrate the generality of our strategy by engineering a conditional knockout of endogenous Hprt. Taken together, our work provides a new tool for advanced mouse genome writing and mouse model establishment.
10.
Abstract
The specificity of interactions between genomic regulatory elements and potential target genes is influenced by the binding of insulator proteins such as CTCF, which can act as potent enhancer blockers when interposed between an enhancer and a promoter in a reporter assay. But not all CTCF sites genome-wide function as insulator elements, depending on cellular and genomic context. To dissect the influence of genomic context on enhancer blocker activity, we integrated reporter constructs with promoter-only, promoter and enhancer, and enhancer blocker configurations at hundreds of thousands of genomic sites using the Sleeping Beauty transposase. Deconvolution of reporter activity by genomic position reveals distinct expression patterns subject to genomic context, including a compartment of enhancer blocker reporter integrations with robust expression. The high density of integration sites permits quantitative delineation of characteristic genomic context sensitivity profiles and their decomposition into sensitivity to both local and distant DNase I hypersensitive sites. Furthermore, using a single-cell expression approach to test the effect of integrated reporters on the expression of nearby endogenous genes reveals that CTCF insulator elements do not completely abrogate reporter effects on endogenous gene expression. Collectively, our results lend new insight into genomic regulatory compartmentalization and its influence on the determinants of promoter–enhancer specificity.
11. SARS-CoV-2 genomic characterization and clinical manifestation of the COVID-19 outbreak in Uruguay. Emerg Microbes Infect 2021; 10:51-65. [PMID: 33306459 PMCID: PMC7832039 DOI: 10.1080/22221751.2020.1863747] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/08/2020] [Revised: 11/23/2020] [Accepted: 12/09/2020] [Indexed: 01/15/2023]
Abstract
COVID-19 is a respiratory illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and declared by the World Health Organization a global public health emergency. Among the severe outbreaks across South America, Uruguay has become known for curtailing SARS-CoV-2 exceptionally well. To understand the SARS-CoV-2 introductions, local transmissions, and associations with genomic and clinical parameters in Uruguay, we sequenced the viral genomes of 44 outpatients and inpatients in a private healthcare system in its capital, Montevideo, from March to May 2020. We performed a phylogeographic analysis using sequences from our cohort and other studies, which indicates a minimum of 23 independent introductions into Uruguay, resulting in five major transmission clusters. Our data suggest that most introductions resulting in chains of transmission originate from other South American countries, with the earliest seeding of the virus in late February 2020, weeks before the borders were closed to all non-citizens and a partial lockdown implemented. Genetic analyses suggest a dominance of S and G clades (G, GH, GR) that make up >90% of the viral strains in our study. In our cohort, lethal outcome of SARS-CoV-2 infection significantly correlated with arterial hypertension, kidney failure, and ICU admission (FDR < 0.01), but not with any mutation in a structural or non-structural protein, such as the spike D614G mutation. Our study contributes genetic, phylodynamic, and clinical correlation data about the exceptionally well-curbed SARS-CoV-2 outbreak in Uruguay, which furthers the understanding of disease patterns and regional aspects of the pandemic in Latin America.
12. Dispersal dynamics of SARS-CoV-2 lineages during the first epidemic wave in New York City. PLoS Pathog 2021; 17:e1009571. [PMID: 34015049 PMCID: PMC8136714 DOI: 10.1371/journal.ppat.1009571] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/19/2021] [Accepted: 04/19/2021] [Indexed: 12/31/2022]
Abstract
During the first phase of the COVID-19 epidemic, New York City rapidly became the epicenter of the pandemic in the United States. While molecular phylogenetic analyses have previously highlighted multiple introductions and a period of cryptic community transmission within New York City, little is known about the circulation of SARS-CoV-2 within and among its boroughs. We here perform phylogeographic investigations to gain insights into the circulation of viral lineages during the first months of the New York City outbreak. Our analyses describe the dispersal dynamics of viral lineages at the state and city levels, illustrating that peripheral samples likely correspond to distinct dispersal events originating from the main metropolitan city areas. In line with the high prevalence recorded in this area, our results highlight the relatively important role of the borough of Queens as a transmission hub associated with higher local circulation and dispersal of viral lineages toward the surrounding boroughs.
13. De novo assembly and delivery to mouse cells of a 101 kb functional human gene. Genetics 2021; 218:6179110. [PMID: 33742653 DOI: 10.1093/genetics/iyab038] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/09/2020] [Accepted: 02/10/2021] [Indexed: 11/14/2022]
Abstract
Design and large-scale synthesis of DNA has been applied to the functional study of viral and microbial genomes. New and expanded technology development is required to unlock the transformative potential of such bottom-up approaches to the study of larger mammalian genomes. Two major challenges include assembling and delivering long DNA sequences. Here, we describe a workflow for de novo DNA assembly and delivery that enables functional evaluation of mammalian genes on the length scale of 100 kilobase pairs (kb). The DNA assembly step is supported by an integrated robotic workcell. We demonstrate assembly of the 101 kb human HPRT1 gene in yeast from 3 kb building blocks, precision delivery of the resulting construct to mouse embryonic stem cells, and subsequent expression of the human protein from its full-length human gene in mouse cells. This workflow provides a framework for mammalian genome writing. We envision utility in producing designer variants of human genes linked to disease and their delivery and functional analysis in cell culture or animal models.
14. Sequencing identifies multiple early introductions of SARS-CoV-2 to the New York City region. Genome Res 2020; 30:1781-1788. [PMID: 33093069 PMCID: PMC7706732 DOI: 10.1101/gr.266676.120] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 08/03/2020] [Accepted: 10/20/2020] [Indexed: 11/30/2022]
Abstract
Effective public response to a pandemic relies upon accurate measurement of the extent and dynamics of an outbreak. Viral genome sequencing has emerged as a powerful approach to link seemingly unrelated cases, and large-scale sequencing surveillance can inform on critical epidemiological parameters. Here, we report the analysis of 864 SARS-CoV-2 sequences from cases in the New York City metropolitan area during the COVID-19 outbreak in spring 2020. The majority of cases had no recent travel history or known exposure, and genetically linked cases were spread throughout the region. Comparison to global viral sequences showed that early transmission was most linked to cases from Europe. Our data are consistent with numerous seeds from multiple sources and a prolonged period of unrecognized community spreading. This work highlights the complementary role of genomic surveillance in addition to traditional epidemiological indicators.
15. Sequencing identifies multiple early introductions of SARS-CoV-2 to the New York City Region. medRxiv: The Preprint Server for Health Sciences 2020:2020.04.15.20064931. [PMID: 32511587 PMCID: PMC7276014 DOI: 10.1101/2020.04.15.20064931] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 02/07/2023]
Abstract
Effective public response to a pandemic relies upon accurate measurement of the extent and dynamics of an outbreak. Viral genome sequencing has emerged as a powerful approach to link seemingly unrelated cases, and large-scale sequencing surveillance can inform on critical epidemiological parameters. Here, we report the analysis of 864 SARS-CoV-2 sequences from cases in the New York City metropolitan area during the COVID-19 outbreak in Spring 2020. The majority of cases had no recent travel history or known exposure, and genetically linked cases were spread throughout the region. Comparison to global viral sequences showed that early transmission was most linked to cases from Europe. Our data are consistent with numerous seeds from multiple sources and a prolonged period of unrecognized community spreading. This work highlights the complementary role of genomic surveillance in addition to traditional epidemiological indicators.
16.
Abstract
Noncoding DNA sequences, which play various roles in gene expression and regulation, are under evolutionary pressure. Gene regulation requires specific protein–DNA binding events, and our previous studies showed that both DNA sequence and shape readout are employed by transcription factors (TFs) to achieve DNA binding specificity. By investigating the shape-disrupting properties of single nucleotide polymorphisms (SNPs) in human regulatory regions, we established a link between disruptive local DNA shape changes and loss of specific TF binding. Furthermore, we described cases where disease-associated SNPs may alter TF binding through DNA shape changes. This link led us to hypothesize that local DNA shape within and around TF binding sites is under selection pressure. To verify this hypothesis, we analyzed SNP data derived from 216 natural strains of Drosophila melanogaster. Comparing SNPs located in functional and nonfunctional regions within experimentally validated cis-regulatory modules (CRMs) from D. melanogaster that are active in the blastoderm stage of development, we found that SNPs within functional regions tended to cause smaller DNA shape variations. Furthermore, SNPs with higher minor allele frequency were more likely to result in smaller DNA shape variations. The same analysis based on a large number of SNPs in putative CRMs of the D. melanogaster genome derived from DNase I accessibility data confirmed these observations. Taken together, our results indicate that common SNPs in functional regions tend to maintain DNA shape, whereas shape-disrupting SNPs are more likely to be eliminated through purifying selection.
17.
Abstract
In the version of this article initially published, in Fig. 5a, the data in the right column of 'DAAM2 gRNA1' were incorrectly plotted as circles indicating 'untreated' rather than as squares indicating 'treated'. The error has been corrected in the HTML and PDF versions of the article.
18.
Abstract
Osteoporosis is a common aging-related disease diagnosed primarily using bone mineral density (BMD). We assessed genetic determinants of BMD as estimated by heel quantitative ultrasound in 426,824 individuals, identifying 518 genome-wide significant loci (301 novel), explaining 20% of its variance. We identified 13 bone fracture loci, all associated with estimated BMD (eBMD), in ~1.2 million individuals. We then identified target genes enriched for genes known to influence bone density and strength (maximum odds ratio (OR) = 58, P = 1 × 10^-75) from cell-specific features, including chromatin conformation and accessible chromatin sites. We next performed rapid-throughput skeletal phenotyping of 126 knockout mice with disruptions in predicted target genes and found an increased abnormal skeletal phenotype frequency compared to 526 unselected lines (P < 0.0001). In-depth analysis of one gene, DAAM2, showed a disproportionate decrease in bone strength relative to mineralization. This genetic atlas provides evidence linking associated SNPs to causal genes, offers new insight into osteoporosis pathophysiology, and highlights opportunities for drug development.
19. Big DNA as a tool to dissect an age-related macular degeneration-associated haplotype. Precision Clinical Medicine 2019; 2:1-7. [PMID: 30944767 PMCID: PMC6432742 DOI: 10.1093/pcmedi/pby019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/18/2018] [Accepted: 12/24/2018] [Indexed: 11/13/2022]
Abstract
Age-related macular degeneration (AMD) is a leading cause of blindness in the developed world, especially in aging populations, and is therefore an important target for new therapeutic development. Several recent studies have demonstrated strong associations between AMD and sites of heritable genetic variation at multiple loci, including a highly significant association at 10q26. The 10q26 risk region contains two genes, HTRA1 and ARMS2, both of which have been separately implicated as causative for the disease, as well as dozens of sites of non-coding variation. To date, no studies have successfully pinpointed which of these variant sites are functional in AMD, nor definitively identified which genes in the region are targets of such regulatory variation. To efficiently decipher which sites are functional in AMD phenotypes, we describe a general framework for combinatorial assembly of large ‘synthetic haplotypes’ along with delivery to relevant disease cell types for downstream functional analysis. We demonstrate the successful and highly efficient assembly of a first-draft 119-kb wild-type ‘assemblon’ covering the HTRA1/ARMS2 risk region. We further propose the parallelized assembly of a library of combinatorial variant synthetic haplotypes covering the region, delivery and analysis of which will identify functional sites and their effects, leading to an improved understanding of AMD development. We anticipate that the methodology proposed here is highly generalizable towards the difficult problem of identifying truly functional variants from those discovered via GWAS or other genetic association studies.
20. Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells. Cell 2017; 167:1398-1414.e24. [PMID: 27863251 PMCID: PMC5119954 DOI: 10.1016/j.cell.2016.10.026] [Citation(s) in RCA: 389] [Impact Index Per Article: 55.6] [Received: 01/31/2016] [Revised: 08/19/2016] [Accepted: 10/14/2016] [Indexed: 12/20/2022]
Abstract
Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.
Highlights: Genome, transcriptome, and epigenome reference panel in three human immune cell types. Identified 4,418 genes associated with epigenetic changes independent of genetics. Described genome-epigenome coordination defining cell-type-specific regulatory events. Functionally mapped disease mechanisms at 345 unique autoimmune disease loci.
21.
Abstract
We need technology and an ethical framework for genome-scale engineering
Collapse
|
22
|
Erratum: Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo. Nat Genet 2016; 48:101. [DOI: 10.1038/ng0116-101c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
23
|
Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture. Nature 2015; 526:112-117. [PMID: 26367794 PMCID: PMC4755714 DOI: 10.1038/nature14878 10.1016/j.ajhg.2017.12.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/28/2014] [Accepted: 06/30/2015] [Indexed: 04/02/2024]
Abstract
The extent to which low-frequency (minor allele frequency (MAF) between 1% and 5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is mainly unknown. Bone mineral density (BMD) is highly heritable, a major predictor of osteoporotic fractures, and has been previously associated with common genetic variants, as well as rare, population-specific, coding variants. Here we identify novel non-coding genetic variants with large effects on BMD (n_total = 53,236) and fracture (n_total = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10), a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size fourfold larger than the mean of previously reported common variants for lumbar spine BMD (rs11692564(T), MAF = 1.6%, replication effect size = +0.20 s.d., P_meta = 2 × 10⁻¹⁴), which was also associated with a decreased risk of fracture (odds ratio = 0.85; P = 2 × 10⁻¹¹; n_cases = 98,742 and n_controls = 409,511). Using an En1(cre/flox) mouse model, we observed that conditional loss of En1 results in low bone mass, probably as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817(T), MAF = 1.2%, replication effect size = +0.41 s.d., P_meta = 1 × 10⁻¹¹). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants.
These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population.
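The abstract defines its frequency classes by minor allele frequency: low-frequency means MAF between 1% and 5%, rare means MAF ≤ 1%. A small sketch of that binning follows. Treating exactly 5% as low-frequency and anything above it as common is an assumption about the boundary, and the function names are hypothetical.

```python
def minor_allele_frequency(alt_count: int, total_alleles: int) -> float:
    """MAF from allele counts: the frequency of whichever allele is rarer."""
    if total_alleles <= 0 or not 0 <= alt_count <= total_alleles:
        raise ValueError("invalid allele counts")
    p = alt_count / total_alleles
    return min(p, 1.0 - p)


def frequency_class(maf: float) -> str:
    """Bin a MAF with the abstract's thresholds (the 5% boundary is assumed)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    if maf <= 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


# The two lead variants reported in the abstract (MAF 1.6% and 1.2%).
for rsid, maf in [("rs11692564", 0.016), ("rs148771817", 0.012)]:
    print(rsid, frequency_class(maf))
```

Both reported variants print as "low-frequency", matching how the abstract describes them.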
|
24
|
Role of DNA Methylation in Modulating Transcription Factor Occupancy. Cell Rep 2015; 12:1184-95. [PMID: 26257180 DOI: 10.1016/j.celrep.2015.07.024] [Citation(s) in RCA: 187] [Impact Index Per Article: 20.8] [Received: 11/03/2014] [Revised: 06/14/2015] [Accepted: 07/10/2015] [Indexed: 02/07/2023]
Abstract
Although DNA methylation is commonly invoked as a mechanism for transcriptional repression, the extent to which it actively silences transcription factor (TF) occupancy sites in vivo is unknown. To study the role of DNA methylation in the active modulation of TF binding, we quantified the effect of DNA methylation depletion on the genomic occupancy patterns of CTCF, an abundant TF with known methylation sensitivity that is capable of autonomous binding to its target sites in chromatin. Here, we show that the vast majority (>98.5%) of the tens of thousands of unoccupied, methylated CTCF recognition sequences remain unbound upon abrogation of DNA methylation. The small fraction of sites that show methylation-dependent binding in vivo are in turn characterized by highly variable CTCF occupancy across cell types. Our results suggest that DNA methylation is not a primary groundskeeper of genomic TF landscapes, but rather a specialized mechanism for stabilizing intrinsically labile sites.
|
25
|
Abstract
Three recent studies measure individual variation in regulatory DNA accessibility. What do they tell us about the prospects of assessing variation in single cells and across populations?
|
26
|
Genomic discovery of potent chromatin insulators for human gene therapy. Nat Biotechnol 2015; 33:198-203. [PMID: 25580597 DOI: 10.1038/nbt.3062] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Received: 01/07/2014] [Accepted: 10/09/2014] [Indexed: 12/29/2022]
Abstract
Insertional mutagenesis and genotoxicity, which usually manifest as hematopoietic malignancy, represent major barriers to realizing the promise of gene therapy. Although insulator sequences that block transcriptional enhancers could mitigate or eliminate these risks, so far no human insulators with high functional potency have been identified. Here we describe a genomic approach for the identification of compact sequence elements that function as insulators. These elements are highly occupied by the insulator protein CTCF, are DNase I hypersensitive and represent only a small minority of the CTCF recognition sequences in the human genome. We show that the elements identified acted as potent enhancer blockers and substantially decreased the risk of tumor formation in a cancer-prone animal model. The elements are small, can be efficiently accommodated by viral vectors and have no detrimental effects on viral titers. The insulators we describe here are expected to increase the safety of gene therapy for genetic diseases.
|
27
|
DNA methylation alone does not cause most cell-type selective transcription factor binding. Epigenetics Chromatin 2013. [PMCID: PMC3600734 DOI: 10.1186/1756-8935-6-s1-p103] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
|
28
|
Abstract
The characteristics of, and evolutionary forces acting on, regulatory variation in humans remain elusive because of the difficulty in defining functionally important noncoding DNA. Here, we combine genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs) from 138 cell and tissue types with whole-genome sequences of 53 geographically diverse individuals in order to better delimit the patterns of regulatory variation in humans. We estimate that individuals likely harbor many more functionally important variants in regulatory DNA compared with protein-coding regions, although they are likely to have, on average, smaller effect sizes. Moreover, we demonstrate that there is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. We also find marked variability in functional constraint among transcription factor motifs in regulatory DNA, with sequence motifs for major developmental regulators, such as HOX proteins, exhibiting levels of constraint comparable to protein-coding regions. Finally, we perform a genome-wide scan of recent positive selection and identify hundreds of novel substrates of adaptive regulatory evolution that are enriched for biologically interesting pathways such as melanogenesis and adipocytokine signaling. These data and results provide new insights into patterns of regulatory variation in individuals and populations and demonstrate that a large proportion of functionally important variation lies beyond the exome.
|
29
|
Abstract
CTCF is a ubiquitously expressed regulator of fundamental genomic processes including transcription, intra- and interchromosomal interactions, and chromatin structure. Because of its critical role in genome function, CTCF binding patterns have long been assumed to be largely invariant across different cellular environments. Here we analyze genome-wide occupancy patterns of CTCF by ChIP-seq in 19 diverse human cell types, including normal primary cells and immortal lines. We observed highly reproducible yet surprisingly plastic genomic binding landscapes, indicative of strong cell-selective regulation of CTCF occupancy. Comparison with massively parallel bisulfite sequencing data indicates that 41% of variable CTCF binding is linked to differential DNA methylation, concentrated at two critical positions within the CTCF recognition sequence. Unexpectedly, CTCF binding patterns were markedly different in normal versus immortal cells, with the latter showing widespread disruption of CTCF binding associated with increased methylation. Strikingly, this disruption is accompanied by up-regulation of CTCF expression, with the result that both normal and immortal cells maintain the same average number of CTCF occupancy sites genome-wide. These results reveal a tight linkage between DNA methylation and the global occupancy patterns of a major sequence-specific regulatory factor.
|
30
|
Abstract
DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ∼580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.
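The abstract reports connecting distal DHSs with their target promoters and notes that transcellular DNase I sensitivity patterns carry functional information. One way to express such a link, sketched here under assumptions, is to correlate a distal DHS's accessibility across cell types with that of nearby promoters and keep pairs above a cutoff within a distance window. The 500 kb window, the r ≥ 0.7 cutoff and the dictionary layout are illustrative choices, not values taken from this abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def link_distal_to_promoters(distal, promoters, max_distance=500_000, min_r=0.7):
    """Pair distal DHSs with promoters whose cross-cell-type profiles co-vary.

    distal, promoters: dicts mapping a (hypothetical) element name to
    (chromosome, position, accessibility profile across the same cell types).
    """
    links = []
    for d_name, (d_chrom, d_pos, d_prof) in distal.items():
        for p_name, (p_chrom, p_pos, p_prof) in promoters.items():
            if d_chrom != p_chrom or abs(d_pos - p_pos) > max_distance:
                continue  # only consider nearby elements on the same chromosome
            r = pearson(d_prof, p_prof)
            if r >= min_r:
                links.append((d_name, p_name, r))
    return links


# Toy profiles over five hypothetical cell types.
distal = {
    "dhsA": ("chr1", 1_000_000, [1, 5, 2, 8, 0]),    # co-varies with geneX
    "dhsB": ("chr1", 1_100_000, [8, 0, 5, 1, 9]),    # anti-correlated
    "dhsFar": ("chr1", 3_000_000, [1, 5, 2, 8, 0]),  # correlated but too distant
}
promoters = {"geneX": ("chr1", 1_050_000, [2, 10, 4, 16, 1])}
links = link_distal_to_promoters(distal, promoters)
print(links)
```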
|
31
|
An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 2012; 489:83-90. [PMID: 22955618 PMCID: PMC3736582 DOI: 10.1038/nature11212] [Citation(s) in RCA: 566] [Impact Index Per Article: 47.2] [Received: 12/11/2011] [Accepted: 05/10/2012] [Indexed: 01/04/2023]
Abstract
Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.
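The footprinting principle stated above (a bound factor shields its site, so DNase I cleavage dips there relative to the flanks) can be illustrated with a simple depletion score, (C + 1)/L + (C + 1)/R, where C, L and R are cleavage counts in the central window and in the left and right flanks; lower values mean deeper footprints. This is a minimal sketch similar in form to scores used in the DNase I footprinting literature, not the study's detection pipeline, and the window sizes and function names are hypothetical.

```python
import math


def footprint_score(cuts, start, width, flank):
    """Depletion score for a candidate footprint.

    cuts:  per-base DNase I cleavage counts along a region.
    start: first base of the candidate (protected) window.
    width: length of the candidate window.
    flank: length of the flanking window on each side.
    Returns (C + 1)/L + (C + 1)/R; smaller means stronger central depletion.
    """
    if start - flank < 0 or start + width + flank > len(cuts):
        raise ValueError("window plus flanks runs past the cleavage track")
    left = sum(cuts[start - flank:start])
    center = sum(cuts[start:start + width])
    right = sum(cuts[start + width:start + width + flank])
    if left == 0 or right == 0:
        return math.inf  # no flanking cleavage, so no evidence of protection
    return (center + 1) / left + (center + 1) / right


def best_footprint(cuts, width, flank):
    """Scan every admissible window; return (score, start) with the lowest score."""
    candidates = range(flank, len(cuts) - width - flank + 1)
    return min((footprint_score(cuts, s, width, flank), s) for s in candidates)


# Toy track with a protected 3-bp stretch at positions 4-6.
track = [4, 5, 6, 5, 0, 1, 0, 6, 5, 4, 5]
print(best_footprint(track, width=3, flank=3))
```

The scan picks the window starting at position 4, where cleavage is nearly absent between well-cut flanks.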
|
32
|
Abstract
Genome-wide association studies have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure-related phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo identification of pathogenic cell types for Crohn's disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.
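A statement such as "variants are concentrated in DHSs" is often quantified as a fold enrichment: the fraction of variants falling inside DHSs divided by the fraction of the genome the DHSs cover. The sketch below computes that ratio for a single chromosome with sorted, non-overlapping intervals. It is an assumption-laden illustration with hypothetical names and toy numbers; many analyses instead compare against matched background variant sets rather than raw genome coverage.

```python
from bisect import bisect_right


def in_intervals(pos, starts, ends):
    """True if pos lies in one of the sorted, non-overlapping half-open intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def dhs_fold_enrichment(variant_positions, dhs_intervals, genome_length):
    """(fraction of variants inside DHSs) / (fraction of genome covered by DHSs).

    Assumes one chromosome and 0-based half-open (start, end) intervals that do
    not overlap (merge them first if they might).
    """
    if not variant_positions or genome_length <= 0:
        raise ValueError("need variants and a positive genome length")
    intervals = sorted(dhs_intervals)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]
    covered = sum(e - s for s, e in intervals)
    if covered == 0:
        raise ValueError("DHS intervals cover no bases")
    hits = sum(in_intervals(p, starts, ends) for p in variant_positions)
    observed = hits / len(variant_positions)
    expected = covered / genome_length
    return observed / expected


# Toy example: DHSs cover 20% of a 1,000-bp "genome"; 3 of 4 variants fall inside.
print(dhs_fold_enrichment([150, 550, 599, 800], [(100, 200), (500, 600)], 1000))
```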
|
33
|
Abstract
The large and growing number of genome-wide datasets highlights the need for high-performance feature analysis and data comparison methods, in addition to efficient data storage and retrieval techniques. We introduce BEDOPS, a software suite for common genomic analysis tasks that offers improved flexibility, scalability and execution time characteristics over previously published packages. The suite includes a utility to compress large inputs into a lossless format that can provide greater space savings and faster data extractions than alternatives. Availability: http://code.google.com/p/bedops/ includes binaries, source and documentation.
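BEDOPS is a command-line suite; for example, `bedops --merge` collapses overlapping elements of sorted BED input. To illustrate the kind of feature operation involved, here is a minimal single-pass merge over BED-style intervals. It is not BEDOPS code, and its choice to also join intervals that merely touch (end equal to the next start) is an assumption that may differ from BEDOPS's own semantics.

```python
from typing import Iterable, List, Tuple

# (chromosome, start, end) in BED convention: 0-based start, exclusive end.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals, per chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):  # sort by chromosome, then start
        if start >= end:
            raise ValueError(f"empty or inverted interval: {chrom}:{start}-{end}")
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or abuts) the previous interval: extend it.
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


peaks = [("chr1", 150, 300), ("chr1", 100, 200), ("chr1", 300, 350),
         ("chr2", 10, 20), ("chr1", 400, 500)]
for chrom, start, end in merge_intervals(peaks):
    print(chrom, start, end, sep="\t")
```

After sorting, each interval is compared only with the last merged one, so the merge itself is a single linear pass; the sort dominates the cost.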
|