1
Williams RM. Leveraging chicken embryos for studying human enhancers. Dev Biol 2025; 524:123-131. [PMID: 40368318 DOI: 10.1016/j.ydbio.2025.05.009] [Received: 10/29/2024] [Revised: 04/30/2025] [Accepted: 05/12/2025] [Indexed: 05/16/2025]
Abstract
The dynamic activity of complex gene regulatory networks stands at the core of all cellular functions that define cell identity and behaviour. Gene regulatory networks comprise transcriptional enhancers, acted upon by cell-specific transcription factors to control gene expression in a spatially and temporally specific manner. Enhancers are found in the non-coding genome; pathogenic variants can disrupt enhancer activity and lead to disease. Correlating non-coding variants with aberrant enhancer activity remains a significant challenge. Due to their clinical significance, there is a longstanding interest in understanding enhancer function during early embryogenesis. With the onset of the omics era, it is now feasible to identify putative tissue-specific enhancers from epigenome data. However, such predictions require validation in vivo. The early stages of chick embryogenesis closely parallel those of human, offering an accessible in vivo model in which to assess the activity of putative human enhancer sequences. This review explores the unique advantages and recent advancements in employing chicken embryos to elucidate the activity of human transcriptional enhancers and the potential implications of these findings in human disease.
Affiliation(s)
- Ruth M Williams
- University of Manchester, Faculty of Biology, Medicine and Health, Michael Smith Building, Oxford Road, Manchester, United Kingdom.
2
Dincer TU, Ernst J. ChromActivity: integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types. Genome Biol 2025; 26:123. [PMID: 40346707 PMCID: PMC12063466 DOI: 10.1186/s13059-025-03579-6] [Received: 07/13/2023] [Accepted: 04/15/2025] [Indexed: 05/11/2025] Open
Abstract
We introduce ChromActivity, a computational framework for predicting and annotating regulatory activity across the genome through integration of multiple epigenomic maps and various functional characterization datasets. ChromActivity generates genome-wide predictions of regulatory activity associated with each functional characterization dataset across many cell types based on available epigenomic data. For each cell type, it then produces ChromScoreHMM genome annotations based on the combinatorial and spatial patterns within these predictions, along with ChromScore tracks of overall predicted regulatory activity. ChromActivity provides a resource for analyzing and interpreting the human regulatory genome across diverse cell types.
Affiliation(s)
- Tevfik Umut Dincer
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Jason Ernst
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Computer Science Department, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
3
Frömel R, Rühle J, Bernal Martinez A, Szu-Tu C, Pacheco Pastor F, Martinez-Corral R, Velten L. Design principles of cell-state-specific enhancers in hematopoiesis. Cell 2025:S0092-8674(25)00449-0. [PMID: 40345201 DOI: 10.1016/j.cell.2025.04.017] [Received: 06/04/2024] [Revised: 02/02/2025] [Accepted: 04/10/2025] [Indexed: 05/11/2025]
Abstract
During cellular differentiation, enhancers transform overlapping gradients of transcription factors (TFs) to highly specific gene expression patterns. However, the vast complexity of regulatory DNA impedes the identification of the underlying cis-regulatory rules. Here, we characterized 64,400 fully synthetic DNA sequences to bottom-up dissect design principles of cell-state-specific enhancers in the context of the differentiation of blood stem cells to seven myeloid lineages. Focusing on binding sites for 38 TFs and their pairwise interactions, we found that identical sites displayed both repressive and activating function as a consequence of cell state, site combinatorics, or simply predicted occupancy of a TF on an enhancer. Surprisingly, combinations of activating sites frequently neutralized one another or gained repressive function. These negative synergies convert quantitative imbalances in TF expression into binary activity patterns. We exploit this principle to automatically create enhancers with specificity to user-defined combinations of hematopoietic progenitor cell states from scratch.
Affiliation(s)
- Robert Frömel
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Julia Rühle
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Aina Bernal Martinez
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Chelsea Szu-Tu
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Felix Pacheco Pastor
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Rosa Martinez-Corral
- CRG (Barcelona Collaboratorium for Modelling and Predictive Biology), Dr. Aiguader 88, Barcelona 08003, Spain
- Lars Velten
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain.
4
Dudek MF, Wenz BM, Brown CD, Voight BF, Almasy L, Grant SFA. Characterization of non-coding variants associated with transcription-factor binding through ATAC-seq-defined footprint QTLs in liver. Am J Hum Genet 2025:S0002-9297(25)00140-5. [PMID: 40250421 DOI: 10.1016/j.ajhg.2025.03.019] [Received: 09/24/2024] [Revised: 03/27/2025] [Accepted: 03/27/2025] [Indexed: 04/20/2025] Open
Abstract
Non-coding variants discovered by genome-wide association studies (GWASs) are enriched in regulatory elements harboring transcription factor (TF) binding motifs, strongly suggesting a connection between disease association and the disruption of cis-regulatory sequences. Occupancy of a TF inside a region of open chromatin can be detected in ATAC-seq where bound TFs block the transposase Tn5, leaving a pattern of relatively depleted Tn5 insertions known as a "footprint." Here, we sought to identify variants associated with TF binding, or "footprint quantitative trait loci" (fpQTLs), in ATAC-seq data generated from 170 human liver samples. We used computational tools to scan the ATAC-seq reads to quantify TF binding likelihood as "footprint scores" at variants derived from whole-genome sequencing generated in the same samples. We tested for association between genotype and footprint score and observed 809 fpQTLs associated with footprint-inferred TF binding (FDR < 5%). Given that Tn5 insertion sites are measured with base-pair resolution, we show that fpQTLs can aid GWAS and QTL fine-mapping by precisely pinpointing TF activity within broad trait-associated loci where the underlying causal variant is unknown. Liver fpQTLs were strongly enriched across ChIP-seq peaks, liver expression QTLs (eQTLs), and liver-related GWAS loci, and their inferred effect on TF binding was concordant with their effect on underlying sequence motifs in 78% of cases. We conclude that fpQTLs can reveal causal GWAS variants, define the role of TF binding-site disruption in complex traits, and provide functional insights into non-coding variants, ultimately informing novel treatments for common diseases.
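The multiple-testing step in the abstract above (testing many variants, then keeping associations at FDR < 5%) can be illustrated with a small sketch. It assumes per-variant p-values are already available and uses the Benjamini-Hochberg step-up procedure; the abstract does not state which FDR method the authors used, and the function name and example p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which tests pass BH at level alpha.

    Assumption: Benjamini-Hochberg is used here for illustration only;
    it is not confirmed as the procedure used in the study.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank

    # Every test ranked at or below max_k is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            significant[idx] = True
    return significant


# Hypothetical p-values for ten variants (not data from the paper).
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.060, 0.074, 0.205, 0.212, 0.216]
example_calls = benjamini_hochberg(example_pvalues, alpha=0.05)
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be called significant when a larger p-value further down the sorted list meets its threshold.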
Affiliation(s)
- Max F Dudek
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Brandon M Wenz
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Christopher D Brown
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Benjamin F Voight
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA; Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Laura Almasy
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute, Children's Hospital of Philadelphia and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Struan F A Grant
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Endocrinology and Diabetes, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
5
Brown AR, Fox GA, Kaplow IM, Lawler AJ, Phan BN, Gadey L, Wirthlin ME, Ramamurthy E, May GE, Chen Z, Su Q, McManus CJ, van de Weerd R, Pfenning AR. An in vivo systemic massively parallel platform for deciphering animal tissue-specific regulatory function. Front Genet 2025; 16:1533900. [PMID: 40270544 PMCID: PMC12016043 DOI: 10.3389/fgene.2025.1533900] [Received: 11/25/2024] [Accepted: 03/13/2025] [Indexed: 04/25/2025] Open
Abstract
Introduction: Transcriptional regulation is an important process wherein non-protein coding enhancer sequences play a key role in determining cell type identity and phenotypic diversity. In neural tissue, these gene regulatory processes are crucial for coordinating a plethora of interconnected and regionally specialized cell types, ensuring their synchronized activity in generating behavior. Recognizing the intricate interplay of gene regulatory processes in the brain is imperative, as mounting evidence links neurodevelopment and neurological disorders to non-coding genome regions. While genome-wide association studies are swiftly identifying non-coding human disease-associated loci, decoding regulatory mechanisms is challenging due to causal variant ambiguity and their specific tissue impacts. Methods: Massively parallel reporter assays (MPRAs) are widely used in cell culture to study the non-coding enhancer regions, linking genome sequence differences to tissue-specific regulatory function. However, widespread use in animals encounters significant challenges, including insufficient viral library delivery and library quantification, irregular viral transduction rates, and injection site inflammation disrupting gene expression. Here, we introduce a systemic MPRA (sysMPRA) to address these challenges through systemic intravenous AAV viral delivery. Results: We demonstrate successful transduction of the MPRA library into diverse mouse tissues, efficiently identifying tissue specificity in candidate enhancers and aligning well with predictions from machine learning models. We highlight that sysMPRA effectively uncovers regulatory effects stemming from the disruption of MEF2C transcription factor binding sites, single-nucleotide polymorphisms, and the consequences of genetic variations associated with late-onset Alzheimer's disease. 
Conclusion: SysMPRA is an effective library delivering method that simultaneously determines the transcriptional functions of hundreds of enhancers in vivo across multiple tissues.
Affiliation(s)
- Ashley R. Brown
- Ray and Stephanie Lane Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Grant A. Fox
- Ray and Stephanie Lane Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Irene M. Kaplow
- Ray and Stephanie Lane Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States
- Alyssa J. Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States
- BaDoi N. Phan
- Ray and Stephanie Lane Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Lahari Gadey
- Ray and Stephanie Lane Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Morgan E. Wirthlin
- Ray and Stephanie Lane Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Easwaran Ramamurthy
- Ray and Stephanie Lane Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Gemma E. May
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States
- Ziheng Chen
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States
- Qiao Su
- Ray and Stephanie Lane Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- C. Joel McManus
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States
- Robert van de Weerd
- Ray and Stephanie Lane Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Andreas R. Pfenning
- Ray and Stephanie Lane Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States
6
Kumari P, Friedman RZ, Pi L, Curtis SW, Paraiso K, Visel A, Rhea L, Dunnwald M, Patni AP, Mar D, Bomsztyk K, Mathieu J, Ruohola-Baker H, Leslie EJ, White MA, Cohen BA, Cornell RA. Identification of functional non-coding variants associated with orofacial cleft. bioRxiv 2025:2024.06.01.596914. [PMID: 40027800 PMCID: PMC11870446 DOI: 10.1101/2024.06.01.596914] [Indexed: 03/05/2025]
Abstract
Oral facial cleft (OFC) is a multifactorial disorder that can present as a cleft lip with or without cleft palate (CL/P) or a cleft palate only. Genome-wide association studies (GWAS) of isolated OFC have identified common single nucleotide polymorphisms (SNPs) at the 1q32/IRF6 locus and many other loci where, like IRF6, the presumed OFC-relevant gene is expressed in embryonic oral epithelium. To identify the functional subset of SNPs at eight such loci we conducted a massively parallel reporter assay in a cell line derived from fetal oral epithelium, revealing SNPs with allele-specific effects on enhancer activity. We filtered these against chromatin-mark evidence of enhancers in relevant cell types or tissues, and then tested a subset in traditional reporter assays, yielding six candidates for functional SNPs in five loci (1q32/IRF6, 3q28/TP63, 6p24.3/TFAP2A, 20q12/MAFB, and 9q22.33/FOXE1). We further tested two SNPs near IRF6 and one near FOXE1 by engineering the genome of induced pluripotent stem cells, differentiating the cells into embryonic oral epithelium, and measuring expression of IRF6 or FOXE1 and binding of transcription factors; the results strongly supported their candidacy. Conditional analyses of a meta-analysis of GWAS suggest that the two functional SNPs near IRF6 account for the majority of risk for CL/P associated with variation at this locus. This study connects genetic variation associated with orofacial cleft to mechanisms of pathogenesis.
7
Catta-Preta R, Lindtner S, Ypsilanti A, Seban N, Price JD, Abnousi A, Su-Feher L, Wang Y, Cichewicz K, Boerma SA, Juric I, Jones IR, Akiyama JA, Hu M, Shen Y, Visel A, Pennacchio LA, Dickel DE, Rubenstein JLR, Nord AS. Combinatorial transcription factor binding encodes cis-regulatory wiring of mouse forebrain GABAergic neurogenesis. Dev Cell 2025; 60:288-304.e6. [PMID: 39481376 PMCID: PMC11753952 DOI: 10.1016/j.devcel.2024.10.004] [Received: 06/23/2023] [Revised: 06/17/2024] [Accepted: 10/03/2024] [Indexed: 11/02/2024]
Abstract
Transcription factors (TFs) bind combinatorially to cis-regulatory elements, orchestrating transcriptional programs. Although studies of chromatin state and chromosomal interactions have demonstrated dynamic neurodevelopmental cis-regulatory landscapes, parallel understanding of TF interactions lags. To elucidate combinatorial TF binding driving mouse basal ganglia development, we integrated chromatin immunoprecipitation sequencing (ChIP-seq) for twelve TFs, H3K4me3-associated enhancer-promoter interactions, chromatin and gene expression data, and functional enhancer assays. We identified sets of putative regulatory elements with shared TF binding (TF-pRE modules) that orchestrate distinct processes of GABAergic neurogenesis and suppress other cell fates. The majority of pREs were bound by one or two TFs; however, a small proportion were extensively bound. These sequences had exceptional evolutionary conservation and motif density, complex chromosomal interactions, and activity as in vivo enhancers. Our results provide insights into the combinatorial TF-pRE interactions that activate and repress expression programs during telencephalon neurogenesis and demonstrate the value of TF binding toward modeling developmental transcriptional wiring.
Affiliation(s)
- Rinaldo Catta-Preta
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- Susan Lindtner
- Nina Ireland Laboratory of Developmental Neurobiology, Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Athena Ypsilanti
- Nina Ireland Laboratory of Developmental Neurobiology, Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Nicolas Seban
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- James D Price
- Nina Ireland Laboratory of Developmental Neurobiology, Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Armen Abnousi
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44106, USA
- Linda Su-Feher
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- Yurong Wang
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- Karol Cichewicz
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- Sally A Boerma
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- Ivan Juric
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44106, USA
- Ian R Jones
- Institute for Human Genetics, Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA
- Jennifer A Akiyama
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44106, USA
- Yin Shen
- Institute for Human Genetics, Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA
- Axel Visel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; U.S. Department of Energy Joint Genome Institute, Walnut Creek, CA 94598, USA; School of Natural Sciences, University of California, Merced, Merced, CA 95343, USA
- Len A Pennacchio
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; U.S. Department of Energy Joint Genome Institute, Walnut Creek, CA 94598, USA; Comparative Biochemistry Program, University of California, Berkeley, Berkeley, CA 94720, USA
- Diane E Dickel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- John L R Rubenstein
- Nina Ireland Laboratory of Developmental Neurobiology, Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA.
- Alex S Nord
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA.
8
Friedman RZ, Ramu A, Lichtarge S, Wu Y, Tripp L, Lyon D, Myers CA, Granas DM, Gause M, Corbo JC, Cohen BA, White MA. Active learning of enhancers and silencers in the developing neural retina. Cell Syst 2025; 16:101163. [PMID: 39778579 PMCID: PMC11827711 DOI: 10.1016/j.cels.2024.12.004] [Received: 03/26/2024] [Revised: 10/17/2024] [Accepted: 12/06/2024] [Indexed: 01/11/2025]
Abstract
Deep learning is a promising strategy for modeling cis-regulatory elements. However, models trained on genomic sequences often fail to explain why the same transcription factor can activate or repress transcription in different contexts. To address this limitation, we developed an active learning approach to train models that distinguish between enhancers and silencers composed of binding sites for the photoreceptor transcription factor cone-rod homeobox (CRX). After training the model on nearly all bound CRX sites from the genome, we coupled synthetic biology with uncertainty sampling to generate additional rounds of informative training data. This allowed us to iteratively train models on data from multiple rounds of massively parallel reporter assays. The ability of the resulting models to discriminate between CRX sites with identical sequence but opposite functions establishes active learning as an effective strategy to train models of regulatory DNA. A record of this paper's transparent peer review process is included in the supplemental information.
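The uncertainty-sampling step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming a binary enhancer-versus-silencer classifier that outputs one probability per candidate sequence and using binary entropy as the uncertainty score; the abstract does not specify the acquisition function, and all names and probabilities here are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a binary prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident prediction carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_most_uncertain(probabilities, k):
    """Pick the k candidates whose predictions are closest to a coin flip.

    probabilities: dict mapping a candidate name to the model's predicted
    probability of the positive class (hypothetical: "is an enhancer").
    """
    ranked = sorted(
        probabilities,
        key=lambda name: binary_entropy(probabilities[name]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:k]


# Hypothetical model outputs for four candidate sequences.
candidate_scores = {"seq_a": 0.95, "seq_b": 0.52, "seq_c": 0.10, "seq_d": 0.40}
next_round = select_most_uncertain(candidate_scores, k=2)
```

In an active-learning loop, the selected candidates would be the ones sent for the next round of measurement, and the model would be retrained on the enlarged dataset.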
Affiliation(s)
- Ryan Z Friedman
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Avinash Ramu
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Sara Lichtarge
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Yawei Wu
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Lloyd Tripp
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Daniel Lyon
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Connie A Myers
- Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- David M Granas
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Maria Gause
- Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Joseph C Corbo
- Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Barak A Cohen
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Michael A White
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA.
9
Chang TY, Waxman DJ. HDI-STARR-seq: Condition-specific enhancer discovery in mouse liver in vivo. BMC Genomics 2024; 25:1240. [PMID: 39716078 DOI: 10.1186/s12864-024-11162-9] [Received: 06/10/2024] [Accepted: 12/16/2024] [Indexed: 12/25/2024] Open
Abstract
BACKGROUND: STARR-seq and other massively-parallel reporter assays are widely used to discover functional enhancers in transfected cell models, which can be confounded by plasmid vector-induced type-I interferon immune responses and lack the multicellular environment and endogenous chromatin state of complex mammalian tissues. RESULTS: We describe HDI-STARR-seq, which combines STARR-seq plasmid library delivery to the liver, by hydrodynamic tail vein injection (HDI), with reporter RNA transcriptional initiation driven by a minimal Albumin promoter, which we show is essential for mouse liver STARR-seq enhancer activity assayed 7 days after HDI. Importantly, little or no vector-induced innate type-I interferon responses were observed. Comparisons of HDI-STARR-seq activity between male and female mouse livers and in livers from males treated with an activating ligand of the transcription factor (TF) CAR (Nr1i3) identified many condition-dependent enhancers linked to condition-specific gene expression. Further, thousands of active liver enhancers were identified using a high complexity STARR-seq library comprised of ~50,000 genomic regions released by DNase-I digestion of mouse liver nuclei. When compared to stringently inactive library sequences, the active enhancer sequences identified were highly enriched for liver open chromatin regions with activating histone marks (H3K27ac, H3K4me1, H3K4me3), were significantly closer to gene transcriptional start sites, and were significantly depleted of repressive (H3K27me3, H3K9me3) and transcribed region histone marks (H3K36me3). CONCLUSION: HDI-STARR-seq offers substantial improvements over current methodologies for large scale, functional profiling of enhancers, including condition-dependent enhancers, in liver tissue in vivo, and can be adapted to characterize enhancer activities in a variety of species and tissues by selecting suitable tissue- and species-specific promoter sequences.
Affiliation(s)
- Ting-Ya Chang
- Departments of Biology and Biomedical Engineering, and Bioinformatics Program, Boston University, 5 Cummington Mall, Boston, MA, 02215, USA
- David J Waxman
- Departments of Biology and Biomedical Engineering, and Bioinformatics Program, Boston University, 5 Cummington Mall, Boston, MA, 02215, USA.
|
10
|
Falo-Sanjuan J, Diaz-Tirado Y, Turner MA, Rourke O, Davis J, Medrano C, Haines J, McKenna J, Karshenas A, Eisen MB, Garcia HG. Targeted mutagenesis of specific genomic DNA sequences in animals for the in vivo generation of variant libraries. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.10.598328. [PMID: 38915503 PMCID: PMC11195090 DOI: 10.1101/2024.06.10.598328] [Indexed: 06/26/2024]
Abstract
Understanding how the number, placement and affinity of transcription factor binding sites dictate gene regulatory programs remains a major unsolved challenge in biology, particularly in the context of multicellular organisms. To uncover these rules, it is first necessary to find the binding sites within a regulatory region with high precision, and then to systematically modulate this binding site arrangement while simultaneously measuring the effect of this modulation on output gene expression. Massively parallel reporter assays (MPRAs), where the gene expression stemming from 10,000s of in vitro-generated regulatory sequences is measured, have made this feat possible in high-throughput in single cells in culture. However, because of a lack of technologies to incorporate DNA libraries, MPRAs are limited in whole organisms. To enable MPRAs in multicellular organisms, we generated tools to create a high degree of mutagenesis in specific genomic loci in vivo using base editing. Targeting GFP integrated in the genome of Drosophila cell culture and whole animals as a case study, we show that the base editor AIDevoCDA1 stemming from sea lamprey fused to nCas9 is highly mutagenic. Surprisingly, longer gRNAs increase mutation efficiency and expand the mutating window, which can allow the introduction of mutations in previously untargetable sequences. Finally, we demonstrate arrays of >20 gRNAs that can efficiently introduce mutations along a 200 bp sequence, making it a promising tool to test enhancer function in vivo in a high-throughput manner.
Affiliation(s)
- Julia Falo-Sanjuan
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Yuliana Diaz-Tirado
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Meghan A. Turner
- Biophysics Graduate Group, University of California, Berkeley, CA, USA
| | - Olivia Rourke
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Julian Davis
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Claudia Medrano
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Jenna Haines
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Joey McKenna
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Arman Karshenas
- Biophysics Graduate Group, University of California, Berkeley, CA, USA
| | - Michael B. Eisen
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- Department of Integrative Biology, University of California, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
| | - Hernan G. Garcia
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- Department of Physics, University of California, Berkeley, CA, USA
- Institute for Quantitative Biosciences-QB3, University of California, Berkeley, CA, USA
- Chan Zuckerberg Biohub – San Francisco, San Francisco, CA, USA
- Biophysics Graduate Group, University of California, Berkeley, CA, USA
|
11
|
Karshenas A, Röschinger T, Garcia HG. Predictive Modeling of Gene Expression and Localization of DNA Binding Site Using Deep Convolutional Neural Networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.12.17.629042. [PMID: 39763851 PMCID: PMC11702772 DOI: 10.1101/2024.12.17.629042] [Indexed: 01/16/2025]
Abstract
Despite the sequencing revolution, large swaths of the genomes sequenced to date lack any information about the arrangement of transcription factor binding sites on regulatory DNA. Massively Parallel Reporter Assays (MPRAs) have the potential to dramatically accelerate our genomic annotations by making it possible to measure the gene expression levels driven by thousands of mutational variants of a regulatory region. However, the interpretation of such data often assumes that each base pair in a regulatory sequence contributes independently to gene expression. To enable the analysis of this data in a manner that accounts for possible correlations between distant bases along a regulatory sequence, we developed the Deep learning Adaptable Regulatory Sequence Identifier (DARSI). This convolutional neural network leverages MPRA data to predict gene expression levels directly from raw regulatory DNA sequences. By harnessing this predictive capacity, DARSI systematically identifies transcription factor binding sites within regulatory regions at single-base pair resolution. To validate its predictions, we benchmarked DARSI against curated databases, confirming its accuracy in predicting transcription factor binding sites. Additionally, DARSI predicted novel unmapped binding sites, paving the way for future experimental efforts to confirm the existence of these binding sites and to identify the transcription factors that target those sites. Thus, by automating and improving the annotation of regulatory regions, DARSI generates experimentally actionable predictions that can feed iterations of the theory-experiment cycle aimed at reaching a predictive understanding of transcriptional control.
Affiliation(s)
- Arman Karshenas
- Biophysics Graduate Group, University of California at Berkeley, Berkeley, CA, USA
| | - Tom Röschinger
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Hernan G. Garcia
- Biophysics Graduate Group, University of California at Berkeley, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, USA
- Department of Physics, University of California, Berkeley, CA, USA
- Institute for Quantitative Biosciences-QB3, University of California, Berkeley, CA, USA
- Chan Zuckerberg Biohub – San Francisco, San Francisco, CA, USA
|
12
|
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
| | - Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
| | - Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
|
13
|
Bruner WS, Grant SFA. Translation of genome-wide association study: from genomic signals to biological insights. Front Genet 2024; 15:1375481. [PMID: 39421299 PMCID: PMC11484060 DOI: 10.3389/fgene.2024.1375481] [Received: 01/23/2024] [Accepted: 09/24/2024] [Indexed: 10/19/2024]
Abstract
Since the turn of the 21st century, genome-wide association studies (GWAS) have successfully identified genetic signals associated with a myriad of common complex traits and diseases. As we transition from establishing robust genetic associations with diverse phenotypes, the central challenge is now focused on characterizing the underlying functional mechanisms driving these signals. Previous GWAS efforts have revealed multiple variants, each conferring relatively subtle susceptibility, collectively contributing to the pathogenesis of various common diseases. Such variants can further exhibit associations with multiple other traits and differ across ancestries; in addition, disentangling causal from non-causal variants is complicated by linkage disequilibrium, which makes it difficult to draw direct biological conclusions. Combined with cellular context considerations, such challenges can reduce the capacity to definitively elucidate the biological significance of GWAS signals, limiting the potential to define mechanistic insights. This review will detail current and anticipated approaches for functional interpretation of GWAS signals, both in terms of characterizing the underlying causal variants and the corresponding effector genes.
Affiliation(s)
- Winter S. Bruner
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
| | - Struan F. A. Grant
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Division of Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
|
14
|
Bond ML, Quiroga-Barber IY, D’Costa S, Wu Y, Bell JL, McAfee JC, Kramer NE, Lee S, Patrucco M, Phanstiel DH, Won H. Deciphering the functional impact of Alzheimer's Disease-associated variants in resting and proinflammatory immune cells. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.09.13.24313654. [PMID: 39371155 PMCID: PMC11451667 DOI: 10.1101/2024.09.13.24313654] [Indexed: 10/08/2024]
Abstract
Genome-wide association studies have identified loci associated with Alzheimer's Disease (AD), but identifying the exact causal variants and genes at each locus is challenging due to linkage disequilibrium and their largely non-coding nature. To address this, we performed a massively parallel reporter assay of 3,576 AD-associated variants in THP-1 macrophages in both resting and proinflammatory states and identified 47 expression-modulating variants (emVars). To understand the endogenous chromatin context of emVars, we built an activity-by-contact model using epigenomic maps of macrophage inflammation and inferred condition-specific enhancer-promoter pairs. Intersection of emVars with enhancer-promoter pairs and microglia expression quantitative trait loci allowed us to connect 39 emVars to 76 putative AD risk genes enriched for AD-associated molecular signatures. Overall, systematic characterization of AD-associated variants enhances our understanding of the regulatory mechanisms underlying AD pathogenesis.
Affiliation(s)
- Marielle L. Bond
- Curriculum in Genetics & Molecular Biology, University of North Carolina at Chapel Hill
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| | | | - Susan D’Costa
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
| | - Yijia Wu
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| | - Jessica L. Bell
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| | - Jessica C. McAfee
- Curriculum in Genetics & Molecular Biology, University of North Carolina at Chapel Hill
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| | - Nicole E. Kramer
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
| | - Sool Lee
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
| | - Mary Patrucco
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
| | - Douglas H. Phanstiel
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Department of Cell Biology & Physiology, University of North Carolina at Chapel Hill
| | - Hyejung Won
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
|
15
|
Gordon MG, Kathail P, Choy B, Kim MC, Mazumder T, Gearing M, Ye CJ. Population Diversity at the Single-Cell Level. Annu Rev Genomics Hum Genet 2024; 25:27-49. [PMID: 38382493 DOI: 10.1146/annurev-genom-021623-083207] [Indexed: 02/23/2024]
Abstract
Population-scale single-cell genomics is a transformative approach for unraveling the intricate links between genetic and cellular variation. This approach is facilitated by cutting-edge experimental methodologies, including the development of high-throughput single-cell multiomics and advances in multiplexed environmental and genetic perturbations. Examining the effects of natural or synthetic genetic variants across cellular contexts provides insights into the mutual influence of genetics and the environment in shaping cellular heterogeneity. The development of computational methodologies further enables detailed quantitative analysis of molecular variation, offering an opportunity to examine the respective roles of stochastic, intercellular, and interindividual variation. Future opportunities lie in leveraging long-read sequencing, refining disease-relevant cellular models, and embracing predictive and generative machine learning models. These advancements hold the potential for a deeper understanding of the genetic architecture of human molecular traits, which in turn has important implications for understanding the genetic causes of human disease.
Affiliation(s)
| | - Pooja Kathail
- Center for Computational Biology, University of California, Berkeley, California, USA
| | - Bryson Choy
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
| | - Min Cheol Kim
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
| | - Thomas Mazumder
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
| | - Melissa Gearing
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
| | - Chun Jimmie Ye
- Arc Institute, Palo Alto, California, USA
- Division of Rheumatology, Department of Medicine, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
- Bakar Computational Health Sciences Institute, Gladstone-UCSF Institute of Genomic Immunology, Parker Institute for Cancer Immunotherapy, Department of Epidemiology and Biostatistics, Department of Microbiology and Immunology, and Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, USA
|
16
|
Chang TY, Waxman DJ. HDI-STARR-seq: Condition-specific enhancer discovery in mouse liver in vivo. RESEARCH SQUARE 2024:rs.3.rs-4559581. [PMID: 38978599 PMCID: PMC11230509 DOI: 10.21203/rs.3.rs-4559581/v1] [Indexed: 07/10/2024]
Abstract
Background: STARR-seq and other massively-parallel reporter assays are widely used to discover functional enhancers in transfected cell models, which can be confounded by plasmid vector-induced type-I interferon immune responses and lack the multicellular environment and endogenous chromatin state of complex mammalian tissues. Results: Here, we describe HDI-STARR-seq, which combines STARR-seq plasmid library delivery to the liver, by hydrodynamic tail vein injection (HDI), with reporter RNA transcriptional initiation driven by a minimal Albumin promoter, which we show is essential for mouse liver STARR-seq enhancer activity assayed 7 days after HDI. Importantly, little or no vector-induced innate type-I interferon responses were observed. Comparisons of HDI-STARR-seq activity between male and female mouse livers and in livers from males treated with an activating ligand of the transcription factor CAR (Nr1i3) identified many condition-dependent enhancers linked to condition-specific gene expression. Further, thousands of active liver enhancers were identified using a high complexity STARR-seq library comprised of ~50,000 genomic regions released by DNase-I digestion of mouse liver nuclei. When compared to stringently inactive library sequences, the active enhancer sequences identified were highly enriched for liver open chromatin regions with activating histone marks (H3K27ac, H3K4me1, H3K4me3), were significantly closer to gene transcriptional start sites, and were significantly depleted of repressive (H3K27me3, H3K9me3) and transcribed region histone marks (H3K36me3). Conclusions: HDI-STARR-seq offers substantial improvements over current methodologies for large scale, functional profiling of enhancers, including condition-dependent enhancers, in liver tissue in vivo, and can be adapted to characterize enhancer activities in a variety of species and tissues by selecting suitable tissue- and species-specific promoter sequences.
|
17
|
Yin C, Hair SC, Byeon GW, Bromley P, Meuleman W, Seelig G. Iterative deep learning-design of human enhancers exploits condensed sequence grammar to achieve cell type-specificity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.14.599076. [PMID: 38915713 PMCID: PMC11195158 DOI: 10.1101/2024.06.14.599076] [Indexed: 06/26/2024]
Abstract
An important and largely unsolved problem in synthetic biology is how to target gene expression to specific cell types. Here, we apply iterative deep learning to design synthetic enhancers with strong differential activity between two human cell lines. We initially train models on published datasets of enhancer activity and chromatin accessibility and use them to guide the design of synthetic enhancers that maximize predicted specificity. We experimentally validate these sequences, use the measurements to re-optimize the predictor, and design a second generation of enhancers with improved specificity. Our design methods embed relevant transcription factor binding site (TFBS) motifs with higher frequencies than comparable endogenous enhancers while using a more selective motif vocabulary, and we show that enhancer activity is correlated with transcription factor expression at the single cell level. Finally, we characterize causal features of top enhancers via perturbation experiments and show enhancers as short as 50bp can maintain specificity.
Affiliation(s)
- Christopher Yin
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
| | | | - Gun Woo Byeon
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
| | - Peter Bromley
- Altius Institute for Biomedical Sciences, Seattle, WA
| | - Wouter Meuleman
- Altius Institute for Biomedical Sciences, Seattle, WA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
| | - Georg Seelig
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
|
18
|
Chang TY, Waxman DJ. HDI-STARR-seq: Condition-specific enhancer discovery in mouse liver in vivo. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.10.598329. [PMID: 38915578 PMCID: PMC11195054 DOI: 10.1101/2024.06.10.598329] [Indexed: 06/26/2024]
Abstract
STARR-seq and other massively-parallel reporter assays are widely used to discover functional enhancers in transfected cell models, which can be confounded by plasmid vector-induced type-I interferon immune responses and lack the multicellular environment and endogenous chromatin state of complex mammalian tissues. Here, we describe HDI-STARR-seq, which combines STARR-seq plasmid library delivery to the liver, by hydrodynamic tail vein injection (HDI), with reporter RNA transcriptional initiation driven by a minimal Albumin promoter, which we show is essential for mouse liver STARR-seq enhancer activity assayed 7 days after HDI. Importantly, little or no vector-induced innate type-I interferon responses were observed. Comparisons of HDI-STARR-seq activity between male and female mouse livers and in livers from males treated with an activating ligand of the transcription factor CAR (Nr1i3) identified many condition-dependent enhancers linked to condition-specific gene expression. Further, thousands of active liver enhancers were identified using a high complexity STARR-seq library comprised of ~50,000 genomic regions released by DNase-I digestion of mouse liver nuclei. When compared to stringently inactive library sequences, the active enhancer sequences identified were highly enriched for liver open chromatin regions with activating histone marks (H3K27ac, H3K4me1, H3K4me3), were significantly closer to gene transcriptional start sites, and were significantly depleted of repressive (H3K27me3, H3K9me3) and transcribed region histone marks (H3K36me3). HDI-STARR-seq offers substantial improvements over current methodologies for large scale, functional profiling of enhancers, including condition-dependent enhancers, in liver tissue in vivo, and can be adapted to characterize enhancer activities in a variety of species and tissues by selecting suitable tissue- and species-specific promoter sequences.
Affiliation(s)
- Ting-Ya Chang
- Departments of Biology and Biomedical Engineering, and Bioinformatics program, Boston University, Boston, MA 02215
| | - David J Waxman
- Departments of Biology and Biomedical Engineering, and Bioinformatics program, Boston University, Boston, MA 02215
|
19
|
Park SJ, Nakai K. A computational approach for deciphering the interactions between proximal and distal gene regulators in GC B-cell response. NAR Genom Bioinform 2024; 6:lqae050. [PMID: 38711859 PMCID: PMC11071120 DOI: 10.1093/nargab/lqae050] [Received: 11/05/2023] [Revised: 04/15/2024] [Accepted: 04/27/2024] [Indexed: 05/08/2024]
Abstract
Delineating the intricate interplay between promoter-proximal and -distal regulators is crucial for understanding the function of transcriptional mediator complexes implicated in the regulation of gene expression. The present study aimed to develop a computational method for accurately modeling the spatial proximal and distal regulatory interactions. Our method combined regression-based models to identify key regulators through gene expression prediction and a graph-embedding approach to detect coregulated genes. This approach enabled a detailed investigation of the gene regulatory mechanisms for germinal center B cells, accompanied by dramatic rearrangements of the genome structure. We found that while the promoter-proximal regulatory elements were the principal regulators of gene expression, the distal regulators fine-tuned transcription. Moreover, our approach unveiled the presence of modular regulators, such as cofactors and proximal/distal transcription factors, which were co-expressed with their target genes. Some of these modules exhibited abnormal expression patterns in lymphoma. These findings suggest that the dysregulation of interactions between transcriptional and architectural factors is associated with chromatin reorganization failure, which may increase the risk of malignancy. Therefore, our computational approach helps decipher the spatially interacting transcriptional cis-regulatory code.
Affiliation(s)
- Sung-Joon Park
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
| | - Kenta Nakai
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
|
20
|
Bhattarai KR, Mobley RJ, Barnett KR, Ferguson DC, Hansen BS, Diedrich JD, Bergeron BP, Yoshimura S, Yang W, Crews KR, Manring CS, Jabbour E, Paietta E, Litzow MR, Kornblau SM, Stock W, Inaba H, Jeha S, Pui CH, Cheng C, Pruett-Miller SM, Relling MV, Yang JJ, Evans WE, Savic D. Investigation of inherited noncoding genetic variation impacting the pharmacogenomics of childhood acute lymphoblastic leukemia treatment. Nat Commun 2024; 15:3681. [PMID: 38693155 PMCID: PMC11063049 DOI: 10.1038/s41467-024-48124-4] [Received: 02/10/2023] [Accepted: 04/18/2024] [Indexed: 05/03/2024]
Abstract
Defining genetic factors impacting chemotherapy failure can help to better predict response and identify drug resistance mechanisms. However, there is limited understanding of the contribution of inherited noncoding genetic variation on inter-individual differences in chemotherapy response in childhood acute lymphoblastic leukemia (ALL). Here we map inherited noncoding variants associated with treatment outcome and/or chemotherapeutic drug resistance to ALL cis-regulatory elements and investigate their gene regulatory potential and target gene connectivity using massively parallel reporter assays and three-dimensional chromatin looping assays, respectively. We identify 54 variants with transcriptional effects and high-confidence gene connectivity. Additionally, functional interrogation of the top variant, rs1247117, reveals changes in chromatin accessibility, PU.1 binding affinity and gene expression, and deletion of the genomic interval containing rs1247117 sensitizes cells to vincristine. Together, these data demonstrate that noncoding regulatory variants associated with diverse pharmacological traits harbor significant effects on allele-specific transcriptional activity and impact sensitivity to antileukemic agents.
Affiliation(s)
- Kashi Raj Bhattarai
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Robert J Mobley
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Kelly R Barnett
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Daniel C Ferguson
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Baranda S Hansen
- Center for Advanced Genome Engineering, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Cell and Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Jonathan D Diedrich
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Brennan P Bergeron
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Satoshi Yoshimura
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Advanced Pediatric Medicine, Tohoku University School of Medicine, Tokyo, Japan
| | - Wenjian Yang
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Kristine R Crews
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Christopher S Manring
- Alliance Hematologic Malignancy Biorepository; Clara D. Bloomfield Center for Leukemia Outcomes Research, Columbus, OH, 43210, USA
| | - Elias Jabbour
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | | | - Mark R Litzow
- Division of Hematology, Department of Medicine, Mayo Clinic, Rochester, MN, 55905, USA
| | - Steven M Kornblau
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Wendy Stock
- Comprehensive Cancer Center, University of Chicago Medicine, Chicago, IL, USA
| | - Hiroto Inaba
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Sima Jeha
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Ching-Hon Pui
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Cheng Cheng
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Shondra M Pruett-Miller
- Center for Advanced Genome Engineering, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Cell and Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Mary V Relling
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Jun J Yang
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Integrated Biomedical Sciences Program, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
| | - William E Evans
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Daniel Savic
- Hematological Malignancies Program, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
- Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
- Integrated Biomedical Sciences Program, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
| |
|
21
|
Hurabielle C, LaFlam TN, Gearing M, Ye CJ. Functional genomics in inborn errors of immunity. Immunol Rev 2024; 322:53-70. [PMID: 38329267 PMCID: PMC10950534 DOI: 10.1111/imr.13309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Inborn errors of immunity (IEI) comprise a diverse spectrum of 485 disorders, as recognized by the International Union of Immunological Societies Committee on Inborn Errors of Immunity in 2022. While IEI are monogenic by definition, they illuminate various pathways involved in the pathogenesis of polygenic immune dysregulation, as in autoimmune or autoinflammatory syndromes, or in more common infectious diseases that may not have a significant genetic basis. Rapid improvement in genomic technologies has been the main driver of the accelerated rate of discovery of IEI and has led to the development of innovative treatment strategies. In this review, we explore various facets of IEI, including the distinctions between primary immunodeficiency diseases (PIDD) and primary immune regulatory disorders (PIRD). We examine how Mendelian inheritance patterns contribute to these disorders and discuss advances in functional genomics that aid in characterizing new IEI. Additionally, we explore how emerging genomic tools are paving the way for innovative treatment approaches for managing and potentially curing these complex immune conditions.
Affiliation(s)
- Charlotte Hurabielle
- Division of Rheumatology, Department of Medicine, UCSF, San Francisco, California, USA
| | - Taylor N LaFlam
- Division of Pediatric Rheumatology, Department of Pediatrics, UCSF, San Francisco, California, USA
| | - Melissa Gearing
- Division of Rheumatology, Department of Medicine, UCSF, San Francisco, California, USA
| | - Chun Jimmie Ye
- Institute for Human Genetics, UCSF, San Francisco, California, USA
- Institute of Computational Health Sciences, UCSF, San Francisco, California, USA
- Gladstone Genomic Immunology Institute, San Francisco, California, USA
- Parker Institute for Cancer Immunotherapy, UCSF, San Francisco, California, USA
- Department of Epidemiology and Biostatistics, UCSF, San Francisco, California, USA
- Department of Microbiology and Immunology, UCSF, San Francisco, California, USA
- Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, California, USA
- Arc Institute, Palo Alto, California, USA
| |
|
22
|
Liu J, Ashuach T, Inoue F, Ahituv N, Yosef N, Kreimer A. Optimizing sequence design strategies for perturbation MPRAs: a computational evaluation framework. Nucleic Acids Res 2024; 52:1613-1627. [PMID: 38296821 PMCID: PMC10939410 DOI: 10.1093/nar/gkae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 12/26/2023] [Accepted: 01/12/2024] [Indexed: 02/02/2024] Open
Abstract
The advent of the perturbation-based massively parallel reporter assay (MPRA) technique has facilitated the delineation of the roles of non-coding regulatory elements in orchestrating gene expression. However, computational efforts to evaluate and establish guidelines for sequence design strategies for perturbation MPRAs remain scant. In this study, we propose a framework for evaluating and comparing various perturbation strategies for MPRA experiments. Within this framework, we benchmark three different perturbation approaches from the perspectives of alteration in motif-based profiles, consistency of MPRA outputs, and robustness of models that predict the activities of putative regulatory motifs. While our analyses show very similar results across multiple benchmarking metrics, the predictive modeling for the approach involving random nucleotide shuffling shows significant robustness compared with the other two approaches. Thus, we recommend designing sequences by randomly shuffling the nucleotides of the perturbed site in perturbation-MPRA, followed by a coherence check to prevent the introduction of other variations of the target motifs. In summary, our evaluation framework and the benchmarking findings create a resource of computational pipelines and highlight the potential of perturbation-MPRA in predicting non-coding regulatory activities.
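The recommended design step (shuffle the perturbed site, then check that no variant of the target motif has been reintroduced) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `shuffle_site`, the exact-substring coherence check, and the retry limit are all assumptions made for illustration, and a real design would more likely score candidates against motif models rather than literal strings.

```python
import random

def shuffle_site(seq, start, end, motif_variants, rng, max_tries=100):
    """Shuffle seq[start:end] and return the first candidate in which none of
    the given motif variants occurs at or overlapping the perturbed site.
    Returns None if no acceptable shuffle is found (hypothetical helper)."""
    if not motif_variants:
        raise ValueError("motif_variants must not be empty")
    site = list(seq[start:end])
    # Widen the check window so motifs straddling the site boundary are caught.
    flank = max(len(m) for m in motif_variants) - 1
    lo, hi = max(0, start - flank), min(len(seq), end + flank)
    for _ in range(max_tries):
        rng.shuffle(site)  # keeps nucleotide composition, changes order
        candidate = seq[:start] + "".join(site) + seq[end:]
        window = candidate[lo:hi]
        # Coherence check (assumed form): reject if any motif variant reappears.
        if not any(m in window for m in motif_variants):
            return candidate
    return None

# Example: perturb an 8-bp site inside a toy 16-bp sequence.
example = shuffle_site("ACGTTGACGTCAACGT", 4, 12,
                       {"TGACGTCA", "TGACGCAA"}, random.Random(0))
```

Shuffling preserves the nucleotide composition of the site, so any change in reporter activity is less likely to reflect a shift in GC content; the flanks are left untouched.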
Affiliation(s)
- Jiayi Liu
- Graduate Program in Cell & Developmental Biology, Rutgers, The State University of New Jersey, 604 Allison Rd, Piscataway, NJ 08854, USA
- Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ 08854, USA
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, NJ 08854, USA
| | - Tal Ashuach
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, 387 Soda Hall, Berkeley, CA 94720, USA
| | - Fumitaka Inoue
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Faculty of Medicine Building B, Yoshidatachibanacho, Sakyo Ward, Kyoto 606-8303, Japan
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, 1700 4th Street, San Francisco, CA 94158, USA
- Institute for Human Genetics, University of California, 513 Parnassus Ave, San Francisco, CA 94143, USA
| | - Nir Yosef
- Department of Systems Immunology, Weizmann Institute of Science, 234 Herzl Street, Rehovot 7610001, Israel
- Chan-Zuckerberg Biohub, 499 Illinois St, San Francisco, CA 94158, USA
- Department of Systems Immunology, Ragon Institute of MGH, MIT, and Harvard Institute of Science, 400 Technology Square, Cambridge, MA 02139, USA
| | - Anat Kreimer
- Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ 08854, USA
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, NJ 08854, USA
| |
|
23
|
Farrow SL, Gokuladhas S, Schierding W, Pudjihartono M, Perry JK, Cooper AA, O'Sullivan JM. Identification of 27 allele-specific regulatory variants in Parkinson's disease using a massively parallel reporter assay. NPJ Parkinsons Dis 2024; 10:44. [PMID: 38413607 PMCID: PMC10899198 DOI: 10.1038/s41531-024-00659-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 02/12/2024] [Indexed: 02/29/2024] Open
Abstract
Genome-wide association studies (GWAS) have identified a number of genomic loci that are associated with Parkinson's disease (PD) risk. However, the majority of these variants lie in non-coding regions, and thus the mechanisms by which they influence disease development, and/or potential subtypes, remain largely elusive. To address this, we used a massively parallel reporter assay (MPRA) to screen the regulatory function of 5254 variants that have a known or putative connection to PD. We identified 138 loci with enhancer activity, of which 27 exhibited allele-specific regulatory activity in HEK293 cells. The identified regulatory variant(s) typically did not match the original tag variant within the PD-associated locus, supporting the need for deeper exploration of these loci. The existence of allele-specific transcriptional impacts within HEK293 cells confirms that at least a subset of the PD-associated regions mark functional gene regulatory elements. Future functional studies that confirm the putative targets of the empirically verified regulatory variants will be crucial for gaining a greater understanding of how gene regulatory network(s) modulate PD risk.
Affiliation(s)
- Sophie L Farrow
- Liggins Institute, The University of Auckland, Auckland, New Zealand.
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand.
| | | | - William Schierding
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Department of Ophthalmology, The University of Auckland, Auckland, New Zealand
| | | | - Jo K Perry
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
| | - Antony A Cooper
- Australian Parkinsons Mission, Garvan Institute of Medical Research, Sydney, NSW, Australia
- St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
| | - Justin M O'Sullivan
- Liggins Institute, The University of Auckland, Auckland, New Zealand.
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand.
- Australian Parkinsons Mission, Garvan Institute of Medical Research, Sydney, NSW, Australia.
- Singapore Institute for Clinical Sciences, Agency for Science Technology and Research, Singapore, Singapore.
- MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, United Kingdom.
| |
|
24
|
Kwak IY, Kim BC, Lee J, Kang T, Garry DJ, Zhang J, Gong W. Proformer: a hybrid macaron transformer model predicts expression values from promoter sequences. BMC Bioinformatics 2024; 25:81. [PMID: 38378442 PMCID: PMC10877777 DOI: 10.1186/s12859-024-05645-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024] Open
Abstract
The breakthrough high-throughput measurement of the cis-regulatory activity of millions of randomly generated promoters provides an unprecedented opportunity to systematically decode the cis-regulatory logic that determines the expression values. We developed an end-to-end transformer encoder architecture named Proformer to predict the expression values from DNA sequences. Proformer used a Macaron-like Transformer encoder architecture, where two half-step feed-forward (FFN) layers were placed at the beginning and the end of each encoder block, and a separable 1D convolution layer was inserted after the first FFN layer and in front of the multi-head attention layer. The sliding k-mers from one-hot encoded sequences were mapped onto a continuous embedding, combined with the learned positional embedding and strand embedding (forward strand vs. reverse-complemented strand) as the sequence input. Moreover, Proformer introduced multiple expression heads with mask filling to prevent the transformer models from collapsing when training on a relatively small amount of data. We empirically determined that this design performed significantly better than conventional designs, such as using a global pooling layer as the output layer for the regression task. These analyses support the notion that Proformer provides a novel method of learning and enhances our understanding of how cis-regulatory sequences determine the expression values.
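The input encoding described above (sliding k-mers taken from both the forward and the reverse-complemented strand) can be sketched as follows. This is an illustrative simplification, not the Proformer code: the abstract describes mapping k-mers from one-hot encoded sequences onto a continuous embedding, whereas this sketch assumes each k-mer is converted to an integer ID that would then index a learned embedding table. The function names and the 0/1 strand flag are hypothetical.

```python
# Assumed alphabet and base ordering; real data may also contain N.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def kmer_ids(seq, k):
    """Return one base-4 integer ID per sliding k-mer (stride 1)."""
    ids = []
    for i in range(len(seq) - k + 1):
        idx = 0
        for base in seq[i:i + k]:
            idx = idx * 4 + BASE_INDEX[base]
        ids.append(idx)
    return ids

def encode_both_strands(seq, k):
    """Return (token_ids, strand_flag) for the forward strand (flag 0)
    and the reverse-complemented strand (flag 1)."""
    return [(kmer_ids(seq, k), 0),
            (kmer_ids(reverse_complement(seq), k), 1)]

# Example: 2-mers of a toy 4-bp sequence on both strands.
tokens = encode_both_strands("AACG", 2)
```

In a model of this kind, the token IDs, their positions, and the strand flag would each be looked up in separate embedding tables and summed before entering the encoder blocks.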
Affiliation(s)
- Il-Youp Kwak
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
| | - Byeong-Chan Kim
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
| | - Juhyun Lee
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
| | - Taein Kang
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
| | - Daniel J Garry
- Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA.
- Stem Cell Institute, University of Minnesota, Minneapolis, MN, 55455, USA.
- Paul and Sheila Wellstone Muscular Dystrophy Center, University of Minnesota, Minneapolis, MN, 55455, USA.
| | - Jianyi Zhang
- Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, 35233, USA
| | - Wuming Gong
- Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA.
| |
|
25
|
Chen AB, Yu X, Thapa KS, Gao H, Reiter JL, Xuei X, Tsai AP, Landreth GE, Lai D, Wang Y, Foroud TM, Tischfield JA, Edenberg HJ, Liu Y. Functional 3'-UTR Variants Identify Regulatory Mechanisms Impacting Alcohol Use Disorder and Related Traits. bioRxiv 2024:2024.01.31.578270. [PMID: 38370821 PMCID: PMC10871301 DOI: 10.1101/2024.01.31.578270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Although genome-wide association studies (GWAS) have identified loci associated with alcohol consumption and alcohol use disorder (AUD), they do not identify which variants are functional. To approach this, we evaluated the impact of variants in 3' untranslated regions (3'-UTRs) of genes in loci associated with substance use and neurological disorders using a massively parallel reporter assay (MPRA) in neuroblastoma and microglia cells. Functionally impactful variants explained a higher proportion of heritability of alcohol traits than non-functional variants. We identified genes whose 3'-UTR activities are associated with AUD and alcohol consumption by combining variant effects from MPRA with GWAS results. We examined their effects by evaluating gene expression after CRISPR inhibition of neuronal cells and stratifying brain tissue samples by MPRA-derived 3'-UTR activity. A pathway analysis of differentially expressed genes identified inflammation response pathways. These analyses suggest that variation in response to inflammation contributes to the propensity to increase alcohol consumption.
Affiliation(s)
- Andy B. Chen
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana
| | - Xuhong Yu
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
| | - Kriti S. Thapa
- Department of Biochemistry & Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana
| | - Hongyu Gao
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana
- Center for Medical Genomics, Indiana University School of Medicine, Indianapolis, Indiana
| | - Jill L Reiter
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana
| | - Xiaoling Xuei
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
- Center for Medical Genomics, Indiana University School of Medicine, Indianapolis, Indiana
| | - Andy P. Tsai
- Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, Indiana
| | - Gary E. Landreth
- Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, Indiana
- Department of Anatomy and Cell Biology, Indiana University School of Medicine, Indianapolis, Indiana
| | - Dongbing Lai
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
| | - Yue Wang
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
| | - Tatiana M. Foroud
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
| | | | - Howard J. Edenberg
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
- Department of Biochemistry & Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana
| | - Yunlong Liu
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana
- Center for Medical Genomics, Indiana University School of Medicine, Indianapolis, Indiana
| |
|
26
|
Nappi F. In-Depth Genomic Analysis: The New Challenge in Congenital Heart Disease. Int J Mol Sci 2024; 25:1734. [PMID: 38339013 PMCID: PMC10855915 DOI: 10.3390/ijms25031734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 01/25/2024] [Accepted: 01/27/2024] [Indexed: 02/12/2024] Open
Abstract
The use of next-generation sequencing has provided new insights into the causes and mechanisms of congenital heart disease (CHD). Examinations of the whole exome sequence have detected detrimental gene variations modifying single or contiguous nucleotides, which are characterised as pathogenic based on statistical assessments of families and correlations with congenital heart disease, elevated expression during heart development, and reductions in harmful protein-coding mutations in the general population. Patients with CHD and extracardiac abnormalities are enriched for gene classes meeting these criteria, supporting a common set of pathways in the organogenesis of CHDs. Single-cell transcriptomics data have revealed the expression of genes associated with CHD in specific cell types, and emerging evidence suggests that genetic mutations disrupt multicellular genes essential for cardiogenesis. Metrics and units are being tracked in whole-genome sequencing studies.
Affiliation(s)
- Francesco Nappi
- Department of Cardiac Surgery, Centre Cardiologique du Nord, 93200 Saint-Denis, France
| |
|
27
|
Taskiran II, Spanier KI, Dickmänken H, Kempynck N, Pančíková A, Ekşi EC, Hulselmans G, Ismail JN, Theunis K, Vandepoel R, Christiaens V, Mauduit D, Aerts S. Cell-type-directed design of synthetic enhancers. Nature 2024; 626:212-220. [PMID: 38086419 PMCID: PMC10830415 DOI: 10.1038/s41586-023-06936-2] [Citation(s) in RCA: 58] [Impact Index Per Article: 58.0] [Received: 07/06/2022] [Accepted: 12/05/2023] [Indexed: 01/19/2024]
Abstract
Transcriptional enhancers act as docking stations for combinations of transcription factors and thereby regulate spatiotemporal activation of their target genes [1]. It has been a long-standing goal in the field to decode the regulatory logic of an enhancer and to understand the details of how spatiotemporal gene expression is encoded in an enhancer sequence. Here we show that deep learning models [2-6] can be used to efficiently design synthetic, cell-type-specific enhancers, starting from random sequences, and that this optimization process allows detailed tracing of enhancer features at single-nucleotide resolution. We evaluate the function of fully synthetic enhancers to specifically target Kenyon cells or glial cells in the fruit fly brain using transgenic animals. We further exploit enhancer design to create 'dual-code' enhancers that target two cell types and minimal enhancers smaller than 50 base pairs that are fully functional. By examining the state space searches towards local optima, we characterize enhancer codes through the strength, combination and arrangement of transcription factor activator and transcription factor repressor motifs. Finally, we apply the same strategies to successfully design human enhancers, which adhere to enhancer rules similar to those of Drosophila enhancers. Enhancer design guided by deep learning leads to better understanding of how enhancers work and shows that their code can be exploited to manipulate cell states.
Affiliation(s)
- Ibrahim I Taskiran
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Katina I Spanier
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Hannah Dickmänken
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Niklas Kempynck
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Alexandra Pančíková
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB-KULeuven Center for Cancer Biology, Leuven, Belgium
| | - Eren Can Ekşi
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Gert Hulselmans
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Joy N Ismail
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- UK Dementia Research Institute at Imperial College London, London, UK
| | - Koen Theunis
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Roel Vandepoel
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Valerie Christiaens
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - David Mauduit
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Stein Aerts
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium.
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium.
- Department of Human Genetics, KU Leuven, Leuven, Belgium.
| |
|
28
|
Sun J, Noss S, Banerjee D, Das M, Girirajan S. Strategies for dissecting the complexity of neurodevelopmental disorders. Trends Genet 2024; 40:187-202. [PMID: 37949722 PMCID: PMC10872993 DOI: 10.1016/j.tig.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Revised: 09/20/2023] [Accepted: 10/16/2023] [Indexed: 11/12/2023]
Abstract
Neurodevelopmental disorders (NDDs) are associated with a wide range of clinical features, affecting multiple pathways involved in brain development and function. Recent advances in high-throughput sequencing have unveiled numerous genetic variants associated with NDDs, which further contribute to disease complexity and make it challenging to infer disease causation and underlying mechanisms. Herein, we review current strategies for dissecting the complexity of NDDs using model organisms, induced pluripotent stem cells, single-cell sequencing technologies, and massively parallel reporter assays. We further highlight single-cell CRISPR-based screening techniques that allow genomic investigation of cellular transcriptomes with high efficiency, accuracy, and throughput. Overall, we provide an integrated review of experimental approaches applicable to investigating a broad range of complex disorders.
Affiliation(s)
- Jiawan Sun
- Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
| | - Serena Noss
- Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
| | - Deepro Banerjee
- Bioinformatics and Genomics Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
| | - Maitreya Das
- Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA
| | - Santhosh Girirajan
- Molecular, Cellular, and Integrative Biosciences Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA; Bioinformatics and Genomics Graduate Program, The Huck Institutes of Life Sciences, University Park, PA 16802, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA; Department of Anthropology, Pennsylvania State University, University Park, PA 16802, USA.
| |
|
29
|
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024] Open
Abstract
The massively parallel reporter assay measures the transcriptional activities of various cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational model framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using the framework, we investigated the molecular mechanisms of cis-regulatory mutations of a synthetic enhancer that cause abnormal gene expression. We found that, in a human cell line, competitive binding between family transcription factors (TFs) with slightly different binding preferences significantly increases the accuracy of recapitulating the transcriptional effects of thousands of single or multiple mutations. We also discovered that even if various harmful mutations occurred in an activator binding site, a CRM could stably maintain or even increase gene expression through a certain form of competitive binding between family TFs. These findings enhance understanding of the effects of SNPs and indels on CRMs and would help in building robust custom-designed CRMs for biologics production and gene therapy.
Affiliation(s)
- Chan-Koo Kang
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
| | - Ah-Ram Kim
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
| |
|
30
|
Zheng Y, Yang W, Estepp J, Pei D, Cheng C, Takemoto CM, Inaba H, Jeha S, Pui CH, Relling MV, Karol SE. Genomic analysis of venous thrombosis in children with acute lymphoblastic leukemia from diverse ancestries. Haematologica 2024; 109:53-59. [PMID: 37408475 PMCID: PMC10772501 DOI: 10.3324/haematol.2022.281582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2023] [Accepted: 06/29/2023] [Indexed: 07/07/2023] Open
Abstract
Venous thrombosis is a common adverse effect of modern therapy for acute lymphoblastic leukemia (ALL). Prior studies to identify risks of thrombosis in pediatric ALL have been limited by genetic screens of pre-identified genetic variants or genome-wide association studies (GWAS) in ancestrally uniform populations. To address this, we performed a retrospective cohort evaluation of thrombosis risk in 1,005 children treated for newly diagnosed ALL. Genetic risk factors were comprehensively evaluated from genome-wide single nucleotide polymorphism (SNP) arrays and were evaluated using Cox regression adjusting for identified clinical risk factors and genetic ancestry. The cumulative incidence of thrombosis was 7.8%. In multivariate analysis, older age, T-lineage ALL, and non-O blood group were associated with increased thrombosis while non-low-risk treatment and higher presenting white blood cell count trended toward increased thrombosis. No SNP reached genome-wide significance. The SNP most strongly associated with thrombosis was rs2874964 near RFXAP (G risk allele; P=4×10⁻⁷; hazard ratio [HR]=2.8). In patients of non-European ancestry, rs55689276 near the α globin cluster (P=1.28×10⁻⁶; HR=27) was most strongly associated with thrombosis. Among GWAS catalogue SNP reported to be associated with thrombosis, rs2519093 (T risk allele, P=4.8×10⁻⁴; HR=2.1), an intronic variant in ABO, was most strongly associated with risk in this cohort. Classic thrombophilia risks were not associated with thrombosis. Our study confirms known clinical risk features associated with thrombosis risk in children with ALL. In this ancestrally diverse cohort, genetic risks linked to thrombosis risk aggregated in erythrocyte-related SNP, suggesting the critical role of this tissue in thrombosis risk.
Affiliation(s)
| | | | - Jeremie Estepp
- Departments of Global Pediatric Medicine; Departments of Hematology
| | | | | | | | - Hiroto Inaba
- Departments of Oncology. St. Jude Children's Research Hospital, Memphis, TN
| | - Sima Jeha
- Departments of Global Pediatric Medicine; Departments of Oncology. St. Jude Children's Research Hospital, Memphis, TN
| | - Ching-Hon Pui
- Departments of Oncology. St. Jude Children's Research Hospital, Memphis, TN
| | | | - Seth E Karol
- Departments of Oncology. St. Jude Children's Research Hospital, Memphis, TN.
| |
|
31
|
Jindal GA, Bantle AT, Solvason JJ, Grudzien JL, D'Antonio-Chronowska A, Lim F, Le SH, Song BP, Ragsac MF, Klie A, Larsen RO, Frazer KA, Farley EK. Single-nucleotide variants within heart enhancers increase binding affinity and disrupt heart development. Dev Cell 2023; 58:2206-2216.e5. [PMID: 37848026 PMCID: PMC10720985 DOI: 10.1016/j.devcel.2023.09.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 12/20/2022] [Revised: 06/07/2023] [Accepted: 09/20/2023] [Indexed: 10/19/2023]
Abstract
Transcriptional enhancers direct precise gene expression patterns during development and harbor the majority of variants associated with phenotypic diversity, evolutionary adaptations, and disease. Pinpointing which enhancer variants contribute to changes in gene expression and phenotypes is a major challenge. Here, we find that suboptimal or low-affinity binding sites are necessary for precise gene expression during heart development. Single-nucleotide variants (SNVs) can optimize the affinity of ETS binding sites, causing gain-of-function (GOF) gene expression, cell migration defects, and phenotypes as severe as extra beating hearts in the marine chordate Ciona robusta. In human induced pluripotent stem cell (iPSC)-derived cardiomyocytes, a SNV within a human GATA4 enhancer increases ETS binding affinity and causes GOF enhancer activity. The prevalence of suboptimal-affinity sites within enhancers creates a vulnerability whereby affinity-optimizing SNVs can lead to GOF gene expression, changes in cellular identity, and organismal-level phenotypes that could contribute to the evolution of novel traits or diseases.
Affiliation(s)
- Granton A Jindal
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA
| | - Alexis T Bantle
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Biological Sciences Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Joe J Solvason
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Jessica L Grudzien
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA
| | | | - Fabian Lim
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Biological Sciences Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Sophia H Le
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA
| | - Benjamin P Song
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Biological Sciences Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Michelle F Ragsac
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Adam Klie
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Reid O Larsen
- Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Kelly A Frazer
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA 92093, USA; Institute for Genomic Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA
| | - Emma K Farley
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA.
| |
|
32
|
Trauernicht M, Rastogi C, Manzo S, Bussemaker H, van Steensel B. Optimisation of TP53 reporters by systematic dissection of synthetic TP53 response elements. Nucleic Acids Res 2023; 51:9690-9702. [PMID: 37650627 PMCID: PMC10570033 DOI: 10.1093/nar/gkad718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 07/24/2023] [Accepted: 08/22/2023] [Indexed: 09/01/2023] Open
Abstract
TP53 is a transcription factor that controls multiple cellular processes, including cell cycle arrest, DNA repair and apoptosis. The relation between TP53 binding site architecture and transcriptional output is still not fully understood. Here, we systematically examined in three different cell lines the effects of binding site affinity and copy number on TP53-dependent transcriptional output, and also probed the impact of spacer length and sequence between adjacent binding sites, and of core promoter identity. Paradoxically, we found that high-affinity TP53 binding sites are less potent than medium-affinity sites. TP53 achieves supra-additive transcriptional activation through optimally spaced adjacent binding sites, suggesting a cooperative mechanism. Optimally spaced adjacent binding sites have a ∼10-bp periodicity, suggesting a role for spatial orientation along the DNA double helix. We leveraged these insights to construct a log-linear model that explains activity from sequence features, and to identify new highly active and sensitive TP53 reporters.
Affiliation(s)
- Max Trauernicht
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| | - Chaitanya Rastogi
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Stefano G Manzo
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Department of Biosciences, University of Milan “La Statale”, 20133 Milan, Italy
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| | - Bas van Steensel
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
| |
|
33
|
Tan Y, Yan X, Sun J, Wan J, Li X, Huang Y, Li L, Niu L, Hou C. Genome-wide enhancer identification by massively parallel reporter assay in Arabidopsis. Plant J 2023; 116:234-250. [PMID: 37387536 DOI: 10.1111/tpj.16373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 05/29/2023] [Accepted: 06/27/2023] [Indexed: 07/01/2023]
Abstract
Enhancers are critical cis-regulatory elements controlling gene expression during cell development and differentiation. However, genome-wide enhancer characterization has been challenging due to the lack of a well-defined relationship between enhancers and genes. Function-based methods are the gold standard for determining the biological function of cis-regulatory elements; however, these methods have not been widely applied to plants. Here, we applied a massively parallel reporter assay to Arabidopsis to measure enhancer activities across the genome. We identified 4327 enhancers with various combinations of epigenetic modifications that are distinctly different from those of animal enhancers. Furthermore, we showed that enhancers differ from promoters in their preference for transcription factors. Although some enhancers are not conserved and overlap with transposable elements forming clusters, enhancers are generally conserved across a thousand Arabidopsis accessions, suggesting they are selected under evolutionary pressure and could play critical roles in the regulation of important genes. Moreover, comparative analysis reveals that enhancers identified by different strategies do not overlap, suggesting these methods are complementary in nature. In sum, we systematically investigated the features of enhancers identified by functional assay in A. thaliana, which lays the foundation for further investigation into enhancers' functional mechanisms in plants.
Affiliation(s)
- Yongjun Tan
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China
- Department of Biology, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Xiaohao Yan
- Department of Biology, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Jialei Sun
- Department of Biology, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Jing Wan
- Department of Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xinxin Li
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Cardiovascular Health and Precision Medicine, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Yingzhang Huang
- Department of Biology, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Li Li
- Department of Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Longjian Niu
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Cardiovascular Health and Precision Medicine, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Chunhui Hou
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China
| |
|
34
|
Malfait J, Wan J, Spicuglia S. Epromoters are new players in the regulatory landscape with potential pleiotropic roles. Bioessays 2023; 45:e2300012. [PMID: 37246247 DOI: 10.1002/bies.202300012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 05/11/2023] [Accepted: 05/15/2023] [Indexed: 05/30/2023]
Abstract
Precise spatiotemporal control of gene expression during normal development and cell differentiation is achieved by the combined action of proximal (promoters) and distal (enhancers) cis-regulatory elements. Recent studies have reported that a subset of promoters, termed Epromoters, also work as enhancers to regulate distal genes. This new paradigm opens novel questions regarding the complexity of our genome and raises the possibility that genetic variation within Epromoters has pleiotropic effects on various physiological and pathological traits by differentially impacting multiple proximal and distal genes. Here, we discuss the different observations pointing to an important role of Epromoters in the regulatory landscape and summarize the evidence supporting a pleiotropic impact of these elements in disease. We further hypothesize that Epromoters might represent a major contributor to phenotypic variation and disease.
Affiliation(s)
- Juliette Malfait
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, LIGUE, Marseille, France
| | - Jing Wan
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, LIGUE, Marseille, France
| | - Salvatore Spicuglia
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, LIGUE, Marseille, France
| |
|
35
|
Liu J, Ashuach T, Inoue F, Ahituv N, Yosef N, Kreimer A. Best practices for perturbation MPRA-a computational evaluation framework of sequence design strategies. bioRxiv 2023:2023.09.27.559768. [PMID: 37808807 PMCID: PMC10557651 DOI: 10.1101/2023.09.27.559768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
The advent of perturbation-based massively parallel reporter assays (MPRAs) has enabled delineation of the roles of non-coding regulatory elements in orchestrating gene expression. However, computational efforts to evaluate and establish guidelines for sequence design strategies for perturbation MPRAs remain scant. Here, we propose a framework for evaluating and comparing various perturbation strategies for MPRA experiments. Under this framework, we benchmark three different perturbation approaches from the perspectives of alteration in motif-based profiles, consistency of MPRA outputs, and robustness of models that predict the activities of putative regulatory motifs. Although our analyses show similar, albeit significant, results in multiple metrics, the method of randomly shuffling nucleotides outperforms the other two methods. Thus, we still recommend designing sequences by randomly shuffling the nucleotides of the perturbed site in perturbation-MPRA. The evaluation framework, together with the benchmarking findings in our work, creates a resource of computational pipelines and illustrates the promise of perturbation-MPRA for predicting non-coding regulatory activities.
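The recommended design strategy, shuffling the nucleotides of the perturbed site while leaving the flanks intact, can be sketched in a few lines. This is only an illustration of the idea and not the authors' pipeline; the function name, the half-open coordinate convention, and the example sequence are assumptions.

```python
import random


def shuffle_site(seq, start, end, seed=None):
    """Return seq with the nucleotides in seq[start:end] randomly permuted.

    start and end are 0-based, half-open coordinates of the perturbed site
    (a convention assumed here). The flanking sequence is untouched and the
    nucleotide composition of the site is preserved.
    """
    if not 0 <= start <= end <= len(seq):
        raise ValueError("site coordinates fall outside the sequence")
    rng = random.Random(seed)  # a seed makes the design reproducible
    site = list(seq[start:end])
    rng.shuffle(site)
    return seq[:start] + "".join(site) + seq[end:]


# Hypothetical 20-bp insert with a 6-bp site at positions 7..12.
original = "ACGTACGGGATCCTTAGCAT"
perturbed = shuffle_site(original, 7, 13, seed=0)
print(original)
print(perturbed)
```

A random permutation can, by chance, reproduce the original site or still match the motif, so a real design would typically check each shuffled site and redraw if needed.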
Affiliation(s)
- Jiayi Liu
- Graduate Programs in Molecular Biosciences, Rutgers, The State University of New Jersey, 604 Allison Rd, Piscataway, NJ, 08854, USA
- Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ, 08854, USA
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, Piscataway, NJ, 08854, USA
| | - Tal Ashuach
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, 387 Soda Hall, Berkeley, CA, 94720, USA
| | - Fumitaka Inoue
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Faculty of Medicine Building B, Yoshidatachibanacho, Sakyo Ward, Kyoto, 606-8303, Japan
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, 513 Parnassus Ave, CA, 94143, USA
- Institute for Human Genetics, University of California, San Francisco, 513 Parnassus Ave, CA, 94143, USA
| | - Nir Yosef
- Department of Systems Immunology, Weizmann Institute of Science, 234 Herzl Street, Rehovot 7610001 Israel
- Chan-Zuckerberg Biohub, 499 Illinois St, San Francisco, CA, 94158, USA
- Department of Systems Immunology, Ragon Institute of MGH, MIT, and Harvard Institute of Science, 400 Technology Square, Cambridge, MA, 02139, USA
| | - Anat Kreimer
- Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, NJ, 08854, USA
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, Piscataway, NJ, 08854, USA
| |
|
36
|
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Cheng Xu
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA.
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA.
- Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA.
| |
|
37
|
Hudaiberdiev S, Taylor DL, Song W, Narisu N, Bhuiyan RM, Taylor HJ, Tang X, Yan T, Swift AJ, Bonnycastle LL, Consortium DIAMANTE, Chen S, Stitzel ML, Erdos MR, Ovcharenko I, Collins FS. Modeling islet enhancers using deep learning identifies candidate causal variants at loci associated with T2D and glycemic traits. Proc Natl Acad Sci U S A 2023; 120:e2206612120. [PMID: 37603758 PMCID: PMC10469333 DOI: 10.1073/pnas.2206612120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Accepted: 07/19/2023] [Indexed: 08/23/2023] Open
Abstract
Genetic association studies have identified hundreds of independent signals associated with type 2 diabetes (T2D) and related traits. Despite these successes, the identification of specific causal variants underlying a genetic association signal remains challenging. In this study, we describe a deep learning (DL) method to analyze the impact of sequence variants on enhancers. Focusing on pancreatic islets, a T2D relevant tissue, we show that our model learns islet-specific transcription factor (TF) regulatory patterns and can be used to prioritize candidate causal variants. At 101 genetic signals associated with T2D and related glycemic traits where multiple variants occur in linkage disequilibrium, our method nominates a single causal variant for each association signal, including three variants previously shown to alter reporter activity in islet-relevant cell types. For another signal associated with blood glucose levels, we biochemically test all candidate causal variants from statistical fine-mapping using a pancreatic islet beta cell line and show biochemical evidence of allelic effects on TF binding for the model-prioritized variant. To aid in future research, we publicly distribute our model and islet enhancer perturbation scores across ~67 million genetic variants. We anticipate that DL methods like the one presented in this study will enhance the prioritization of candidate causal variants for functional studies.
Affiliation(s)
- Sanjarbek Hudaiberdiev
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20892
| | - D. Leland Taylor
- Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892
| | - Wei Song
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20892
| | - Narisu Narisu
- Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892
| | - Redwan M. Bhuiyan
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032
- Department of Genetics and Genome Sciences, University of Connecticut, Farmington, CT 06032
| | - Henry J. Taylor
- Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
| | - Xuming Tang
- Department of Surgery, Weill Cornell Medicine, New York, NY 10065
- Center for Genomic Health, Weill Cornell Medicine, New York, NY 10065
| | - Tingfen Yan
- Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892
| | - Amy J. Swift
- Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892
| | - Lori L. Bonnycastle
- Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892
| | - DIAMANTE Consortium
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20892
- Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032
- Department of Genetics and Genome Sciences, University of Connecticut, Farmington, CT 06032
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
- Department of Surgery, Weill Cornell Medicine, New York, NY 10065
- Center for Genomic Health, Weill Cornell Medicine, New York, NY 10065
- Institute of Systems Genomics, University of Connecticut, Farmington, CT 06032
| | - Shuibing Chen
- Department of Surgery, Weill Cornell Medicine, New York, NY 10065
- Center for Genomic Health, Weill Cornell Medicine, New York, NY 10065
| | - Michael L. Stitzel
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032
- Department of Genetics and Genome Sciences, University of Connecticut, Farmington, CT 06032
- Institute of Systems Genomics, University of Connecticut, Farmington, CT 06032
| | - Michael R. Erdos
- Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892
| | - Ivan Ovcharenko
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20892
| | - Francis S. Collins
- Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892
| |
|
38
|
Guzman C, Duttke S, Zhu Y, De Arruda Saldanha C, Downes N, Benner C, Heinz S. Combining TSS-MPRA and sensitive TSS profile dissimilarity scoring to study the sequence determinants of transcription initiation. Nucleic Acids Res 2023; 51:e80. [PMID: 37403796 PMCID: PMC10450201 DOI: 10.1093/nar/gkad562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 06/13/2023] [Accepted: 06/20/2023] [Indexed: 07/06/2023] Open
Abstract
Cis-regulatory elements (CREs) can be classified by the shapes of their transcription start site (TSS) profiles, which are indicative of distinct regulatory mechanisms. Massively parallel reporter assays (MPRAs) are increasingly being used to study CRE regulatory mechanisms, yet the degree to which MPRAs replicate individual endogenous TSS profiles has not been determined. Here, we present a new low-input MPRA protocol (TSS-MPRA) that enables measuring TSS profiles of episomal reporters as well as after lentiviral reporter chromatinization. To sensitively compare MPRA and endogenous TSS profiles, we developed a novel dissimilarity scoring algorithm (WIP score) that outperforms the frequently used earth mover's distance on experimental data. Using TSS-MPRA and WIP scoring on 500 unique reporter inserts, we found that short (153 bp) MPRA promoter inserts replicate the endogenous TSS patterns of ∼60% of promoters. Lentiviral reporter chromatinization did not improve fidelity of TSS-MPRA initiation patterns, and increasing insert size frequently led to activation of extraneous TSS in the MPRA that are not active in vivo. We discuss the implications of our findings, which highlight important caveats when using MPRAs to study transcription mechanisms. Finally, we illustrate how TSS-MPRA and WIP scoring can provide novel insights into the impact of transcription factor motif mutations and genetic variants on TSS patterns and transcription levels.
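For orientation, the earth mover's distance that the abstract uses as its baseline can be computed for two one-dimensional TSS profiles as the summed absolute difference of their cumulative distributions. The sketch below shows only that baseline, not the WIP score, whose definition is not given here; the function name and the example read counts are assumptions.

```python
def earth_movers_distance(p, q):
    """1D earth mover's distance between two TSS profiles.

    p and q are sequences of non-negative read counts over the same
    positions (unit spacing assumed). Each profile is normalised to sum
    to 1, so the result is in units of positions.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same positions")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("profiles must contain at least one read")
    carried = 0.0   # running difference of the two cumulative distributions
    distance = 0.0
    for a, b in zip(p, q):
        carried += a / total_p - b / total_q
        distance += abs(carried)
    return distance


# Hypothetical profiles: a sharp TSS versus the same peak shifted by 2 bp.
print(earth_movers_distance([10, 0, 0], [0, 0, 10]))  # 2.0
```

The example shows the behaviour that makes the measure position-aware: moving all signal by two positions yields a distance of exactly two.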
Affiliation(s)
- Carlos Guzman
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Department of Bioengineering, Graduate Program in Bioinformatics & Systems Biology, U.C. San Diego, La Jolla, CA 92093, USA
| | - Sascha Duttke
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Yixin Zhu
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Camila De Arruda Saldanha
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Nicholas L Downes
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Christopher Benner
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Sven Heinz
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
| |
|
39
|
Armendariz DA, Sundarrajan A, Hon GC. Breaking enhancers to gain insights into developmental defects. eLife 2023; 12:e88187. [PMID: 37497775 PMCID: PMC10374278 DOI: 10.7554/elife.88187] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Accepted: 07/19/2023] [Indexed: 07/28/2023] Open
Abstract
Despite ground-breaking genetic studies that have identified thousands of risk variants for developmental diseases, how these variants lead to molecular and cellular phenotypes remains a gap in knowledge. Many of these variants are non-coding and occur at enhancers, which orchestrate key regulatory programs during development. The prevailing paradigm is that non-coding variants alter the activity of enhancers, impacting gene expression programs, and ultimately contributing to disease risk. A key obstacle to progress is the systematic functional characterization of non-coding variants at scale, especially since enhancer activity is highly specific to cell type and developmental stage. Here, we review the foundational studies of enhancers in developmental disease and current genomic approaches to functionally characterize developmental enhancers and their variants at scale. In the coming decade, we anticipate systematic enhancer perturbation studies to link non-coding variants to molecular mechanisms, changes in cell state, and disease phenotypes.
Affiliation(s)
- Daniel A Armendariz
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
| | - Anjana Sundarrajan
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
| | - Gary C Hon
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, United States
- Lyda Hill Department of Bioinformatics, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, United States
| |
|
40
|
Dincer TU, Ernst J. Integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.14.549056. [PMID: 37503240 PMCID: PMC10369970 DOI: 10.1101/2023.07.14.549056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
We introduce ChromActivity, a computational framework for predicting and annotating regulatory activity across the genome through integration of multiple epigenomic maps and various functional characterization datasets. ChromActivity generates genome-wide predictions of regulatory activity associated with each functional characterization dataset across many cell types based on available epigenomic data. For each cell type, it then produces (1) ChromScoreHMM genome annotations based on the combinatorial and spatial patterns within these predictions and (2) ChromScore tracks of overall predicted regulatory activity. ChromActivity provides a resource for analyzing and interpreting the human regulatory genome across diverse cell types.
Affiliation(s)
- Tevfik Umut Dincer
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, CA, 90095, USA
| | - Jason Ernst
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, CA, 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, CA, 90095, USA
- Computer Science Department, University of California, Los Angeles, CA, 90095, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, 90095, USA
| |
|
41
|
Pal D, Patel M, Boulet F, Sundarraj J, Grant OA, Branco MR, Basu S, Santos SDM, Zabet NR, Scaffidi P, Pradeepa MM. H4K16ac activates the transcription of transposable elements and contributes to their cis-regulatory function. Nat Struct Mol Biol 2023; 30:935-947. [PMID: 37308596 PMCID: PMC10352135 DOI: 10.1038/s41594-023-01016-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/08/2022] [Accepted: 05/05/2023] [Indexed: 06/14/2023]
Abstract
Mammalian genomes harbor abundant transposable elements (TEs) and their remnants, with numerous epigenetic repression mechanisms enacted to silence TE transcription. However, TEs are upregulated during early development, neuronal lineage, and cancers, although the epigenetic factors contributing to the transcription of TEs have yet to be fully elucidated. Here, we demonstrate that the male-specific lethal (MSL)-complex-mediated histone H4 acetylation at lysine 16 (H4K16ac) is enriched at TEs in human embryonic stem cells (hESCs) and cancer cells. This in turn activates transcription of subsets of full-length long interspersed nuclear elements (LINE1s, L1s) and endogenous retrovirus (ERV) long terminal repeats (LTRs). Furthermore, we show that the H4K16ac-marked L1 and LTR subfamilies display enhancer-like functions and are enriched in genomic locations with chromatin features associated with active enhancers. Importantly, such regions often reside at boundaries of topologically associated domains and loop with genes. CRISPR-based epigenetic perturbation and genetic deletion of L1s reveal that H4K16ac-marked L1s and LTRs regulate the expression of genes in cis. Overall, TEs enriched with H4K16ac contribute to the cis-regulatory landscape at specific genomic locations by maintaining an active chromatin landscape at TEs.
Affiliation(s)
- Debosree Pal
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Manthan Patel
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Fanny Boulet
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Jayakumar Sundarraj
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
- Bhabha Atomic Research Centre, Mumbai, India
| | - Olivia A Grant
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
- School of Life Sciences, University of Essex, Colchester, UK
| | - Miguel R Branco
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Srinjan Basu
- Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
| | | | - Nicolae Radu Zabet
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Paola Scaffidi
- Francis Crick Institute, London, UK
- Department of Experimental Oncology, European Institute of Oncology, Milan, Italy
| | - Madapura M Pradeepa
- Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK.
| |
|
42
|
Chen Z, Javed N, Moore M, Wu J, Sun G, Vinyard M, Collins A, Pinello L, Najm FJ, Bernstein BE. Integrative dissection of gene regulatory elements at base resolution. CELL GENOMICS 2023; 3:100318. [PMID: 37388913 PMCID: PMC10300548 DOI: 10.1016/j.xgen.2023.100318] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/25/2022] [Revised: 02/21/2023] [Accepted: 03/31/2023] [Indexed: 07/01/2023]
Abstract
Although vast numbers of putative gene regulatory elements have been cataloged, the sequence motifs and individual bases that underlie their functions remain largely unknown. Here, we combine epigenetic perturbations, base editing, and deep learning to dissect regulatory sequences within the exemplar immune locus encoding CD69. We converge on a ∼170 base interval within a differentially accessible and acetylated enhancer critical for CD69 induction in stimulated Jurkat T cells. Individual C-to-T base edits within the interval markedly reduce element accessibility and acetylation, with corresponding reduction of CD69 expression. The most potent base edits may be explained by their effect on regulatory interactions between the transcriptional activators GATA3 and TAL1 and the repressor BHLHE40. Systematic analysis suggests that the interplay between GATA3 and BHLHE40 plays a general role in rapid T cell transcriptional responses. Our study provides a framework for parsing regulatory elements in their endogenous chromatin contexts and identifying operative artificial variants.
Affiliation(s)
- Zeyu Chen
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA, USA
| | - Nauman Javed
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA, USA
| | - Molly Moore
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
| | - Jingyi Wu
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA, USA
| | - Gary Sun
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA, USA
| | - Michael Vinyard
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
| | | | - Luca Pinello
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Fadi J. Najm
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
| | - Bradley E. Bernstein
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA, USA
| |
|
43
|
Boyd RJ, McClymont SA, Barrientos NB, Hook PW, Law WD, Rose RJ, Waite EL, Rathinavelu J, Avramopoulos D, McCallion AS. Evaluating the mouse neural precursor line, SN4741, as a suitable proxy for midbrain dopaminergic neurons. BMC Genomics 2023; 24:306. [PMID: 37286935 PMCID: PMC10245633 DOI: 10.1186/s12864-023-09398-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 05/23/2023] [Indexed: 06/09/2023] Open
Abstract
To overcome the ethical and technical limitations of in vivo human disease models, the broader scientific community frequently employs model organism-derived cell lines to investigate disease mechanisms, pathways, and therapeutic strategies. Despite the widespread use of certain in vitro models, many still lack contemporary genomic analysis supporting their use as a proxy for the affected human cells and tissues. Consequently, it is imperative to determine how accurately and effectively any proposed biological surrogate may reflect the biological processes it is assumed to model. One such cellular surrogate of human disease is the established mouse neural precursor cell line, SN4741, which has been used to elucidate mechanisms of neurotoxicity in Parkinson disease for over 25 years. Here, we are using a combination of classic and contemporary genomic techniques - karyotyping, RT-qPCR, single cell RNA-seq, bulk RNA-seq, and ATAC-seq - to characterize the transcriptional landscape, chromatin landscape, and genomic architecture of this cell line, and evaluate its suitability as a proxy for midbrain dopaminergic neurons in the study of Parkinson disease. We find that SN4741 cells possess an unstable triploidy and consistently exhibit low expression of dopaminergic neuron markers across assays, even when the cell line is shifted to the non-permissive temperature that drives differentiation. The transcriptional signatures of SN4741 cells suggest that they are maintained in an undifferentiated state at the permissive temperature and differentiate into immature neurons at the non-permissive temperature; however, they may not be dopaminergic neuron precursors, as previously suggested. Additionally, the chromatin landscapes of SN4741 cells, in both the differentiated and undifferentiated states, are not concordant with the open chromatin profiles of ex vivo, mouse E15.5 forebrain- or midbrain-derived dopaminergic neurons.
Overall, our data suggest that SN4741 cells may reflect early aspects of neuronal differentiation but are likely not a suitable proxy for dopaminergic neurons as previously thought. The implications of this study extend broadly, illuminating the need for robust biological and genomic rationale underpinning the use of in vitro models of molecular processes.
Affiliation(s)
- Rachel J. Boyd
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Sarah A. McClymont
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Nelson B. Barrientos
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Paul W. Hook
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - William D. Law
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Rebecca J. Rose
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Eric L. Waite
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Jay Rathinavelu
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Dimitrios Avramopoulos
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Andrew S. McCallion
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287 USA
| |
|
44
|
Rong S, Neil CR, Welch A, Duan C, Maguire S, Meremikwu IC, Meyerson M, Evans BJ, Fairbrother WG. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc Natl Acad Sci U S A 2023; 120:e2218308120. [PMID: 37192163 PMCID: PMC10214146 DOI: 10.1073/pnas.2218308120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 04/12/2023] [Indexed: 05/18/2023] Open
Abstract
Humans coexisted and interbred with other hominins which later became extinct. These archaic hominins are known to us only through fossil records and for two cases, genome sequences. Here, we engineer Neanderthal and Denisovan sequences into thousands of artificial genes to reconstruct the pre-mRNA processing patterns of these extinct populations. Of the 5,169 alleles tested in this massively parallel splicing reporter assay (MaPSy), we report 962 exonic splicing mutations that correspond to differences in exon recognition between extant and extinct hominins. Using MaPSy splicing variants, predicted splicing variants, and splicing quantitative trait loci, we show that splice-disrupting variants experienced greater purifying selection in anatomically modern humans than that in Neanderthals. Adaptively introgressed variants were enriched for moderate-effect splicing variants, consistent with positive selection for alternative spliced alleles following introgression. As particularly compelling examples, we characterized a unique tissue-specific alternative splicing variant at the adaptively introgressed innate immunity gene TLR1, as well as a unique Neanderthal introgressed alternative splicing variant in the gene HSPG2 that encodes perlecan. We further identified potentially pathogenic splicing variants found only in Neanderthals and Denisovans in genes related to sperm maturation and immunity. Finally, we found splicing variants that may contribute to variation among modern humans in total bilirubin, balding, hemoglobin levels, and lung capacity. Our findings provide unique insights into natural selection acting on splicing in human evolution and demonstrate how functional assays can be used to identify candidate causal variants underlying differences in gene regulation and phenotype.
Affiliation(s)
- Stephen Rong
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Christopher R. Neil
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Anastasia Welch
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Chaorui Duan
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Samantha Maguire
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Ijeoma C. Meremikwu
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Malcolm Meyerson
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Ben J. Evans
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
| | - William G. Fairbrother
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Hassenfeld Child Health Innovation Institute of Brown University, Providence, RI 02912
| |
|
45
|
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023] Open
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Wan Hern Ching
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
| | - Paola Cornejo-Páramo
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Emily S Wong
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia.
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia.
| |
|
46
|
Ren N, Dai S, Ma S, Yang F. Strategies for activity analysis of single nucleotide polymorphisms associated with human diseases. Clin Genet 2023; 103:392-400. [PMID: 36527336 DOI: 10.1111/cge.14282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 12/10/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Genome-wide association studies (GWAS) have identified a large number of single nucleotide polymorphism (SNP) sites associated with human diseases. In the annotation of human diseases, especially cancers, SNPs, as an important component of genetic factors, have gained increasing attention. Given that most of the SNPs are located in non-coding regions, the functional verification of these SNPs is a great challenge. The key to functional annotation for risk SNPs is to screen SNPs with regulatory activity from thousands of disease associated-SNPs. In this review, we systematically recapitulate the characteristics and functional roles of SNP sites, discuss three parallel reporter screening strategies in detail based on barcode tag classification, and recommend the common in silico strategies to help supplement the annotation of SNP sites with epigenetic activity analysis, prediction of target genes and trans-acting factors. We hope that this review will contribute to this exuberant research field by providing robust activity analysis strategies that can facilitate the translation of GWAS results into personalized diagnosis and prevention measures for human diseases.
Affiliation(s)
- Naixia Ren
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| | - Shangkun Dai
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| | - Shumin Ma
- School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
| | - Fengtang Yang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| |
|
47
|
Zheng Y, VanDusen NJ. Massively Parallel Reporter Assays for High-Throughput In Vivo Analysis of Cis-Regulatory Elements. J Cardiovasc Dev Dis 2023; 10:jcdd10040144. [PMID: 37103023 PMCID: PMC10146671 DOI: 10.3390/jcdd10040144] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2023] [Revised: 03/24/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023] Open
Abstract
The rapid improvement of descriptive genomic technologies has fueled a dramatic increase in hypothesized connections between cardiovascular gene expression and phenotypes. However, in vivo testing of these hypotheses has predominantly been relegated to slow, expensive, and linear generation of genetically modified mice. In the study of genomic cis-regulatory elements, generation of mice featuring transgenic reporters or cis-regulatory element knockout remains the standard approach. While the data obtained is of high quality, the approach is insufficient to keep pace with candidate identification and therefore results in biases introduced during the selection of candidates for validation. However, recent advances across a range of disciplines are converging to enable functional genomic assays that can be conducted in a high-throughput manner. Here, we review one such method, massively parallel reporter assays (MPRAs), in which the activities of thousands of candidate genomic regulatory elements are simultaneously assessed via the next-generation sequencing of a barcoded reporter transcript. We discuss best practices for MPRA design and use, with a focus on practical considerations, and review how this emerging technology has been successfully deployed in vivo. Finally, we discuss how MPRAs are likely to evolve and be used in future cardiovascular research.
|
48
|
Stankey CT, Lee JC. Translating non-coding genetic associations into a better understanding of immune-mediated disease. Dis Model Mech 2023; 16:dmm049790. [PMID: 36897113 PMCID: PMC10040244 DOI: 10.1242/dmm.049790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023] Open
Abstract
Genome-wide association studies have identified hundreds of genetic loci that are associated with immune-mediated diseases. Most disease-associated variants are non-coding, and a large proportion of these variants lie within enhancers. As a result, there is a pressing need to understand how common genetic variation might affect enhancer function and thereby contribute to immune-mediated (and other) diseases. In this Review, we first describe statistical and experimental methods to identify causal genetic variants that modulate gene expression, including statistical fine-mapping and massively parallel reporter assays. We then discuss approaches to characterise the mechanisms by which these variants modulate immune function, such as clustered regularly interspaced short palindromic repeats (CRISPR)-based screens. We highlight examples of studies that, by elucidating the effects of disease variants within enhancers, have provided important insights into immune function and uncovered key pathways of disease.
Affiliation(s)
- Christina T. Stankey
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Department of Immunology and Inflammation, Imperial College London, London W12 0NN, UK
| | - James C. Lee
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Institute of Liver and Digestive Health, Royal Free Hospital, University College London, London NW3 2PF, UK
| |
|
49
|
Boyd RJ, McClymont SA, Barrientos NB, Hook PW, Law WD, Rose RJ, Waite EL, Rathinavelu J, Avramopoulos D, McCallion AS. Evaluating the mouse neural precursor line, SN4741, as a suitable proxy for midbrain dopaminergic neurons. RESEARCH SQUARE 2023:rs.3.rs-2520557. [PMID: 36824793 PMCID: PMC9949168 DOI: 10.21203/rs.3.rs-2520557/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
To overcome the ethical and technical limitations of in vivo human disease models, the broader scientific community frequently employs model organism-derived cell lines to investigate disease mechanisms, pathways, and therapeutic strategies. Despite the widespread use of certain in vitro models, many still lack contemporary genomic analysis supporting their use as a proxy for the affected human cells and tissues. Consequently, it is imperative to determine how accurately and effectively any proposed biological surrogate may reflect the biological processes it is assumed to model. One such cellular surrogate of human disease is the established mouse neural precursor cell line, SN4741, which has been used to elucidate mechanisms of neurotoxicity in Parkinson disease for over 25 years. Here, we are using a combination of classic and contemporary genomic techniques - karyotyping, RT-qPCR, single cell RNA-seq, bulk RNA-seq, and ATAC-seq - to characterize the transcriptional landscape, chromatin landscape, and genomic architecture of this cell line, and evaluate its suitability as a proxy for midbrain dopaminergic neurons in the study of Parkinson disease. We find that SN4741 cells possess an unstable triploidy and consistently exhibit low expression of dopaminergic neuron markers across assays, even when the cell line is shifted to the non-permissive temperature that drives differentiation. The transcriptional signatures of SN4741 cells suggest that they are maintained in an undifferentiated state at the permissive temperature and differentiate into immature neurons at the non-permissive temperature; however, they may not be dopaminergic neuron precursors, as previously suggested. Additionally, the chromatin landscapes of SN4741 cells, in both the differentiated and undifferentiated states, are not concordant with the open chromatin profiles of ex vivo, mouse E15.5 forebrain- or midbrain-derived dopaminergic neurons.
Overall, our data suggest that SN4741 cells may reflect early aspects of neuronal differentiation but are likely not a suitable proxy for dopaminergic neurons as previously thought. The implications of this study extend broadly, illuminating the need for robust biological and genomic rationale underpinning the use of in vitro models of molecular processes.
|
50
|
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023] Open
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia
- School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia
- The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia
- Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia
| | - Amanda J. Lea
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240 USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240 USA
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240 USA
- Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
| |
|