151
Ni P, Wu S, Su Z. Underlying causes for prevalent false positives and false negatives in STARR-seq data. NAR Genom Bioinform 2023; 5:lqad085. [PMID: 37745976 PMCID: PMC10516709 DOI: 10.1093/nargab/lqad085] [Received: 02/07/2023] [Revised: 08/23/2023] [Accepted: 09/12/2023] [Indexed: 09/26/2023]
Abstract
Self-transcribing active regulatory region sequencing (STARR-seq) and its variants have been widely used to characterize enhancers. However, it has been reported that up to 87% of STARR-seq peaks are located in repressive chromatin and are not functional in the tested cells. While some of the STARR-seq peaks in repressive chromatin might be active in other cell/tissue types, some others might be false positives. Meanwhile, many active enhancers may not be identified by the current STARR-seq methods. Although methods have been proposed to mitigate systematic errors caused by the use of plasmid vectors, the artifacts due to the intrinsic limitations of current STARR-seq methods are still prevalent and the underlying causes are not fully understood. Based on predicted cis-regulatory modules (CRMs) and non-CRMs in the human genome as well as predicted active CRMs and non-active CRMs in a few human cell lines/tissues with STARR-seq data available, we reveal prevalent false positives and false negatives in STARR-seq peaks generated by major variants of STARR-seq methods and possible underlying causes. Our results will help design strategies to improve STARR-seq methods and interpret the results.
Affiliation(s)
- Pengyu Ni
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Siwen Wu
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Zhengchang Su
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
152
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Cheng Xu
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA.
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA.
- Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA.
153
Guzman C, Duttke S, Zhu Y, De Arruda Saldanha C, Downes N, Benner C, Heinz S. Combining TSS-MPRA and sensitive TSS profile dissimilarity scoring to study the sequence determinants of transcription initiation. Nucleic Acids Res 2023; 51:e80. [PMID: 37403796 PMCID: PMC10450201 DOI: 10.1093/nar/gkad562] [Received: 05/13/2022] [Revised: 06/13/2023] [Accepted: 06/20/2023] [Indexed: 07/06/2023]
Abstract
Cis-regulatory elements (CREs) can be classified by the shapes of their transcription start site (TSS) profiles, which are indicative of distinct regulatory mechanisms. Massively parallel reporter assays (MPRAs) are increasingly being used to study CRE regulatory mechanisms, yet the degree to which MPRAs replicate individual endogenous TSS profiles has not been determined. Here, we present a new low-input MPRA protocol (TSS-MPRA) that enables measuring TSS profiles of episomal reporters as well as after lentiviral reporter chromatinization. To sensitively compare MPRA and endogenous TSS profiles, we developed a novel dissimilarity scoring algorithm (WIP score) that outperforms the frequently used earth mover's distance on experimental data. Using TSS-MPRA and WIP scoring on 500 unique reporter inserts, we found that short (153 bp) MPRA promoter inserts replicate the endogenous TSS patterns of ∼60% of promoters. Lentiviral reporter chromatinization did not improve fidelity of TSS-MPRA initiation patterns, and increasing insert size frequently led to activation of extraneous TSS in the MPRA that are not active in vivo. We discuss the implications of our findings, which highlight important caveats when using MPRAs to study transcription mechanisms. Finally, we illustrate how TSS-MPRA and WIP scoring can provide novel insights into the impact of transcription factor motif mutations and genetic variants on TSS patterns and transcription levels.
Affiliation(s)
- Carlos Guzman
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Department of Bioengineering, Graduate Program in Bioinformatics & Systems Biology, U.C. San Diego, La Jolla, CA 92093, USA
- Sascha Duttke
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Yixin Zhu
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Camila De Arruda Saldanha
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Nicholas L Downes
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Christopher Benner
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Sven Heinz
- Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
154
Capauto D, Wang Y, Wu F, Norton S, Mariani J, Inoue F, Crawford GE, Ahituv N, Abyzov A, Vaccarino FM. Characterization of enhancer activity in early human neurodevelopment using Massively parallel reporter assay (MPRA) and forebrain organoids. bioRxiv 2023:2023.08.14.553170. [PMID: 37645832 PMCID: PMC10461976 DOI: 10.1101/2023.08.14.553170] [Indexed: 08/31/2023]
Abstract
Regulation of gene expression through enhancers is one of the major processes shaping the structure and function of the human brain during development. High-throughput assays have predicted thousands of enhancers involved in neurodevelopment, and confirming their activity through orthogonal functional assays is crucial. Here, we utilized Massively Parallel Reporter Assays (MPRAs) in stem cells and forebrain organoids to evaluate the activity of ~7,000 gene-linked enhancers previously identified in human fetal tissues and brain organoids. We used a Gaussian mixture model to evaluate the contribution of background noise in the measured activity signal to confirm the activity of ~35% of the tested enhancers, with most showing temporal-specific activity, suggesting their evolving role in neurodevelopment. The temporal specificity was further supported by the correlation of activity with gene expression. Our findings provide a valuable gene regulatory resource to the scientific community.
Affiliation(s)
- Davide Capauto
- Child Study Center, Yale University, New Haven, CT 06520
- Yifan Wang
- Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Feinan Wu
- Child Study Center, Yale University, New Haven, CT 06520
- Scott Norton
- Child Study Center, Yale University, New Haven, CT 06520
- Fumitaka Inoue
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University; Kyoto, Japan
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco; San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco; San Francisco, CA, USA
- Alexej Abyzov
- Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Flora M. Vaccarino
- Child Study Center, Yale University, New Haven, CT 06520
- Department of Neuroscience, Yale University, New Haven, CT 06520, USA
155
Monti R, Ohler U. Toward Identification of Functional Sequences and Variants in Noncoding DNA. Annu Rev Biomed Data Sci 2023; 6:191-210. [PMID: 37262323 DOI: 10.1146/annurev-biodatasci-122120-110102] [Indexed: 06/03/2023]
Abstract
Understanding the noncoding part of the genome, which encodes gene regulation, is necessary to identify genetic mechanisms of disease and translate findings from genome-wide association studies into actionable results for treatments and personalized care. Here we provide an overview of the computational analysis of noncoding regions, starting from gene-regulatory mechanisms and their representation in data. Deep learning methods, when applied to these data, highlight important regulatory sequence elements and predict the functional effects of genetic variants. These and other algorithms are used to predict damaging sequence variants. Finally, we introduce rare-variant association tests that incorporate functional annotations and predictions in order to increase interpretability and statistical power.
Affiliation(s)
- Remo Monti
- Max Delbrück Center for Molecular Medicine (MDC), Helmholtz Association of German Research Centers, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
- Digital Health-Machine Learning, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Uwe Ohler
- Max Delbrück Center for Molecular Medicine (MDC), Helmholtz Association of German Research Centers, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
156
Duan YY, Chen XF, Zhu RJ, Jia YY, Huang XT, Zhang M, Yang N, Dong SS, Zeng M, Feng Z, Zhu DL, Wu H, Jiang F, Shi W, Hu WX, Ke X, Chen H, Liu Y, Jing RH, Guo Y, Li M, Yang TL. High-throughput functional dissection of noncoding SNPs with biased allelic enhancer activity for insulin resistance-relevant phenotypes. Am J Hum Genet 2023; 110:1266-1288. [PMID: 37506691 PMCID: PMC10432149 DOI: 10.1016/j.ajhg.2023.07.002] [Received: 04/25/2023] [Revised: 07/04/2023] [Accepted: 07/05/2023] [Indexed: 07/30/2023]
Abstract
Most of the single-nucleotide polymorphisms (SNPs) associated with insulin resistance (IR)-relevant phenotypes by genome-wide association studies (GWASs) are located in noncoding regions, complicating their functional interpretation. Here, we utilized an adapted STARR-seq to evaluate the regulatory activities of 5,987 noncoding SNPs associated with IR-relevant phenotypes. We identified 876 SNPs with biased allelic enhancer activity effects (baaSNPs) across 133 loci in three IR-relevant cell lines (HepG2, preadipocyte, and A673), which showed pervasive cell specificity and significant enrichment for cell-specific open chromatin regions or enhancer-indicative markers (H3K4me1, H3K27ac). Further functional characterization suggested several transcription factors (TFs) with preferential allelic binding to baaSNPs. We also incorporated multi-omics data to prioritize 102 candidate regulatory target genes for baaSNPs and revealed prevalent long-range regulatory effects and cell-specific IR-relevant biological functional enrichment on them. Specifically, we experimentally verified the distal regulatory mechanism at IRS1 locus, in which rs952227-A reinforces IRS1 expression by long-range chromatin interaction and preferential binding to the transcription factor HOXC6 to augment the enhancer activity. Finally, based on our STARR-seq screening data, we predicted the enhancer activity of 227,343 noncoding SNPs associated with IR-relevant phenotypes (fasting insulin adjusted for BMI, HDL cholesterol, and triglycerides) from the largest available GWAS summary statistics. We further provided an open resource (http://www.bigc.online/fnSNP-IR) for better understanding genetic regulatory mechanisms of IR-relevant phenotypes.
Affiliation(s)
- Yuan-Yuan Duan
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Xiao-Feng Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Ren-Jie Zhu
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Ying-Ying Jia
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Xiao-Ting Huang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Meng Zhang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Ning Yang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Shan-Shan Dong
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Mengqi Zeng
- Frontier Institute of Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Zhihui Feng
- Frontier Institute of Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Dong-Li Zhu
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Hao Wu
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Feng Jiang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Wei Shi
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Wei-Xin Hu
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Xin Ke
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Hao Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Yunlong Liu
- Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
- Rui-Hua Jing
- Department of Ophthalmology, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi 710000, China
- Yan Guo
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
- Meng Li
- Department of Orthopedics, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi 710061, China.
- Tie-Lin Yang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China; Department of Orthopedics, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi 710061, China.
157
Nowling RJ, Njoya K, Peters JG, Riehle MM. Prediction accuracy of regulatory elements from sequence varies by functional sequencing technique. Front Cell Infect Microbiol 2023; 13:1182567. [PMID: 37600946 PMCID: PMC10433755 DOI: 10.3389/fcimb.2023.1182567] [Received: 03/08/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023]
Abstract
Introduction: Various sequencing based approaches are used to identify and characterize the activities of cis-regulatory elements in a genome-wide fashion. Some of these techniques rely on indirect markers such as histone modifications (ChIP-seq with histone antibodies) or chromatin accessibility (ATAC-seq, DNase-seq, FAIRE-seq), while other techniques use direct measures such as episomal assays measuring the enhancer properties of DNA sequences (STARR-seq) and direct measurement of the binding of transcription factors (ChIP-seq with transcription factor-specific antibodies). The activities of cis-regulatory elements such as enhancers, promoters, and repressors are determined by their sequence and secondary processes such as chromatin accessibility, DNA methylation, and bound histone markers.
Methods: Here, machine learning models are employed to evaluate the accuracy with which cis-regulatory elements identified by various commonly used sequencing techniques can be predicted by their underlying sequence alone to distinguish between cis-regulatory activity that is reflective of sequence content versus secondary processes.
Results and discussion: Models trained and evaluated on D. melanogaster sequences identified through DNase-seq and STARR-seq are significantly more accurate than models trained on sequences identified by H3K4me1, H3K4me3, and H3K27ac ChIP-seq, FAIRE-seq, and ATAC-seq. These results suggest that the activity detected by DNase-seq and STARR-seq can be largely explained by underlying DNA sequence, independent of secondary processes. Experimentally, a subset of DNase-seq and H3K4me1 ChIP-seq sequences were tested for enhancer activity using luciferase assays and compared with previous tests performed on STARR-seq sequences. The experimental data indicated that STARR-seq sequences are substantially enriched for enhancer-specific activity, while the DNase-seq and H3K4me1 ChIP-seq sequences are not. Taken together, these results indicate that the DNase-seq approach identifies a broad class of regulatory elements of which enhancers are a subset and the associated data are appropriate for training models for detecting regulatory activity from sequence alone, STARR-seq data are best for training enhancer-specific sequence models, and H3K4me1 ChIP-seq data are not well suited for training and evaluating sequence-based models for cis-regulatory element prediction.
Affiliation(s)
- Ronald J. Nowling
- Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, United States
- Kimani Njoya
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
- John G. Peters
- Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, United States
- Michelle M. Riehle
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
158
Fan K, Pfister E, Weng Z. Toward a comprehensive catalog of regulatory elements. Hum Genet 2023; 142:1091-1111. [PMID: 36935423 DOI: 10.1007/s00439-023-02519-3] [Received: 11/06/2022] [Accepted: 01/03/2023] [Indexed: 03/21/2023]
Abstract
Regulatory elements are the genomic regions that interact with transcription factors to control cell-type-specific gene expression in different cellular environments. A precise and complete catalog of functional elements encoded by the human genome is key to understanding mammalian gene regulation. Here, we review the current state of regulatory element annotation. We first provide an overview of assays for characterizing functional elements, including genome, epigenome, transcriptome, three-dimensional chromatin interaction, and functional validation assays. We then discuss computational methods for defining regulatory elements, including peak-calling and other statistical modeling methods. Finally, we introduce several high-quality lists of regulatory element annotations and suggest potential future directions.
Affiliation(s)
- Kaili Fan
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, 02138, USA
- Edith Pfister
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA.
| |
Collapse
|
159
Armendariz DA, Sundarrajan A, Hon GC. Breaking enhancers to gain insights into developmental defects. eLife 2023; 12:e88187. [PMID: 37497775 PMCID: PMC10374278 DOI: 10.7554/elife.88187] [Received: 03/29/2023] [Accepted: 07/19/2023] [Indexed: 07/28/2023]
Abstract
Despite ground-breaking genetic studies that have identified thousands of risk variants for developmental diseases, how these variants lead to molecular and cellular phenotypes remains a gap in knowledge. Many of these variants are non-coding and occur at enhancers, which orchestrate key regulatory programs during development. The prevailing paradigm is that non-coding variants alter the activity of enhancers, impacting gene expression programs, and ultimately contributing to disease risk. A key obstacle to progress is the systematic functional characterization of non-coding variants at scale, especially since enhancer activity is highly specific to cell type and developmental stage. Here, we review the foundational studies of enhancers in developmental disease and current genomic approaches to functionally characterize developmental enhancers and their variants at scale. In the coming decade, we anticipate systematic enhancer perturbation studies to link non-coding variants to molecular mechanisms, changes in cell state, and disease phenotypes.
Affiliation(s)
- Daniel A Armendariz
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Anjana Sundarrajan
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Gary C Hon
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, United States
- Lyda Hill Department of Bioinformatics, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, United States
160
Murphy D, Salataj E, Di Giammartino DC, Rodriguez-Hernaez J, Kloetgen A, Garg V, Char E, Uyehara CM, Ee LS, Lee U, Stadtfeld M, Hadjantonakis AK, Tsirigos A, Polyzos A, Apostolou E. Systematic mapping and modeling of 3D enhancer-promoter interactions in early mouse embryonic lineages reveal regulatory principles that determine the levels and cell-type specificity of gene expression. bioRxiv 2023:2023.07.19.549714. [PMID: 37577543 PMCID: PMC10422694 DOI: 10.1101/2023.07.19.549714] [Indexed: 08/15/2023]
Abstract
Mammalian embryogenesis commences with two pivotal and binary cell fate decisions that give rise to three essential lineages, the trophectoderm (TE), the epiblast (EPI) and the primitive endoderm (PrE). Although key signaling pathways and transcription factors that control these early embryonic decisions have been identified, the non-coding regulatory elements via which transcriptional regulators enact these fates remain understudied. To address this gap, we have characterized, at a genome-wide scale, enhancer activity and 3D connectivity in embryo-derived stem cell lines that represent each of the early developmental fates. We observed extensive enhancer remodeling and fine-scale 3D chromatin rewiring among the three lineages, which strongly associate with transcriptional changes, although there are distinct groups of genes that are irresponsive to topological changes. In each lineage, a high degree of connectivity or "hubness" positively correlates with levels of gene expression and enriches for cell-type specific and essential genes. Genes within 3D hubs also show a significantly stronger probability of coregulation across lineages, compared to genes in linear proximity or within the same contact domains. By incorporating 3D chromatin features, we build a novel predictive model for transcriptional regulation (3D-HiChAT), which outperformed models that use only 1D promoter or proximal variables in predicting levels and cell-type specificity of gene expression. Using 3D-HiChAT, we performed genome-wide in silico perturbations to nominate candidate functional enhancers and hubs in each cell lineage, and with CRISPRi experiments we validated several novel enhancers that control expression of one or more genes in their respective lineages. 
Our study comprehensively identifies 3D regulatory hubs associated with the earliest mammalian lineages and describes their relationship to gene expression and cell identity, providing a framework to understand lineage-specific transcriptional behaviors.
Affiliation(s)
- Dylan Murphy
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Eralda Salataj
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Dafne Campigli Di Giammartino
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- 3D Chromatin Conformation and RNA genomics laboratory, Instituto Italiano di Tecnologia (IIT), Center for Human Technologies (CHT), Genova, Italy (current affiliation)
- Javier Rodriguez-Hernaez
- Department of Pathology, New York University Langone Health, New York, NY 10016, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY 10016, USA
- Andreas Kloetgen
- Department of Pathology, New York University Langone Health, New York, NY 10016, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY 10016, USA
- Vidur Garg
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Biochemistry Cell and Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY 10065, USA
- Erin Char
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, 10065, New York, USA
- Christopher M. Uyehara
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Ly-sha Ee
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- UkJin Lee
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Matthias Stadtfeld
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Anna-Katerina Hadjantonakis
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Biochemistry Cell and Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY 10065, USA
| | - Aristotelis Tsirigos
- Department of Pathology, New York University Langone Health, New York, NY 10016, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY 10016, USA
| | - Alexander Polyzos
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
| | - Effie Apostolou
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
|
161
|
Milito A, Aschern M, McQuillan JL, Yang JS. Challenges and advances towards the rational design of microalgal synthetic promoters in Chlamydomonas reinhardtii. Journal of Experimental Botany 2023; 74:3833-3850. [PMID: 37025006 DOI: 10.1093/jxb/erad100] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/09/2022] [Accepted: 03/24/2023] [Indexed: 06/19/2023]
Abstract
Microalgae hold enormous potential to provide a safe and sustainable source of high-value compounds, acting as carbon-fixing biofactories that could help to mitigate rapidly progressing climate change. Bioengineering microalgal strains will be key to optimizing and modifying their metabolic outputs, and to render them competitive with established industrial biotechnology hosts, such as bacteria or yeast. To achieve this, precise and tuneable control over transgene expression will be essential, which would require the development and rational design of synthetic promoters as a key strategy. Among green microalgae, Chlamydomonas reinhardtii represents the reference species for bioengineering and synthetic biology; however, the repertoire of functional synthetic promoters for this species, and for microalgae generally, is limited in comparison to other commercial chassis, emphasizing the need to expand the current microalgal gene expression toolbox. Here, we discuss state-of-the-art promoter analyses, and highlight areas of research required to advance synthetic promoter development in C. reinhardtii. In particular, we exemplify high-throughput studies performed in other model systems that could be applicable to microalgae, and propose novel approaches to interrogating algal promoters. We lastly outline the major limitations hindering microalgal promoter development, while providing novel suggestions and perspectives for how to overcome them.
Affiliation(s)
- Alfonsina Milito
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
| | - Moritz Aschern
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
| | - Josie L McQuillan
- Department of Chemical and Biological Engineering, University of Sheffield, Mappin Street, Sheffield, S1 3JD, UK
| | - Jae-Seong Yang
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
|
162
|
Dincer TU, Ernst J. Integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types. bioRxiv 2023:2023.07.14.549056. [PMID: 37503240 PMCID: PMC10369970 DOI: 10.1101/2023.07.14.549056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
We introduce ChromActivity, a computational framework for predicting and annotating regulatory activity across the genome through integration of multiple epigenomic maps and various functional characterization datasets. ChromActivity generates genome-wide predictions of regulatory activity associated with each functional characterization dataset across many cell types based on available epigenomic data. For each cell type, it then produces (1) ChromScoreHMM genome annotations based on the combinatorial and spatial patterns within these predictions and (2) ChromScore tracks of overall predicted regulatory activity. ChromActivity provides a resource for analyzing and interpreting the human regulatory genome across diverse cell types.
Affiliation(s)
- Tevfik Umut Dincer
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, CA, 90095, USA
| | - Jason Ernst
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, CA, 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, CA, 90095, USA
- Computer Science Department, University of California, Los Angeles, CA, 90095, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, 90095, USA
|
163
|
Jacobs J, Pagani M, Wenzl C, Stark A. Widespread regulatory specificities between transcriptional co-repressors and enhancers in Drosophila. Science 2023; 381:198-204. [PMID: 37440660 DOI: 10.1126/science.adf6149] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/02/2022] [Accepted: 06/13/2023] [Indexed: 07/15/2023]
Abstract
Gene expression is controlled by the precise activation and repression of transcription. Repression is mediated by specialized transcription factors (TFs) that recruit co-repressors (CoRs) to silence transcription, even in the presence of activating cues. However, whether CoRs can dominantly silence all enhancers or display distinct specificities is unclear. In this work, we report that most enhancers in Drosophila can be repressed by only a subset of CoRs, and enhancers classified by CoR sensitivity show distinct chromatin features, function, TF motifs, and binding. Distinct TF motifs render enhancers more resistant or sensitive to specific CoRs, as we demonstrate by motif mutagenesis and addition. These CoR-enhancer compatibilities constitute an additional layer of regulatory specificity that allows differential regulation at close genomic distances and is indicative of distinct mechanisms of transcriptional repression.
Affiliation(s)
- Jelle Jacobs
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Michaela Pagani
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Christoph Wenzl
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
| | - Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, Vienna, Austria
- Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria
|
164
|
Zhang Z, Feng F, Qiu Y, Liu J. A generalizable framework to comprehensively predict epigenome, chromatin organization, and transcriptome. Nucleic Acids Res 2023; 51:5931-5947. [PMID: 37224527 PMCID: PMC10325920 DOI: 10.1093/nar/gkad436] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/20/2022] [Revised: 03/31/2023] [Accepted: 05/09/2023] [Indexed: 05/26/2023] Open
Abstract
Many deep learning approaches have been proposed to predict epigenetic profiles, chromatin organization, and transcription activity. While these approaches achieve satisfactory performance in predicting one modality from another, the learned representations are not generalizable across predictive tasks or across cell types. In this paper, we propose a deep learning approach named EPCOT which employs a pre-training and fine-tuning framework, and is able to accurately and comprehensively predict multiple modalities including epigenome, chromatin organization, transcriptome, and enhancer activity for new cell types, by only requiring cell-type specific chromatin accessibility profiles. Many of these predicted modalities, such as Micro-C and ChIA-PET, are quite expensive to get in practice, and the in silico prediction from EPCOT should be quite helpful. Furthermore, this pre-training and fine-tuning framework allows EPCOT to identify generic representations generalizable across different predictive tasks. Interpreting EPCOT models also provides biological insights including mapping between different genomic modalities, identifying TF sequence binding patterns, and analyzing cell-type specific TF impacts on enhancer activity.
Affiliation(s)
- Zhenhao Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
| | - Fan Feng
- Department of Computational Medicine and Bioinformatics, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
| | - Yiyang Qiu
- Department of Computer Science and Engineering, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
| | - Jie Liu
- Department of Computational Medicine and Bioinformatics, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
- Department of Computer Science and Engineering, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
|
165
|
O'Connell RW, Rai K, Piepergerdes TC, Wang Y, Samra KD, Wilson JA, Lin S, Zhang TH, Ramos E, Sun A, Kille B, Curry KD, Rocks JW, Treangen TJ, Mehta P, Bashor CJ. Ultra-high throughput mapping of genetic design space. bioRxiv 2023:2023.03.16.532704. [PMID: 36993481 PMCID: PMC10055055 DOI: 10.1101/2023.03.16.532704] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/10/2023]
Abstract
Massively parallel genetic screens have been used to map sequence-to-function relationships for a variety of genetic elements. However, because these approaches only interrogate short sequences, it remains challenging to perform high throughput (HT) assays on constructs containing combinations of sequence elements arranged across multi-kb length scales. Overcoming this barrier could accelerate synthetic biology; by screening diverse gene circuit designs, "composition-to-function" mappings could be created that reveal genetic part composability rules and enable rapid identification of behavior-optimized variants. Here, we introduce CLASSIC, a generalizable genetic screening platform that combines long- and short-read next-generation sequencing (NGS) modalities to quantitatively assess pooled libraries of DNA constructs of arbitrary length. We show that CLASSIC can measure expression profiles of >10^5 drug-inducible gene circuit designs (ranging from 6-9 kb) in a single experiment in human cells. Using statistical inference and machine learning (ML) approaches, we demonstrate that data obtained with CLASSIC enables predictive modeling of an entire circuit design landscape, offering critical insight into underlying design principles. Our work shows that by expanding the throughput and understanding gained with each design-build-test-learn (DBTL) cycle, CLASSIC dramatically augments the pace and scale of synthetic biology and establishes an experimental basis for data-driven design of complex genetic systems.
|
166
|
Phan LT, Oh C, He T, Manavalan B. A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome. Proteomics 2023; 23:e2200409. [PMID: 37021401 DOI: 10.1002/pmic.202200409] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/10/2023] [Revised: 03/18/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023]
Abstract
Enhancers are non-coding DNA elements that play a crucial role in enhancing the transcription rate of a specific gene in the genome. Experiments for identifying enhancers can be restricted by their conditions and involve complicated, time-consuming, laborious, and costly steps. To overcome these challenges, computational platforms have been developed to complement experimental methods that enable high-throughput identification of enhancers. Over the last few years, the development of various enhancer computational tools has resulted in significant progress in predicting putative enhancers. Thus, researchers are now able to use a variety of strategies to enhance and advance enhancer study. In this review, an overview of machine learning (ML)-based prediction methods for enhancer identification and related databases has been provided. The existing enhancer-prediction methods have also been reviewed regarding their algorithms, feature selection processes, validation techniques, and software utility. In addition, the advantages and drawbacks of these ML approaches and guidelines for developing bioinformatic tools have been highlighted for a more efficient enhancer prediction. This review will serve as a useful resource for experimentalists in selecting the appropriate ML tool for their study, and for bioinformaticians in developing more accurate and advanced ML-based predictors.
Affiliation(s)
- Le Thi Phan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
| | - Changmin Oh
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
| | - Tao He
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
|
167
|
Jia BB, Jussila A, Kern C, Zhu Q, Ren B. A spatial genome aligner for resolving chromatin architectures from multiplexed DNA FISH. Nat Biotechnol 2023; 41:1004-1017. [PMID: 36593410 PMCID: PMC10344783 DOI: 10.1038/s41587-022-01568-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/27/2022] [Accepted: 10/13/2022] [Indexed: 01/03/2023]
Abstract
Multiplexed fluorescence in situ hybridization (FISH) is a widely used approach for analyzing three-dimensional genome organization, but it is challenging to derive chromosomal conformations from noisy fluorescence signals, and tracing chromatin is not straightforward. Here we report a spatial genome aligner that parses true chromatin signal from noise by aligning signals to a DNA polymer model. Using genomic distances separating imaged loci, our aligner estimates spatial distances expected to separate loci on a polymer in three-dimensional space. Our aligner then evaluates the physical probability that observed signals belonging to these loci are connected, thereby tracing chromatin structures. We demonstrate that this spatial genome aligner can efficiently model chromosome architectures from DNA FISH data across multiple scales and be used to predict chromosome ploidies de novo in interphase cells. Reprocessing of previous whole-genome chromosome tracing data with this method indicates the spatial aggregation of sister chromatids in S/G2 phase cells in asynchronous mouse embryonic stem cells and provides evidence for extranumerary chromosomes that remain tightly paired in postmitotic neurons of the adult mouse cortex.
Affiliation(s)
- Bojing Blair Jia
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- Medical Scientist Training Program, University of California San Diego, La Jolla, CA, USA
| | - Adam Jussila
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
| | - Colin Kern
- Department of Cellular and Molecular Medicine, Center for Epigenomics, University of California San Diego, La Jolla, CA, USA
| | - Quan Zhu
- Department of Cellular and Molecular Medicine, Center for Epigenomics, University of California San Diego, La Jolla, CA, USA
| | - Bing Ren
- Department of Cellular and Molecular Medicine, Center for Epigenomics, University of California San Diego, La Jolla, CA, USA
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Institute of Genomic Medicine, Moores Cancer Center, School of Medicine, University of California San Diego, La Jolla, CA, USA
|
168
|
FitzPatrick VD, Leemans C, van Arensbergen J, van Steensel B, Bussemaker HJ. Defining the fine structure of promoter activity on a genome-wide scale with CISSECTOR. Nucleic Acids Res 2023; 51:5499-5511. [PMID: 37013986 PMCID: PMC10287907 DOI: 10.1093/nar/gkad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 03/08/2023] [Accepted: 03/22/2023] [Indexed: 04/05/2023] Open
Abstract
Classic promoter mutagenesis strategies can be used to study how proximal promoter regions regulate the expression of particular genes of interest. This is a laborious process, in which the smallest sub-region of the promoter still capable of recapitulating expression in an ectopic setting is first identified, followed by targeted mutation of putative transcription factor binding sites. Massively parallel reporter assays such as survey of regulatory elements (SuRE) provide an alternative way to study millions of promoter fragments in parallel. Here we show how a generalized linear model (GLM) can be used to transform genome-scale SuRE data into a high-resolution genomic track that quantifies the contribution of local sequence to promoter activity. This coefficient track helps identify regulatory elements and can be used to predict promoter activity of any sub-region in the genome. It thus allows in silico dissection of any promoter in the human genome to be performed. We developed a web application, available at cissector.nki.nl, that lets researchers easily perform this analysis as a starting point for their research into any promoter of interest.
Affiliation(s)
- Vincent D FitzPatrick
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
| | - Christ Leemans
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Joris van Arensbergen
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Bas van Steensel
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Department of Cell Biology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
|
169
|
Heer M, Giudice L, Mengoni C, Giugno R, Rico D. Esearch3D: propagating gene expression in chromatin networks to illuminate active enhancers. Nucleic Acids Res 2023; 51:e55. [PMID: 37021559 PMCID: PMC10250221 DOI: 10.1093/nar/gkad229] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/05/2022] [Revised: 03/06/2023] [Accepted: 04/03/2023] [Indexed: 04/07/2023] Open
Abstract
Most cell type-specific genes are regulated by the interaction of enhancers with their promoters. The identification of enhancers is not trivial as enhancers are diverse in their characteristics and dynamic in their interaction partners. We present Esearch3D, a new method that exploits network theory approaches to identify active enhancers. Our work is based on the fact that enhancers act as a source of regulatory information to increase the rate of transcription of their target genes and that the flow of this information is mediated by the folding of chromatin in the three-dimensional (3D) nuclear space between the enhancer and the target gene promoter. Esearch3D reverse engineers this flow of information to calculate the likelihood of enhancer activity in intergenic regions by propagating the transcription levels of genes across 3D genome networks. Regions predicted to have high enhancer activity are shown to be enriched in annotations indicative of enhancer activity. These include: enhancer-associated histone marks, bidirectional CAGE-seq, STARR-seq, P300, RNA polymerase II and expression quantitative trait loci (eQTLs). Esearch3D leverages the relationship between chromatin architecture and transcription, allowing the prediction of active enhancers and an understanding of the complex underpinnings of regulatory networks. The method is available at: https://github.com/InfOmics/Esearch3D and https://doi.org/10.5281/zenodo.7737123.
Affiliation(s)
- Maninder Heer
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
| | - Luca Giudice
- Department of Computer Science, University of Verona, Strada le Grazie 15, 37134, Verona, Italy
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
| | - Claudia Mengoni
- Department of Computer Science, University of Verona, Strada le Grazie 15, 37134, Verona, Italy
| | - Rosalba Giugno
- Department of Computer Science, University of Verona, Strada le Grazie 15, 37134, Verona, Italy
| | - Daniel Rico
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
|
170
|
Hussain S, Sadouni N, van Essen D, Dao LTM, Ferré Q, Charbonnier G, Torres M, Gallardo F, Lecellier CH, Sexton T, Saccani S, Spicuglia S. Short tandem repeats are important contributors to silencer elements in T cells. Nucleic Acids Res 2023; 51:4845-4866. [PMID: 36929452 PMCID: PMC10250210 DOI: 10.1093/nar/gkad187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 02/26/2023] [Accepted: 03/15/2023] [Indexed: 03/18/2023] Open
Abstract
The action of cis-regulatory elements with either activation or repression functions underpins the precise regulation of gene expression during normal development and cell differentiation. Gene activation by the combined activities of promoters and distal enhancers has been extensively studied in normal and pathological contexts. In sharp contrast, gene repression by cis-acting silencers, defined as genetic elements that negatively regulate gene transcription in a position-independent fashion, is less well understood. Here, we repurpose the STARR-seq approach as a novel high-throughput reporter strategy to quantitatively assess silencer activity in mammals. We assessed silencer activity from DNase I hypersensitive sites in a mouse T cell line. Identified silencers were associated with either repressive or active chromatin marks and enriched for binding motifs of known transcriptional repressors. CRISPR-mediated genomic deletions validated the repressive function of distinct silencers involved in the repression of non-T cell genes and genes regulated during T cell differentiation. Finally, we unravel an association of silencer activity with short tandem repeats, highlighting the role of repetitive elements in silencer activity. Our results provide a general strategy for genome-wide identification and characterization of silencer elements.
Affiliation(s)
- Saadat Hussain
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Nori Sadouni
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Dominic van Essen
- Institute for Research on Cancer and Ageing, IRCAN, 06107 Nice, France
| | - Lan T M Dao
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Quentin Ferré
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Guillaume Charbonnier
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Magali Torres
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Frederic Gallardo
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
| | - Charles-Henri Lecellier
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
- LIRMM, University of Montpellier, CNRS, Montpellier, France
| | - Tom Sexton
- Institut de Génétique et de Biologie Moléculaire et Cellulaire – IGBMC (CNRS UMR 7104, INSERM U1258, Université de Strasbourg), 67404 Illkirch, France
| | - Simona Saccani
- Institute for Research on Cancer and Ageing, IRCAN, 06107 Nice, France
| | - Salvatore Spicuglia
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, Marseille, France
|
171
|
Mach P, Giorgetti L. Integrative approaches to study enhancer-promoter communication. Curr Opin Genet Dev 2023; 80:102052. [PMID: 37257410 PMCID: PMC10293802 DOI: 10.1016/j.gde.2023.102052] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/24/2023] [Revised: 04/21/2023] [Accepted: 04/22/2023] [Indexed: 06/02/2023]
Abstract
The spatiotemporal control of gene expression in complex multicellular organisms relies on noncoding regulatory sequences such as enhancers, which activate transcription of target genes often over large genomic distances. Despite the advances in the identification and characterization of enhancers, the principles and mechanisms by which enhancers select and control their target genes remain largely unknown. Here, we review recent interdisciplinary and quantitative approaches based on emerging techniques that aim to address open questions in the field, notably how regulatory information is encoded in the DNA sequence, how this information is transferred from enhancers to promoters, and how these processes are regulated in time.
Affiliation(s)
- Pia Mach
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland; University of Basel, Basel, Switzerland. https://twitter.com/@MachPia
| | - Luca Giorgetti
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland.
|
172
|
Fabo T, Khavari P. Functional characterization of human genomic variation linked to polygenic diseases. Trends Genet 2023; 39:462-490. [PMID: 36997428 PMCID: PMC11025698 DOI: 10.1016/j.tig.2023.02.014] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 12/01/2022] [Revised: 02/22/2023] [Accepted: 02/23/2023] [Indexed: 03/30/2023]
Abstract
The burden of human disease lies predominantly in polygenic diseases. Since the early 2000s, genome-wide association studies (GWAS) have identified genetic variants and loci associated with complex traits. These have ranged from variants in coding sequences to mutations in regulatory regions, such as promoters and enhancers, as well as mutations affecting mediators of mRNA stability and other downstream regulators, such as 5' and 3'-untranslated regions (UTRs), long noncoding RNA (lncRNA), and miRNA. Recent research advances in genetics have utilized a combination of computational techniques, high-throughput in vitro and in vivo screening modalities, and precise genome editing to impute the function of diverse classes of genetic variants identified through GWAS. In this review, we highlight the vastness of genomic variants associated with polygenic disease risk and address recent advances in how genetic tools can be used to functionally characterize them.
Affiliation(s)
- Tania Fabo
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA; Graduate Program in Genetics, Stanford University, Stanford, CA, USA; Stanford University School of Medicine, Stanford University, Stanford, CA, USA
| | - Paul Khavari
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA; Graduate Program in Genetics, Stanford University, Stanford, CA, USA; Stanford University School of Medicine, Stanford University, Stanford, CA, USA; Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA.
|
173
|
Marand AP, Eveland AL, Kaufmann K, Springer NM. cis-Regulatory Elements in Plant Development, Adaptation, and Evolution. Annual Review of Plant Biology 2023; 74:111-137. [PMID: 36608347 PMCID: PMC9881396 DOI: 10.1146/annurev-arplant-070122-030236] [Citation(s) in RCA: 85] [Impact Index Per Article: 42.5] [Indexed: 05/23/2023]
Abstract
cis-Regulatory elements encode the genomic blueprints that ensure the proper spatiotemporal patterning of gene expression necessary for appropriate development and responses to the environment. Accumulating evidence implicates changes to gene expression as a major source of phenotypic novelty in eukaryotes, including acute phenotypes such as disease and cancer in mammals. Moreover, genetic and epigenetic variation affecting cis-regulatory sequences over longer evolutionary timescales has become a recurring theme in studies of morphological divergence and local adaptation. Here, we discuss the functions of and methods used to identify various classes of cis-regulatory elements, as well as their role in plant development and response to the environment. We highlight opportunities to exploit cis-regulatory variants underlying plant development and environmental responses for crop improvement efforts. Although a comprehensive understanding of cis-regulatory mechanisms in plants has lagged behind that in animals, we showcase several breakthrough findings that have profoundly influenced plant biology and shaped the overall understanding of transcriptional regulation in eukaryotes.
Affiliation(s)
| | | | - Kerstin Kaufmann
- Department for Plant Cell and Molecular Biology, Institute of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Nathan M Springer
- Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, Minnesota, USA;
| |
|
174
|
Zhou P, VanDusen NJ, Zhang Y, Cao Y, Sethi I, Hu R, Zhang S, Wang G, Ye L, Mazumdar N, Chen J, Zhang X, Guo Y, Li B, Ma Q, Lee JY, Gu W, Yuan GC, Ren B, Chen K, Pu WT. Dynamic changes in P300 enhancers and enhancer-promoter contacts control mouse cardiomyocyte maturation. Dev Cell 2023; 58:898-914.e7. [PMID: 37071996 PMCID: PMC10231645 DOI: 10.1016/j.devcel.2023.03.020] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 02/22/2022] [Revised: 02/16/2023] [Accepted: 03/05/2023] [Indexed: 04/20/2023]
Abstract
Cardiomyocyte differentiation continues throughout murine gestation and into the postnatal period, driven by temporally regulated expression changes in the transcriptome. The mechanisms that regulate these developmental changes remain incompletely defined. Here, we used cardiomyocyte-specific ChIP-seq of the active enhancer marker P300 to identify 54,920 cardiomyocyte enhancers at seven stages of murine heart development. These data were matched to cardiomyocyte gene expression profiles at the same stages and to Hi-C and H3K27ac HiChIP chromatin conformation data at fetal, neonatal, and adult stages. Regions with dynamic P300 occupancy exhibited developmentally regulated enhancer activity, as measured by massively parallel reporter assays in cardiomyocytes in vivo, and identified key transcription factor-binding motifs. These dynamic enhancers interacted with temporal changes of the 3D genome architecture to specify developmentally regulated cardiomyocyte gene expression. Our work provides a 3D genome-mediated enhancer activity landscape of murine cardiomyocyte development.
Affiliation(s)
- Pingzhu Zhou
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
| | - Nathan J VanDusen
- Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Yanchun Zhang
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
| | - Yangpo Cao
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
| | - Isha Sethi
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
| | - Rong Hu
- Ludwig Institute for Cancer Research, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Shuo Zhang
- Houston Methodist Hospital Research Institute, Houston, TX 77030, USA
| | - Guangyu Wang
- Cardiovascular Department, Houston Methodist, Weill Cornell Medical College, Houston, TX, USA
| | - Lincai Ye
- Department of Thoracic and Cardiovascular Surgery, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Neil Mazumdar
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
| | - Jian Chen
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
| | - Xiaoran Zhang
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
| | - Yuxuan Guo
- Peking University Health Science Center, Beijing, China
| | - Bin Li
- Ludwig Institute for Cancer Research, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Qing Ma
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
| | - Julianna Y Lee
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA
| | - Weiliang Gu
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Department of Pharmacology, School of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Bing Ren
- Ludwig Institute for Cancer Research, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Kaifu Chen
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA.
| | - William T Pu
- Department of Cardiology, Boston Children's Hospital, Boston, MA, USA; Harvard Stem Cell Institute, Cambridge, MA, USA.
| |
|
175
|
Zahm AM, Owens WS, Himes SR, Rondem KE, Fallon BS, Gormick AN, Bloom JS, Kosuri S, Chan H, English JG. Discovery and Validation of Context-Dependent Synthetic Mammalian Promoters. bioRxiv 2023:2023.05.11.539703. [PMID: 37214829 PMCID: PMC10197685 DOI: 10.1101/2023.05.11.539703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Cellular transcription enables cells to adapt to various stimuli and maintain homeostasis. Transcription factors bind to transcription response elements (TREs) in gene promoters, initiating transcription. Synthetic promoters, derived from natural TREs, can be engineered to control exogenous gene expression using endogenous transcription machinery. This technology has found extensive use in biological research for applications including reporter gene assays, biomarker development, and programming synthetic circuits in living cells. However, a reliable and precise method for selecting minimally-sized synthetic promoters with desired background, amplitude, and stimulation response profiles has been elusive. In this study, we introduce a massively parallel reporter assay library containing 6184 synthetic promoters, each less than 250 bp in length. This comprehensive library allows for rapid identification of promoters with optimal transcriptional output parameters across multiple cell lines and stimuli. We showcase this library's utility to identify promoters activated in unique cell types, and in response to metabolites, mitogens, cellular toxins, and agonism of both aminergic and non-aminergic GPCRs. We further show these promoters can be used in luciferase reporter assays, eliciting 50-100 fold dynamic ranges in response to stimuli. Our platform is effective, easily implemented, and provides a solution for selecting short-length promoters with precise performance for a multitude of applications.
Affiliation(s)
- Adam M. Zahm
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, UT, USA
| | | | - Samuel R. Himes
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Kathleen E. Rondem
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Braden S. Fallon
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Alexa N. Gormick
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, UT, USA
| | | | | | | | - Justin G. English
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, UT, USA
| |
|
176
|
Stefan K, Barski A. Cis-regulatory atlas of primary human CD4+ T cells. BMC Genomics 2023; 24:253. [PMID: 37170195 PMCID: PMC10173520 DOI: 10.1186/s12864-023-09288-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 03/31/2023] [Indexed: 05/13/2023] Open
Abstract
Cis-regulatory elements (CRE) are critical for coordinating gene expression programs that dictate cell-specific differentiation and homeostasis. Recently developed self-transcribing active regulatory region sequencing (STARR-Seq) has allowed for genome-wide annotation of functional CREs. Despite this, STARR-Seq assays are only employed in cell lines, in part, due to difficulties in delivering reporter constructs. Herein, we implemented and validated a STARR-Seq-based screen in human CD4+ T cells using a non-integrating lentiviral transduction system. Lenti-STARR-Seq is the first example of a genome-wide assay of CRE function in human primary cells, identifying thousands of functional enhancers and negative regulatory elements (NREs) in human CD4+ T cells. We find an unexpected difference in nucleosome organization between enhancers and NRE: enhancers are located between nucleosomes, whereas NRE are occupied by nucleosomes in their endogenous locations. We also describe chromatin modification, eRNA production, and transcription factor binding at both enhancers and NREs. Our findings support the idea of silencer repurposing as enhancers in alternate cell types. Collectively, these data suggest that Lenti-STARR-Seq is a successful approach for CRE screening in primary human cell types, and provides an atlas of functional CREs in human CD4+ T cells.
Affiliation(s)
- Kurtis Stefan
- Division of Allergy & Immunology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7028, Cincinnati, OH, 45229-3026, USA
- Medical Scientist Training Program (MSTP), University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
| | - Artem Barski
- Division of Allergy & Immunology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7028, Cincinnati, OH, 45229-3026, USA.
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, 45229-3026, USA.
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA.
| |
|
177
|
Guo Q, Wu S, Geschwind DH. Characterization of Gene Regulatory Elements in Human Fetal Cortical Development: Enhancing Our Understanding of Neurodevelopmental Disorders and Evolution. Dev Neurosci 2023; 46:69-83. [PMID: 37231806 DOI: 10.1159/000530929] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2023] [Accepted: 04/24/2023] [Indexed: 05/27/2023] Open
Abstract
The neocortex is the region that most distinguishes human brain from other mammals and primates [Annu Rev Genet. 2021 Nov;55(1):555-81]. Studying the development of human cortex is important in understanding the evolutionary changes occurring in humans relative to other primates, as well as in elucidating mechanisms underlying neurodevelopmental disorders. Cortical development is a highly regulated process, spatially and temporally coordinated by expression of essential transcriptional factors in response to signaling pathways [Neuron. 2019 Sep;103(6):980-1004]. Enhancers are the most well-understood cis-acting, non-protein-coding regulatory elements that regulate gene expression [Nat Rev Genet. 2014 Apr;15(4):272-86]. Importantly, given the conservation of both DNA sequence and molecular function of the majority of proteins across mammals [Genome Res. 2003 Dec;13(12):2507-18], enhancers [Science. 2015 Mar;347(6226):1155-9], which are far more divergent at the sequence level, likely account for the phenotypes that distinguish the human brain by changing the regulation of gene expression. In this review, we will revisit the conceptual framework of gene regulation during human brain development, as well as the evolution of technologies to study transcriptional regulation, with recent advances in genome biology that open a window allowing us to systematically characterize cis-regulatory elements in developing human brain [Hum Mol Genet. 2022 Oct;31(R1):R84-96]. We provide an update on work to characterize the suite of all enhancers in the developing human brain and the implications for understanding neuropsychiatric disorders. Finally, we discuss emerging therapeutic ideas that utilize our emerging knowledge of enhancer function.
Affiliation(s)
- Qiuyu Guo
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
- Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, California, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
| | - Sarah Wu
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
| | - Daniel H Geschwind
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
- Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, California, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
- Institute of Precision Health, University of California Los Angeles, Los Angeles, California, USA
| |
|
178
|
Majdandzic A, Rajesh C, Koo PK. Correcting gradient-based interpretations of deep neural networks for genomics. Genome Biol 2023; 24:109. [PMID: 37161475 PMCID: PMC10169356 DOI: 10.1186/s13059-023-02956-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/04/2022] [Accepted: 04/28/2023] [Indexed: 05/11/2023] Open
Abstract
Post hoc attribution methods can provide insights into the learned patterns from deep neural networks (DNNs) trained on high-throughput functional genomics data. However, in practice, their resultant attribution maps can be challenging to interpret due to spurious importance scores for seemingly arbitrary nucleotides. Here, we identify a previously overlooked attribution noise source that arises from how DNNs handle one-hot encoded DNA. We demonstrate this noise is pervasive across various genomic DNNs and introduce a statistical correction that effectively reduces it, leading to more reliable attribution maps. Our approach represents a promising step towards gaining meaningful insights from DNNs in regulatory genomics.
Affiliation(s)
- Antonio Majdandzic
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, USA
| | - Chandana Rajesh
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, USA
| | - Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, USA.
| |
|
179
|
Schofield JA, Hahn S. Broad compatibility between yeast UAS elements and core promoters and identification of promoter elements that determine cofactor specificity. Cell Rep 2023; 42:112387. [PMID: 37058407 PMCID: PMC10567116 DOI: 10.1016/j.celrep.2023.112387] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/15/2022] [Revised: 01/30/2023] [Accepted: 03/28/2023] [Indexed: 04/15/2023] Open
Abstract
Three classes of yeast protein-coding genes are distinguished by their dependence on the transcription cofactors TFIID, SAGA, and Mediator (MED) Tail, but whether this dependence is determined by the core promoter, upstream activating sequences (UASs), or other gene features is unclear. Also unclear is whether UASs can broadly activate transcription from the different promoter classes. Here, we measure transcription and cofactor specificity for thousands of UAS-core promoter combinations and find that most UASs broadly activate promoters regardless of regulatory class, while few display strong promoter specificity. However, matching UASs and promoters from the same gene class is generally important for optimal expression. We find that sensitivity to rapid depletion of MED Tail or SAGA is dependent on the identity of both UAS and core promoter, while dependence on TFIID localizes to only the promoter. Finally, our results suggest the role of TATA and TATA-like promoter sequences in MED Tail function.
Affiliation(s)
- Jeremy A Schofield
- Basic Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Avenue N, Seattle, WA 98105, USA
| | - Steven Hahn
- Basic Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Avenue N, Seattle, WA 98105, USA.
| |
|
180
|
Chan YC, Kienle E, Oti M, Di Liddo A, Mendez-Lago M, Aschauer DF, Peter M, Pagani M, Arnold C, Vonderheit A, Schön C, Kreuz S, Stark A, Rumpel S. An unbiased AAV-STARR-seq screen revealing the enhancer activity map of genomic regions in the mouse brain in vivo. Sci Rep 2023; 13:6745. [PMID: 37185990 PMCID: PMC10130037 DOI: 10.1038/s41598-023-33448-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/07/2022] [Accepted: 04/12/2023] [Indexed: 05/17/2023] Open
Abstract
Enhancers are important cis-regulatory elements controlling cell-type specific expression patterns of genes. Furthermore, combinations of enhancers and minimal promoters are utilized to construct small, artificial promoters for gene delivery vectors. Large-scale functional screening methodology to construct genomic maps of enhancer activities has been successfully established in cultured cell lines but has not yet been applied to terminally differentiated cells and tissues in a living animal. Here, we transposed the Self-Transcribing Active Regulatory Region Sequencing (STARR-seq) technique to the mouse brain using adeno-associated viruses (AAV) for the delivery of a highly complex screening library tiling entire genomic regions and covering in total 3 Mb of the mouse genome. We identified 483 sequences with enhancer activity, including sequences that were not predicted by DNA accessibility or histone marks. Characterizing the expression patterns of fluorescent reporters controlled by nine candidate sequences, we observed differential expression patterns also in sparse cell types. Together, our study provides an entry point for the unbiased study of enhancer activities in organisms during health and disease.
Affiliation(s)
- Ya-Chien Chan
- Institute of Physiology, Focus Program Translational Neurosciences, University Medical Center, Johannes Gutenberg University Mainz, Mainz, Germany
| | - Eike Kienle
- Institute of Physiology, Focus Program Translational Neurosciences, University Medical Center, Johannes Gutenberg University Mainz, Mainz, Germany
| | - Martin Oti
- Institute of Molecular Biology GmbH (IMB), Mainz, Germany
- Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an Der Riß, Germany
| | | | | | - Dominik F Aschauer
- Institute of Physiology, Focus Program Translational Neurosciences, University Medical Center, Johannes Gutenberg University Mainz, Mainz, Germany
| | - Manuel Peter
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
| | - Michaela Pagani
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
| | - Cosmas Arnold
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
- CeMM Research Center for Molecular Medicine, Austrian Academy of Sciences, Vienna, Austria
| | | | - Christian Schön
- Research Beyond Borders, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an Der Riß, Germany
| | - Sebastian Kreuz
- Research Beyond Borders, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an Der Riß, Germany
| | - Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
- Medical University of Vienna, Vienna BioCenter (VBC), 1030, Vienna, Austria
| | - Simon Rumpel
- Institute of Physiology, Focus Program Translational Neurosciences, University Medical Center, Johannes Gutenberg University Mainz, Mainz, Germany.
| |
|
181
|
Zhou A, Kirkpatrick LD, Ornelas IJ, Washington LJ, Hummel NFC, Gee CW, Tang SN, Barnum CR, Scheller HV, Shih PM. A Suite of Constitutive Promoters for Tuning Gene Expression in Plants. ACS Synth Biol 2023; 12:1533-1545. [PMID: 37083366 DOI: 10.1021/acssynbio.3c00075] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 04/22/2023]
Abstract
The need for convenient tools to express transgenes over a large dynamic range is pervasive throughout plant synthetic biology; however, current efforts are largely limited by the heavy reliance on a small set of strong promoters, precluding more nuanced and refined engineering endeavors in planta. To address this technical gap, we characterize a suite of constitutive promoters that span a wide range of transcriptional levels and develop a GoldenGate-based plasmid toolkit named PCONS, optimized for versatile cloning and rapid testing of transgene expression at varying strengths. We demonstrate how easy access to a stepwise gradient of expression levels can be used for optimizing synthetic transcriptional systems and the production of small molecules in planta. We also systematically investigate the potential of using PCONS as an internal standard in plant biology experimental design, establishing the best practices for signal normalization in experiments. Although our library has primarily been developed for optimizing expression in N. benthamiana, we demonstrate the translatability of our promoters across distantly related species using a multiplexed reporter assay with barcoded transcripts. Our findings showcase the advantages of the PCONS library as an invaluable toolkit for plant synthetic biology.
Affiliation(s)
- Andy Zhou
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, California 94720, United States
- Feedstocks Division, Joint BioEnergy Institute, Emeryville, California 94608, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94705, United States
| | - Liam D Kirkpatrick
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, California 94720, United States
- Feedstocks Division, Joint BioEnergy Institute, Emeryville, California 94608, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94705, United States
| | - Izaiah J Ornelas
- Feedstocks Division, Joint BioEnergy Institute, Emeryville, California 94608, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94705, United States
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, United States
| | - Lorenzo J Washington
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, California 94720, United States
- Feedstocks Division, Joint BioEnergy Institute, Emeryville, California 94608, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94705, United States
| | - Niklas F C Hummel
- Feedstocks Division, Joint BioEnergy Institute, Emeryville, California 94608, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94705, United States
| | - Christopher W Gee
- Feedstocks Division, Joint BioEnergy Institute, Emeryville, California 94608, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94705, United States
| | - Sophia N Tang
- Feedstocks Division, Joint BioEnergy Institute, Emeryville, California 94608, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94705, United States
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, United States
| | - Collin R Barnum
- Biochemistry, Molecular, Cellular and Developmental Biology Graduate Group, University of California, Davis, California 95616, United States
| | - Henrik V Scheller
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, California 94720, United States
- Feedstocks Division, Joint BioEnergy Institute, Emeryville, California 94608, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94705, United States
| | - Patrick M Shih
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, California 94720, United States
- Feedstocks Division, Joint BioEnergy Institute, Emeryville, California 94608, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California 94705, United States
- Innovative Genomics Institute, University of California, Berkeley, California 94720, United States
| |
|
182
|
Kemmler CL, Moran HR, Murray BF, Scoresby A, Klem JR, Eckert RL, Lepovsky E, Bertho S, Nieuwenhuize S, Burger S, D'Agati G, Betz C, Puller AC, Felker A, Ditrychova K, Bötschi S, Affolter M, Rohner N, Lovely CB, Kwan KM, Burger A, Mosimann C. Next-generation plasmids for transgenesis in zebrafish and beyond. Development 2023; 150:dev201531. [PMID: 36975217 PMCID: PMC10263156 DOI: 10.1242/dev.201531] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/14/2022] [Accepted: 03/10/2023] [Indexed: 03/29/2023]
Abstract
Transgenesis is an essential technique for any genetic model. Tol2-based transgenesis paired with Gateway-compatible vector collections has transformed zebrafish transgenesis with an accessible modular system. Here, we establish several next-generation transgenesis tools for zebrafish and other species to expand and enhance transgenic applications. To facilitate gene regulatory element testing, we generated Gateway middle entry vectors harboring the small mouse beta-globin minimal promoter coupled to several fluorophores, CreERT2 and Gal4. To extend the color spectrum for transgenic applications, we established middle entry vectors encoding the bright, blue-fluorescent protein mCerulean and mApple as an alternative red fluorophore. We present a series of p2A peptide-based 3' vectors with different fluorophores and subcellular localizations to co-label cells expressing proteins of interest. Finally, we established Tol2 destination vectors carrying the zebrafish exorh promoter driving different fluorophores as a pineal gland-specific transgenesis marker that is active before hatching and through adulthood. exorh-based reporters and transgenesis markers also drive specific pineal gland expression in the eye-less cavefish (Astyanax). Together, our vectors provide versatile reagents for transgenesis applications in zebrafish, cavefish and other models.
Affiliation(s)
- Cassie L. Kemmler
- University of Colorado, School of Medicine, Anschutz Medical Campus, Department of Pediatrics, Section of Developmental Biology, 12801 E 17th Avenue, Aurora, CO 80045, USA
| | - Hannah R. Moran
- University of Colorado, School of Medicine, Anschutz Medical Campus, Department of Pediatrics, Section of Developmental Biology, 12801 E 17th Avenue, Aurora, CO 80045, USA
| | - Brooke F. Murray
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Aaron Scoresby
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - John R. Klem
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY 40202, USA
| | - Rachel L. Eckert
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY 40202, USA
| | - Elizabeth Lepovsky
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY 40202, USA
| | - Sylvain Bertho
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Susan Nieuwenhuize
- University of Colorado, School of Medicine, Anschutz Medical Campus, Department of Pediatrics, Section of Developmental Biology, 12801 E 17th Avenue, Aurora, CO 80045, USA
- Department of Molecular Life Sciences, University of Zurich, 8057 Zürich, Switzerland
| | - Sibylle Burger
- Department of Molecular Life Sciences, University of Zurich, 8057 Zürich, Switzerland
| | - Gianluca D'Agati
- Department of Molecular Life Sciences, University of Zurich, 8057 Zürich, Switzerland
| | - Charles Betz
- Growth & Development, Biozentrum, Spitalstrasse 41, University of Basel, 4056 Basel, Switzerland
| | - Ann-Christin Puller
- Department of Molecular Life Sciences, University of Zurich, 8057 Zürich, Switzerland
| | - Anastasia Felker
- Department of Molecular Life Sciences, University of Zurich, 8057 Zürich, Switzerland
| | - Karolina Ditrychova
- Department of Molecular Life Sciences, University of Zurich, 8057 Zürich, Switzerland
| | - Seraina Bötschi
- Department of Molecular Life Sciences, University of Zurich, 8057 Zürich, Switzerland
| | - Markus Affolter
- Growth & Development, Biozentrum, Spitalstrasse 41, University of Basel, 4056 Basel, Switzerland
| | - Nicolas Rohner
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - C. Ben Lovely
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY 40202, USA
| | - Kristen M. Kwan
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Alexa Burger
- University of Colorado, School of Medicine, Anschutz Medical Campus, Department of Pediatrics, Section of Developmental Biology, 12801 E 17th Avenue, Aurora, CO 80045, USA
| | - Christian Mosimann
- University of Colorado, School of Medicine, Anschutz Medical Campus, Department of Pediatrics, Section of Developmental Biology, 12801 E 17th Avenue, Aurora, CO 80045, USA
| |
|
183
|
Di Giorgio E, Benetti R, Kerschbamer E, Xodo L, Brancolini C. Super-enhancer landscape rewiring in cancer: The epigenetic control at distal sites. International Review of Cell and Molecular Biology 2023; 380:97-148. [PMID: 37657861 DOI: 10.1016/bs.ircmb.2023.03.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/03/2023]
Abstract
Super-enhancers evolve as elements at the top of the hierarchical control of gene expression. They are important end-gatherers of signaling pathways that control stemness, differentiation or adaptive responses. Many epigenetic regulations focus on these regions, and not surprisingly, during the process of tumorigenesis, various alterations can account for their dysfunction. Super-enhancers are emerging as key drivers of the aberrant gene expression landscape that sustains the aggressiveness of cancer cells. In this review, we will describe and discuss the structure of super-enhancers, their epigenetic regulation, and the major changes affecting their functionality in cancer.
Affiliation(s)
- Eros Di Giorgio
- Laboratory of Biochemistry, Department of Medicine, Università degli Studi di Udine, Udine, Italy
| | - Roberta Benetti
- Laboratory of Epigenomics, Department of Medicine, Università degli Studi di Udine, Udine, Italy
| | - Emanuela Kerschbamer
- Laboratory of Epigenomics, Department of Medicine, Università degli Studi di Udine, Udine, Italy
| | - Luigi Xodo
- Laboratory of Biochemistry, Department of Medicine, Università degli Studi di Udine, Udine, Italy
| | - Claudio Brancolini
- Laboratory of Epigenomics, Department of Medicine, Università degli Studi di Udine, Udine, Italy
|
184
|
Wu Q, Wu J, Karim K, Chen X, Wang T, Iwama S, Carobbio S, Keen P, Vidal-Puig A, Kotter MR, Bassett A. Massively parallel characterization of CRISPR activator efficacy in human induced pluripotent stem cells and neurons. Mol Cell 2023; 83:1125-1139.e8. [PMID: 36917981 PMCID: PMC10114495 DOI: 10.1016/j.molcel.2023.02.011] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 05/27/2022] [Revised: 12/21/2022] [Accepted: 02/10/2023] [Indexed: 03/14/2023]
Abstract
CRISPR activation (CRISPRa) is an important tool to perturb transcription, but its effectiveness varies between target genes. We employ human pluripotent stem cells with thousands of randomly integrated barcoded reporters to assess epigenetic features that influence CRISPRa efficacy. Basal expression levels are influenced by genomic context and dramatically change during differentiation to neurons. Gene activation by dCas9-VPR is successful in most genomic contexts, including developmentally repressed regions, and activation level is anti-correlated with basal gene expression, whereas dCas9-p300 is ineffective in stem cells. Certain chromatin states, such as bivalent chromatin, are particularly sensitive to dCas9-VPR, whereas constitutive heterochromatin is less responsive. We validate these rules at endogenous genes and show that activation of certain genes elicits a change in the stem cell transcriptome, sometimes showing features of differentiated cells. Our data provide rules to predict CRISPRa outcome and highlight its utility to screen for factors driving stem cell differentiation.
Affiliation(s)
- Qianxin Wu
- Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Junjing Wu
- Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK; Institute of Animal Science and Veterinary Medicine, Hubei Academy of Agricultural Sciences, Wuhan 430064, China
| | - Kaiser Karim
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0QQ, UK
| | - Xi Chen
- Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK; Southern University of Science and Technology, 1088 Xueyuan Ave, Nanshan, Shenzhen, Guangdong 518055, China
| | - Tengyao Wang
- Department of Statistics, London School of Economics and Political Science, London WC2B 4RR, UK
| | - Sho Iwama
- Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Stefania Carobbio
- Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK; Metabolic Research Laboratories, Addenbrooke's Treatment Center, Institute of Metabolic Science, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK; Centro de Investigacion Principe Felipe, 46012 Valencia, Spain
| | - Peter Keen
- Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Antonio Vidal-Puig
- Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK; Metabolic Research Laboratories, Addenbrooke's Treatment Center, Institute of Metabolic Science, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK; Centro de Investigacion Principe Felipe, 46012 Valencia, Spain
| | - Mark R Kotter
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0QQ, UK
| | - Andrew Bassett
- Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
|
185
|
Das M, Hossain A, Banerjee D, Praul CA, Girirajan S. Challenges and considerations for reproducibility of STARR-seq assays. Genome Res 2023; 33:479-495. [PMID: 37130797 PMCID: PMC10234304 DOI: 10.1101/gr.277204.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Accepted: 03/15/2023] [Indexed: 05/04/2023]
Abstract
High-throughput methods such as RNA-seq, ChIP-seq, and ATAC-seq have well-established guidelines, commercial kits, and analysis pipelines that enable consistency and wider adoption for understanding genome function and regulation. STARR-seq, a popular assay for directly quantifying the activities of thousands of enhancer sequences simultaneously, has seen limited standardization across studies. The assay is long, with more than 250 steps, and frequent customization of the protocol and variations in bioinformatics methods raise concerns for reproducibility of STARR-seq studies. Here, we assess each step of the protocol and analysis pipelines from published sources and in-house assays, and identify critical steps and quality control (QC) checkpoints necessary for reproducibility of the assay. We also provide guidelines for experimental design, protocol scaling, customization, and analysis pipelines for better adoption of the assay. These resources will allow better optimization of STARR-seq for specific research needs, enable comparisons and integration across studies, and improve the reproducibility of results.
Affiliation(s)
- Maitreya Das
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Molecular and Cellular Integrative Biosciences Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Ayaan Hossain
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Deepro Banerjee
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Craig Alan Praul
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Santhosh Girirajan
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Molecular and Cellular Integrative Biosciences Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
186
|
Ren N, Dai S, Ma S, Yang F. Strategies for activity analysis of single nucleotide polymorphisms associated with human diseases. Clin Genet 2023; 103:392-400. [PMID: 36527336 DOI: 10.1111/cge.14282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 12/10/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Genome-wide association studies (GWAS) have identified a large number of single nucleotide polymorphism (SNP) sites associated with human diseases. In the annotation of human diseases, especially cancers, SNPs, as an important component of genetic factors, have gained increasing attention. Given that most of the SNPs are located in non-coding regions, the functional verification of these SNPs is a great challenge. The key to functional annotation for risk SNPs is to screen SNPs with regulatory activity from thousands of disease-associated SNPs. In this review, we systematically recapitulate the characteristics and functional roles of SNP sites, discuss three parallel reporter screening strategies in detail based on barcode tag classification, and recommend the common in silico strategies to help supplement the annotation of SNP sites with epigenetic activity analysis, prediction of target genes and trans-acting factors. We hope that this review will contribute to this exuberant research field by providing robust activity analysis strategies that can facilitate the translation of GWAS results into personalized diagnosis and prevention measures for human diseases.
Affiliation(s)
- Naixia Ren
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| | - Shangkun Dai
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
| | - Shumin Ma
- School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
| | - Fengtang Yang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
|
187
|
Singh P, Stevenson SR, Dickinson PJ, Reyna-Llorens I, Tripathi A, Reeves G, Schreier TB, Hibberd JM. C4 gene induction during de-etiolation evolved through changes in cis to allow integration with ancestral C3 gene regulatory networks. Science Advances 2023; 9:eade9756. [PMID: 36989352 PMCID: PMC10058240 DOI: 10.1126/sciadv.ade9756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Accepted: 03/01/2023] [Indexed: 06/19/2023]
Abstract
C4 photosynthesis has evolved by repurposing enzymes found in C3 plants. Compared with the ancestral C3 state, accumulation of C4 cycle proteins is enhanced. We used de-etiolation of C4 Gynandropsis gynandra and C3 Arabidopsis thaliana to understand this process. C4 gene expression and chloroplast biogenesis in G. gynandra were tightly coordinated. Although C3 and C4 photosynthesis genes showed similar induction patterns, in G. gynandra, C4 genes were more strongly induced than orthologs from A. thaliana. In vivo binding of TGA and homeodomain as well as light-responsive elements such as G- and I-box motifs were associated with the rapid increase in transcripts of C4 genes. Deletion analysis confirmed that regions containing G- and I-boxes were necessary for high expression. The data support a model in which accumulation of transcripts derived from C4 photosynthesis genes in C4 leaves is enhanced because modifications in cis allowed integration into ancestral transcriptional networks.
|
188
|
Zheng Y, VanDusen NJ. Massively Parallel Reporter Assays for High-Throughput In Vivo Analysis of Cis-Regulatory Elements. J Cardiovasc Dev Dis 2023; 10:jcdd10040144. [PMID: 37103023 PMCID: PMC10146671 DOI: 10.3390/jcdd10040144] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2023] [Revised: 03/24/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023] Open
Abstract
The rapid improvement of descriptive genomic technologies has fueled a dramatic increase in hypothesized connections between cardiovascular gene expression and phenotypes. However, in vivo testing of these hypotheses has predominantly been relegated to slow, expensive, and linear generation of genetically modified mice. In the study of genomic cis-regulatory elements, generation of mice featuring transgenic reporters or cis-regulatory element knockout remains the standard approach. While the data obtained is of high quality, the approach is insufficient to keep pace with candidate identification and therefore results in biases introduced during the selection of candidates for validation. However, recent advances across a range of disciplines are converging to enable functional genomic assays that can be conducted in a high-throughput manner. Here, we review one such method, massively parallel reporter assays (MPRAs), in which the activities of thousands of candidate genomic regulatory elements are simultaneously assessed via the next-generation sequencing of a barcoded reporter transcript. We discuss best practices for MPRA design and use, with a focus on practical considerations, and review how this emerging technology has been successfully deployed in vivo. Finally, we discuss how MPRAs are likely to evolve and be used in future cardiovascular research.
|
189
|
Cao Y, Zhang X, Akerberg BN, Yuan H, Sakamoto T, Xiao F, VanDusen NJ, Zhou P, Sweat ME, Wang Y, Prondzynski M, Chen J, Zhang Y, Wang P, Kelly DP, Pu WT. In Vivo Dissection of Chamber-Selective Enhancers Reveals Estrogen-Related Receptor as a Regulator of Ventricular Cardiomyocyte Identity. Circulation 2023; 147:881-896. [PMID: 36705030 PMCID: PMC10010668 DOI: 10.1161/circulationaha.122.061955] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Indexed: 01/28/2023]
Abstract
BACKGROUND: Cardiac chamber-selective transcriptional programs underpin the structural and functional differences between atrial and ventricular cardiomyocytes (aCMs and vCMs). The mechanisms responsible for these chamber-selective transcriptional programs remain largely undefined. METHODS: We nominated candidate chamber-selective enhancers (CSEs) by determining the genome-wide occupancy of 7 key cardiac transcription factors (GATA4, MEF2A, MEF2C, NKX2-5, SRF, TBX5, TEAD1) and transcriptional coactivator P300 in atria and ventricles. Candidate enhancers were tested using an adeno-associated virus-mediated massively parallel reporter assay. Chromatin features of CSEs were evaluated by performing assay of transposase accessible chromatin sequencing and acetylation of histone H3 at lysine 27-HiChIP on aCMs and vCMs. CSE sequence requirements were determined by systematic tiling mutagenesis of 29 CSEs at 5 bp resolution. Estrogen-related receptor (ERR) function in cardiomyocytes was evaluated by Cre-loxP-mediated inactivation of ERRα and ERRγ in cardiomyocytes. RESULTS: We identified 134,066 and 97,506 regions reproducibly occupied by at least 1 transcription factor or P300, in atria or ventricles, respectively. Enhancer activities of 2639 regions bound by transcription factors or P300 were tested in aCMs and vCMs by adeno-associated virus-mediated massively parallel reporter assay. This identified 1092 active enhancers in aCMs or vCMs. Several overlapped loci associated with cardiovascular disease through genome-wide association studies, and 229 exhibited chamber-selective activity in aCMs or vCMs. Many CSEs exhibited differential chromatin accessibility between aCMs and vCMs, and CSEs were enriched for aCM- or vCM-selective acetylation of histone H3 at lysine 27-anchored loops. Tiling mutagenesis of 29 CSEs identified the binding motif of ERRα/γ as important for ventricular enhancer activity. The requirement of ERRα/γ to activate ventricular CSEs and promote vCM identity was confirmed by loss of the vCM gene profile in ERRα/γ knockout vCMs. CONCLUSIONS: We identified 229 CSEs that could be useful research tools or direct therapeutic gene expression. We showed that chamber-selective multi-transcription factor, P300 occupancy, open chromatin, and chromatin looping are predictive features of CSEs. We found that ERRα/γ are essential for maintenance of ventricular identity. Finally, our gene expression, epigenetic, 3-dimensional genome, and enhancer activity atlas provide key resources for future studies of chamber-selective gene regulation.
Affiliation(s)
- Yangpo Cao
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Xiaoran Zhang
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Brynn N Akerberg
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Haiyun Yuan
- Department of Cardiovascular Surgery, Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangzhou, China (H.Y.)
| | - Tomoya Sakamoto
- Cardiovascular Institute, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia (T.S., D.P.K.)
| | - Feng Xiao
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Nathan J VanDusen
- Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis (N.J.V.)
| | - Pingzhu Zhou
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Mason E Sweat
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Yi Wang
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Maksymilian Prondzynski
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Jian Chen
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Yan Zhang
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Peizhe Wang
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.)
| | - Daniel P Kelly
- Cardiovascular Institute, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia (T.S., D.P.K.)
| | - William T Pu
- Department of Cardiology, Boston Children's Hospital, Boston, MA (Y.C., X.Z., B.N.A., F.X., P.Z., M.E.S., Y.W., M.P., J.C., Y.Z., P.W., W.T.P.); Harvard Stem Cell Institute, Cambridge, MA (W.T.P.)
|
190
|
Stankey CT, Lee JC. Translating non-coding genetic associations into a better understanding of immune-mediated disease. Dis Model Mech 2023; 16:dmm049790. [PMID: 36897113 PMCID: PMC10040244 DOI: 10.1242/dmm.049790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023] Open
Abstract
Genome-wide association studies have identified hundreds of genetic loci that are associated with immune-mediated diseases. Most disease-associated variants are non-coding, and a large proportion of these variants lie within enhancers. As a result, there is a pressing need to understand how common genetic variation might affect enhancer function and thereby contribute to immune-mediated (and other) diseases. In this Review, we first describe statistical and experimental methods to identify causal genetic variants that modulate gene expression, including statistical fine-mapping and massively parallel reporter assays. We then discuss approaches to characterise the mechanisms by which these variants modulate immune function, such as clustered regularly interspaced short palindromic repeats (CRISPR)-based screens. We highlight examples of studies that, by elucidating the effects of disease variants within enhancers, have provided important insights into immune function and uncovered key pathways of disease.
Affiliation(s)
- Christina T. Stankey
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Department of Immunology and Inflammation, Imperial College London, London W12 0NN, UK
| | - James C. Lee
- Genetic Mechanisms of Disease Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Institute of Liver and Digestive Health, Royal Free Hospital, University College London, London NW3 2PF, UK
|
191
|
Reiter F, de Almeida BP, Stark A. Enhancers display constrained sequence flexibility and context-specific modulation of motif function. Genome Res 2023; 33:346-358. [PMID: 36941077 PMCID: PMC10078294 DOI: 10.1101/gr.277246.122] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/26/2022] [Accepted: 02/14/2023] [Indexed: 03/23/2023]
Abstract
The information about when and where each gene is to be expressed is mainly encoded in the DNA sequence of enhancers, sequence elements that comprise binding sites (motifs) for different transcription factors (TFs). Most of the research on enhancer sequences has been focused on TF motif presence, whereas the enhancer syntax, that is, the flexibility of important motif positions and how the sequence context modulates the activity of TF motifs, remains poorly understood. Here, we explore the rules of enhancer syntax by a two-pronged approach in Drosophila melanogaster S2 cells: we (1) replace important TF motifs by all possible 65,536 eight-nucleotide-long sequences and (2) paste eight important TF motif types into 763 positions within 496 enhancers. These complementary strategies reveal that enhancers display constrained sequence flexibility and the context-specific modulation of motif function. Important motifs can be functionally replaced by hundreds of sequences constituting several distinct motif types, but these are only a fraction of all possible sequences and motif types. Moreover, TF motifs contribute with different intrinsic strengths that are strongly modulated by the enhancer sequence context (the flanking sequence, the presence and diversity of other motif types, and the distance between motifs), such that not all motif types can work in all positions. The context-specific modulation of motif function is also a hallmark of human enhancers, as we demonstrate experimentally. Overall, these two general principles of enhancer sequences are important to understand and predict enhancer function during development, evolution, and in disease.
Affiliation(s)
- Franziska Reiter
- Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, 1030 Vienna, Austria
| | - Bernardo P de Almeida
- Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, 1030 Vienna, Austria
| | - Alexander Stark
- Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, 1030 Vienna, Austria
- Medical University of Vienna, Vienna BioCenter, 1030 Vienna, Austria
|
192
|
Ahmed I, Yang SH, Ogden S, Zhang W, Li Y, Sharrocks AD. eRNA profiling uncovers the enhancer landscape of oesophageal adenocarcinoma and reveals new deregulated pathways. eLife 2023; 12:e80840. [PMID: 36803948 PMCID: PMC9998086 DOI: 10.7554/elife.80840] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/06/2022] [Accepted: 02/20/2023] [Indexed: 02/22/2023] Open
Abstract
Cancer is driven by both genetic and epigenetic changes that impact on gene expression profiles and the resulting tumourigenic phenotype. Enhancers are transcriptional regulatory elements that are key to our understanding of how this rewiring of gene expression is achieved in cancer cells. Here, we have harnessed the power of RNA-seq data from hundreds of patients with oesophageal adenocarcinoma (OAC) or its precursor state Barrett's oesophagus coupled with open chromatin maps to identify potential enhancer RNAs and their associated enhancer regions in this cancer. We identify ~1000 OAC-specific enhancers and use these data to uncover new cellular pathways that are operational in OAC. Among these are enhancers for JUP, MYBL2, and CCNE1, and we show that their activity is required for cancer cell viability. We also demonstrate the clinical utility of our dataset for identifying disease stage and patient prognosis. Our data therefore identify an important set of regulatory elements that enhance our molecular understanding of OAC and point to potential new therapeutic directions.
Affiliation(s)
- Ibrahim Ahmed
- School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
| | - Shen-Hsi Yang
- School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
| | - Samuel Ogden
- School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
| | - Wei Zhang
- School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
| | - Yaoyong Li
- School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
| | - Andrew D Sharrocks
- School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
|
193
|
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023] Open
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia; School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia; The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia; Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia
| | - Amanda J. Lea
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240, USA; Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240, USA; Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
|
194
|
Kim S, Wysocka J. Deciphering the multi-scale, quantitative cis-regulatory code. Mol Cell 2023; 83:373-392. [PMID: 36693380 PMCID: PMC9898153 DOI: 10.1016/j.molcel.2022.12.032] [Citation(s) in RCA: 110] [Impact Index Per Article: 55.0] [Received: 10/27/2022] [Revised: 12/29/2022] [Accepted: 12/30/2022] [Indexed: 01/24/2023]
Abstract
Uncovering the cis-regulatory code that governs when and how much each gene is transcribed in a given genome and cellular state remains a central goal of biology. Here, we discuss major layers of regulation that influence how transcriptional outputs are encoded by DNA sequence and cellular context. We first discuss how transcription factors bind specific DNA sequences in a dosage-dependent and cooperative manner and then proceed to the cofactors that facilitate transcription factor function and mediate the activity of modular cis-regulatory elements such as enhancers, silencers, and promoters. We then consider the complex and poorly understood interplay of these diverse elements within regulatory landscapes and its relationships with chromatin states and nuclear organization. We propose that a mechanistically informed, quantitative model of transcriptional regulation that integrates these multiple regulatory layers will be the key to ultimately cracking the cis-regulatory code.
Affiliation(s)
- Seungsoo Kim
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Joanna Wysocka
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
|
195
|
Klaus L, de Almeida BP, Vlasova A, Nemčko F, Schleiffer A, Bergauer K, Hofbauer L, Rath M, Stark A. Systematic identification and characterization of repressive domains in Drosophila transcription factors. EMBO J 2023; 42:e112100. [PMID: 36545802 PMCID: PMC9890238 DOI: 10.15252/embj.2022112100] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/13/2022] [Revised: 11/21/2022] [Accepted: 12/01/2022] [Indexed: 12/24/2022] Open
Abstract
All multicellular life relies on differential gene expression, determined by regulatory DNA elements and DNA-binding transcription factors that mediate activation and repression via cofactor recruitment. While activators have been extensively characterized, repressors are less well studied: the identities and properties of their repressive domains (RDs) are typically unknown and the specific co-repressors (CoRs) they recruit have not been determined. Here, we develop a high-throughput, next-generation sequencing-based screening method, repressive-domain (RD)-seq, to systematically identify RDs in complex DNA-fragment libraries. Screening more than 200,000 fragments covering the coding sequences of all transcription-related proteins in Drosophila melanogaster, we identify 195 RDs in known repressors and in proteins not previously associated with repression. Many RDs contain recurrent short peptide motifs, which are conserved between fly and human and are required for RD function, as demonstrated by motif mutagenesis. Moreover, we show that RDs that contain one of five distinct repressive motifs interact with and depend on different CoRs, such as Groucho, CtBP, Sin3A, or Smrter. These findings advance our understanding of repressors, their sequences, and the functional impact of sequence-altering mutations and should provide a valuable resource for further studies.
Affiliation(s)
- Loni Klaus
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
| | - Bernardo P de Almeida
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
| | - Anna Vlasova
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
| | - Filip Nemčko
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
| | - Alexander Schleiffer
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Institute of Molecular Biotechnology (IMBA), Vienna BioCenter (VBC), Vienna, Austria
| | - Katharina Bergauer
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
| | - Lorena Hofbauer
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
| | - Martina Rath
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
| | - Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria
| |
|
196
|
Zhao S, Hong CKY, Myers CA, Granas DM, White MA, Corbo JC, Cohen BA. A single-cell massively parallel reporter assay detects cell-type-specific gene regulation. Nat Genet 2023; 55:346-354. [PMID: 36635387 PMCID: PMC9931678 DOI: 10.1038/s41588-022-01278-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 12/07/2021] [Accepted: 12/05/2022] [Indexed: 01/14/2023]
Abstract
Massively parallel reporter gene assays are key tools in regulatory genomics but cannot be used to identify cell-type-specific regulatory elements without performing assays serially across different cell types. To address this problem, we developed a single-cell massively parallel reporter assay (scMPRA) to measure the activity of libraries of cis-regulatory sequences (CRSs) across multiple cell types simultaneously. We assayed a library of core promoters in a mixture of HEK293 and K562 cells and showed that scMPRA is a reproducible, highly parallel, single-cell reporter gene assay that detects cell-type-specific cis-regulatory activity. We then measured a library of promoter variants across multiple cell types in live mouse retinas and showed that subtle genetic variants can produce cell-type-specific effects on cis-regulatory activity. We anticipate that scMPRA will be widely applicable for studying the role of CRSs across diverse cell types.
Affiliation(s)
- Siqi Zhao
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Ginkgo Bioworks, Boston, MA, USA
| | - Clarice K Y Hong
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Connie A Myers
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
| | - David M Granas
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Michael A White
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Joseph C Corbo
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
| | - Barak A Cohen
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| |
|
197
|
Marand AP. Computational Analysis of Maize Enhancer Regulatory Elements Using ATAC-STARR-seq. bioRxiv 2023:2023.01.20.524917. [PMID: 36711646 PMCID: PMC9882361 DOI: 10.1101/2023.01.20.524917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2023]
Abstract
The blueprints to development, response to the environment, and cellular function are largely the manifestation of distinct gene expression programs controlled by the spatiotemporal activity of cis-regulatory elements. Although biochemical methods for identifying accessible chromatin, a hallmark of active cis-regulatory elements, have been developed, approaches capable of measuring and quantifying cis-regulatory activity are only beginning to be realized. Massively Parallel Reporter Assays coupled to chromatin accessibility profiling present a high-throughput solution for testing the transcription-activating capacity of millions of putatively regulatory DNA sequences in parallel. However, clear computational pipelines for analyzing these high-throughput sequencing-based reporter assays are lacking. In this protocol, I lay out and rationalize a computational framework for the processing and analysis of Assay for Transposase Accessible Chromatin profiling followed by Self-Transcribed Active Regulatory Region sequencing (ATAC-STARR-seq) data from a recent study in Zea mays. The approach described herein can be adapted to other sequencing-based reporter assays and, with the appropriate input substitutions, is largely agnostic to the model organism.
|
198
|
Lindhorst D, Halfon MS. Reporter gene assays and chromatin-level assays define substantially non-overlapping sets of enhancer sequences. BMC Genomics 2023; 24:17. [PMID: 36639739 PMCID: PMC9837977 DOI: 10.1186/s12864-023-09123-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 01/09/2023] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND: Transcriptional enhancers are essential for gene regulation, but how these regulatory elements are best defined remains a significant unresolved question. Traditional definitions rely on activity-based criteria such as reporter gene assays, while more recently, biochemical assays based on chromatin-level phenomena such as chromatin accessibility, histone modifications, and localized RNA transcription have gained prominence. RESULTS: We examine here whether these two types of definitions, activity-based and chromatin-based, effectively identify the same sets of sequences. We find that, concerningly, the overlap between the two groups is strikingly limited. Few of the data sets we compared displayed statistically significant overlap, and even for those, the degree of overlap was typically small (below 40% of sequences). Moreover, a substantial batch effect was observed in which experiment set rather than experimental method was a primary driver of whether or not chromatin-defined enhancers showed a strong overlap with reporter gene-defined enhancers. CONCLUSIONS: Our results raise important questions as to the appropriateness of both old and new enhancer definitions, and suggest that new approaches are required to reconcile the poor agreement among existing methods for defining enhancers.
Affiliation(s)
- Daniel Lindhorst
- Department of Biochemistry, University at Buffalo-State University of New York, 955 Main St. #5128, Buffalo, NY 14203, USA
- Present Address: Program in Biomedical Sciences, Columbia University, New York, NY 10032, USA
| | - Marc S. Halfon
- Department of Biochemistry, University at Buffalo-State University of New York, 955 Main St. #5128, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14260, USA
- NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203, USA
- Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263, USA
| |
|
199
|
Mouri K, Dewey HB, Castro R, Berenzy D, Kales S, Tewhey R. Whole-genome functional characterization of RE1 silencers using a modified massively parallel reporter assay. Cell Genomics 2023; 3:100234. [PMID: 36777181 PMCID: PMC9903721 DOI: 10.1016/j.xgen.2022.100234] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/11/2022] [Revised: 09/12/2022] [Accepted: 11/23/2022] [Indexed: 12/23/2022]
Abstract
Both upregulation and downregulation by cis-regulatory elements help modulate precise gene expression. However, our understanding of repressive elements is far more limited than activating elements. To address this gap, we characterized RE1, a group of transcriptional silencers bound by REST, at genome-wide scale using a modified massively parallel reporter assay (MPRAduo). MPRAduo empirically defined a minimal binding strength of REST (REST motif-intrinsic value [m-value]), above which cofactors colocalize and silence transcription. We identified 1,500 human variants that alter RE1 silencing and found that their effect sizes are predictable when they overlap with REST-binding sites above the m-value. Additionally, we demonstrate that non-canonical REST-binding motifs exhibit silencer function only if they precisely align half sites with specific spacer lengths. Our results show mechanistic insights into RE1, which allow us to predict its activity and effect of variants on RE1, providing a paradigm for performing genome-wide functional characterization of transcription-factor-binding sites.
Affiliation(s)
| | | | | | | | - Susan Kales
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
| |
|
200
|
Application of massive parallel reporter analysis in biotechnology and medicine. КЛИНИЧЕСКАЯ ПРАКТИКА 2023. [DOI: 10.17816/clinpract115063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023] Open
Abstract
The development and functioning of an organism relies on tissue-specific gene programs. Genome regulatory elements play a key role in the regulation of such programs, and disruptions in their function can lead to the development of various pathologies, including cancers, malformations and autoimmune diseases. The emergence of high-throughput genomic studies has led to massively parallel reporter analysis (MPRA) methods, which allow the functional verification and identification of regulatory elements on a genome-wide scale. Initially MPRA was used as a tool to investigate fundamental aspects of epigenetics, but the approach also has great potential for clinical and practical biotechnology. Currently, MPRA is used for validation of clinically significant mutations, identification of tissue-specific regulatory elements, search for the most promising loci for transgene integration, and is an indispensable tool for creating highly efficient expression systems, the range of application of which extends from approaches for protein development and design of next-generation therapeutic antibody superproducers to gene therapy. In this review, the main principles and areas of practical application of high-throughput reporter assays will be discussed.
|