1
Siraj L, Castro RI, Dewey H, Kales S, Nguyen TTL, Kanai M, Berenzy D, Mouri K, Wang QS, McCaw ZR, Gosai SJ, Aguet F, Cui R, Vockley CM, Lareau CA, Okada Y, Gusev A, Jones TR, Lander ES, Sabeti PC, Finucane HK, Reilly SK, Ulirsch JC, Tewhey R. Functional dissection of complex and molecular trait variants at single nucleotide resolution. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.05.592437. [PMID: 38766054 PMCID: PMC11100724 DOI: 10.1101/2024.05.05.592437] [Indexed: 05/22/2024]
Abstract
Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. Most of these variants have individually weak effects and lie in non-coding gene-regulatory elements, where we lack a complete understanding of how single nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we used a Massively Parallel Reporter Assay (MPRA) in 5 diverse cell types to measure the activity of 221,412 trait-associated variants that had been statistically fine-mapped. We show that MPRA is able to discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.
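MPRA variant effects are often summarized as an allelic skew: the change in reporter activity (RNA output over plasmid DNA input) between the alternate and reference alleles. The Python sketch below illustrates that quantity only. It assumes depth-normalized, barcode-summed counts and a pseudocount, and the function names are hypothetical; the abstract does not describe the study's statistical model, so this is not the authors' pipeline.

```python
import math


def log2_activity(rna: float, dna: float, pseudocount: float = 1.0) -> float:
    """Reporter activity as log2(RNA / DNA); the pseudocount avoids log(0) and division by zero."""
    return math.log2((rna + pseudocount) / (dna + pseudocount))


def allelic_skew(ref_rna: float, ref_dna: float, alt_rna: float, alt_dna: float,
                 pseudocount: float = 1.0) -> float:
    """log2 fold change of alternate-allele activity relative to reference-allele activity."""
    return (log2_activity(alt_rna, alt_dna, pseudocount)
            - log2_activity(ref_rna, ref_dna, pseudocount))


# Hypothetical counts for one variant in one cell type,
# assumed to be already normalized for sequencing depth.
skew = allelic_skew(ref_rna=399, ref_dna=99, alt_rna=99, alt_dna=99)
print(skew)  # -2.0: the alternate allele shows 4-fold lower activity
```

In practice, replicates and per-barcode counts would usually be modeled jointly to obtain a significance estimate for each variant. This sketch only shows the point estimate.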
Affiliation(s)
- Layla Siraj
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biophysics, Harvard Graduate School of Arts and Sciences, Boston, MA, USA
- Harvard-Massachusetts Institute of Technology MD/PhD Program, Harvard Medical School, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA
- Qingbo S. Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
- Sager J. Gosai
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- François Aguet
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Ran Cui
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Caleb A. Lareau
- Program in Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
- Thouis R. Jones
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eric S. Lander
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Pardis C. Sabeti
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Hilary K. Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Steven K. Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
- Jacob C. Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
2
He AY, Danko CG. Dissection of core promoter syntax through single nucleotide resolution modeling of transcription initiation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.13.583868. [PMID: 38559255 PMCID: PMC10979970 DOI: 10.1101/2024.03.13.583868] [Indexed: 04/04/2024]
Abstract
Our understanding of how the DNA sequences of cis-regulatory elements encode transcription initiation patterns remains limited. Here we introduce CLIPNET, a deep learning model trained on population-scale PRO-cap data that accurately predicts the position and quantity of transcription initiation with single nucleotide resolution from DNA sequence. Interpretation of CLIPNET revealed a complex regulatory syntax consisting of DNA-protein interactions in five major positions between -200 and +50 bp relative to the transcription start site, as well as more subtle positional preferences among different transcriptional activators. Transcriptional activator and core promoter motifs occupy different positions and play distinct roles in regulating initiation, with the former driving initiation quantity and the latter initiation position. We identified core promoter motifs that explain initiation patterns in the majority of promoters and enhancers, including DPR motifs and AT-rich TBP binding sequences in TATA-less promoters. Our results provide insights into the sequence architecture governing transcription initiation.
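Sequence-to-initiation models of this kind take a window of DNA around a candidate TSS as one-hot encoded input. The Python sketch below extracts the -200 to +50 bp window named in the abstract and one-hot encodes it. Several details are assumptions made for illustration and are not CLIPNET's actual preprocessing: the 0-based TSS index, a window that includes -200 but excludes +50, A/C/G/T channel order, and all-zero rows for N.

```python
from typing import List

BASES = "ACGT"  # assumed channel order


def tss_window(seq: str, tss: int, upstream: int = 200, downstream: int = 50) -> str:
    """Return seq[tss - upstream : tss + downstream] for a 0-based TSS index.

    Raises ValueError if the window runs off either end of the sequence.
    """
    start, end = tss - upstream, tss + downstream
    if start < 0 or end > len(seq):
        raise ValueError("window extends beyond sequence")
    return seq[start:end]


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode DNA; ambiguous bases such as N become all-zero rows."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


# Hypothetical 300-bp sequence with the TSS (a G) at index 220.
seq = "A" * 220 + "G" + "C" * 79
encoded = one_hot(tss_window(seq, tss=220))
print(len(encoded), encoded[200])  # 250 [0, 0, 1, 0]; row 200 is the TSS base
```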
Affiliation(s)
- Adam Y. He
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University
- Graduate Field of Computational Biology, Cornell University
- Charles G. Danko
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University
- Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University
3
Kwak IY, Kim BC, Lee J, Kang T, Garry DJ, Zhang J, Gong W. Proformer: a hybrid macaron transformer model predicts expression values from promoter sequences. BMC Bioinformatics 2024; 25:81. [PMID: 38378442 PMCID: PMC10877777 DOI: 10.1186/s12859-024-05645-5] [Received: 05/24/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024]
Abstract
The breakthrough high-throughput measurement of the cis-regulatory activity of millions of randomly generated promoters provides an unprecedented opportunity to systematically decode the cis-regulatory logic that determines expression values. We developed an end-to-end transformer encoder architecture named Proformer to predict expression values from DNA sequences. Proformer used a Macaron-like Transformer encoder architecture, in which two half-step feed-forward (FFN) layers were placed at the beginning and the end of each encoder block, and a separable 1D convolution layer was inserted after the first FFN layer and in front of the multi-head attention layer. The sliding k-mers from one-hot encoded sequences were mapped onto a continuous embedding and combined with a learned positional embedding and a strand embedding (forward strand vs. reverse-complemented strand) to form the sequence input. Moreover, Proformer introduced multiple expression heads with mask filling to prevent the transformer models from collapsing when training on a relatively small amount of data. We empirically determined that this design performed significantly better than conventional designs, such as using a global pooling layer as the output layer for the regression task. These analyses support the notion that Proformer provides a novel method of learning and enhances our understanding of how cis-regulatory sequences determine expression values.
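The input construction described above can be sketched as tokenization into integer k-mer ids: sliding k-mers over a sequence, plus a strand label that separates the forward strand from its reverse complement. In this Python sketch the value of k, the lexicographic id scheme and the 0/1 strand labels are assumptions. In Proformer the tokens are then mapped to learned embeddings, a step omitted here.

```python
from itertools import product
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmer_vocab(k: int) -> Dict[str, int]:
    """Map every k-mer over ACGT to an integer id in lexicographic order."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def sliding_kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tokenize(seq: str, k: int = 3) -> List[Tuple[List[int], int]]:
    """(k-mer ids, strand label) for the forward (0) and reverse-complement (1) strands.

    Non-ACGT characters are not in the vocabulary and raise KeyError.
    """
    vocab = kmer_vocab(k)
    return [([vocab[m] for m in sliding_kmers(s, k)], strand)
            for strand, s in enumerate((seq, reverse_complement(seq)))]


for ids, strand in tokenize("ACGTTA", k=3):
    print(strand, ids)
# 0 [6, 27, 47, 60]
# 1 [48, 1, 6, 27]
```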
Affiliation(s)
- Il-Youp Kwak
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
- Byeong-Chan Kim
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
- Juhyun Lee
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
- Taein Kang
- Department of Applied Statistics, Chung‑Ang University, Seoul, Republic of Korea
- Daniel J Garry
- Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA.
- Stem Cell Institute, University of Minnesota, Minneapolis, MN, 55455, USA.
- Paul and Sheila Wellstone Muscular Dystrophy Center, University of Minnesota, Minneapolis, MN, 55455, USA.
- Jianyi Zhang
- Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Wuming Gong
- Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA.
4
de Almeida BP, Schaub C, Pagani M, Secchia S, Furlong EEM, Stark A. Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature 2024; 626:207-211. [PMID: 38086418 PMCID: PMC10830412 DOI: 10.1038/s41586-023-06905-9] [Received: 06/19/2023] [Accepted: 11/28/2023] [Indexed: 01/19/2024]
Abstract
Enhancers control gene expression and have crucial roles in development and homeostasis [1-3]. However, the targeted de novo design of enhancers with tissue-specific activities has remained challenging. Here we combine deep learning and transfer learning to design tissue-specific enhancers for five tissues in the Drosophila melanogaster embryo: the central nervous system, epidermis, gut, muscle and brain. We first train convolutional neural networks using genome-wide single-cell assay for transposase-accessible chromatin with sequencing (ATAC-seq) datasets and then fine-tune the convolutional neural networks with smaller-scale data from in vivo enhancer activity assays, yielding models with 13% to 76% positive predictive value according to cross-validation. We designed and experimentally assessed 40 synthetic enhancers (8 per tissue) in vivo, of which 31 (78%) were active and 27 (68%) functioned in the target tissue (100% for central nervous system and muscle). The strategy of combining genome-wide and small-scale functional datasets by transfer learning is generally applicable and should enable the design of tissue-, cell type- and cell state-specific enhancers in any system.
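The success rates quoted above follow directly from the counts. Positive predictive value (PPV), the metric reported for the models, is the fraction of predicted positives that are truly positive. The short Python check below also treats each design as a positive prediction for its target tissue; that framing is an interpretive assumption, not a statement from the paper.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


designed = 40   # 8 synthetic enhancers for each of 5 tissues
active = 31     # active in vivo
on_target = 27  # active in the intended tissue

print(f"active: {active / designed:.1%}")  # 77.5%, reported as 78%
# Assumed framing: on-target designs are true positives, all others false positives.
print(f"on-target PPV: {ppv(on_target, designed - on_target):.1%}")  # 67.5%, reported as 68%
```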
Affiliation(s)
- Bernardo P de Almeida
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- InstaDeep, Paris, France
- Christoph Schaub
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Michaela Pagani
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Stefano Secchia
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Eileen E M Furlong
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria.
- Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria.
5
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024]
Abstract
Massively parallel reporter assays (MPRAs) measure the transcriptional activities of various cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational model framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using the framework, we investigated the molecular mechanisms through which cis-regulatory mutations in a synthetic enhancer cause abnormal gene expression. We found that, in a human cell line, competitive binding between transcription factors (TFs) of the same family with slightly different binding preferences significantly increases the accuracy of recapitulating the transcriptional effects of thousands of single or multiple mutations. We also discovered that even when various harmful mutations occurred in an activator binding site, the CRM could stably maintain or even increase gene expression through a certain form of competitive binding between family TFs. These findings enhance our understanding of the effects of SNPs and indels on CRMs and would help in building robust custom-designed CRMs for biologics production and gene therapy.
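In an equilibrium (thermodynamic) model, each TF's occupancy of a site is its statistical weight divided by the partition function, and that is how competition between related TFs enters. The single-site Python sketch below uses hypothetical concentrations and affinities and assumes that only the activator's occupancy drives expression. It shows one way a mutation that weakens an activator site can still raise activator occupancy: by weakening a competitor even more. The published framework models whole sequences and is more detailed than this.

```python
from typing import Dict


def occupancies(conc: Dict[str, float], affinity: Dict[str, float]) -> Dict[str, float]:
    """Equilibrium occupancy of one site that can hold at most one TF at a time.

    Each TF's statistical weight is concentration * association constant; the empty
    site has weight 1. P(TF bound) = weight / (1 + sum of all weights).
    """
    weights = {tf: conc[tf] * affinity[tf] for tf in conc}
    z = 1.0 + sum(weights.values())  # partition function
    return {tf: w / z for tf, w in weights.items()}


# Hypothetical family members sharing one site: an activator and a
# non-activating competitor with a slightly different binding preference.
conc = {"activator": 1.0, "competitor": 1.0}
wild_type = {"activator": 2.0, "competitor": 4.0}
# Hypothetical mutation that weakens activator binding a little
# but weakens competitor binding much more.
mutant = {"activator": 1.5, "competitor": 0.5}

for label, aff in (("wild type", wild_type), ("mutant", mutant)):
    print(label, round(occupancies(conc, aff)["activator"], 3))
# wild type 0.286
# mutant 0.5
```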
Affiliation(s)
- Chan-Koo Kang
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Ah-Ram Kim
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
6
Loell KJ, Friedman RZ, Myers CA, Corbo JC, Cohen BA, White MA. Transcription factor interactions explain the context-dependent activity of CRX binding sites. PLoS Comput Biol 2024; 20:e1011802. [PMID: 38227575 PMCID: PMC10817189 DOI: 10.1371/journal.pcbi.1011802] [Received: 03/06/2023] [Revised: 01/26/2024] [Accepted: 01/06/2024] [Indexed: 01/18/2024]
Abstract
The effects of transcription factor binding sites (TFBSs) on the activity of a cis-regulatory element (CRE) depend on the local sequence context. In rod photoreceptors, binding sites for the transcription factor (TF) Cone-rod homeobox (CRX) occur in both enhancers and silencers, but the sequence context that determines whether CRX binding sites contribute to activation or repression of transcription is not understood. To investigate the context-dependent activity of CRX sites, we fit neural network-based models to the activities of synthetic CREs composed of photoreceptor TFBSs. The models revealed that CRX binding sites consistently make positive, independent contributions to CRE activity, while negative homotypic interactions between sites cause CREs composed of multiple CRX sites to function as silencers. The effects of negative homotypic interactions can be overcome by the presence of other TFBSs that either interact cooperatively with CRX sites or make independent positive contributions to activity. The context-dependent activity of CRX sites is thus determined by the balance between positive heterotypic interactions, independent contributions of TFBSs, and negative homotypic interactions. Our findings explain observed patterns of activity among genomic CRX-bound enhancers and silencers, and suggest that enhancers may require diverse TFBSs to overcome negative homotypic interactions between TFBSs.
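The decomposition described above (independent contributions from each TFBS, plus pairwise homotypic and heterotypic interactions) can be illustrated with a toy additive-plus-pairwise model. In this Python sketch the label "OTHER" stands for a hypothetical non-CRX site, and every weight is an invented number chosen only to reproduce the qualitative pattern: a single CRX site is positive, CRX-CRX interactions are negative, and the heterotypic interaction is positive. The published models are neural networks fit to MPRA data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical parameters (log-scale activity relative to the basal promoter = 0).
INDEPENDENT: Dict[str, float] = {"CRX": 0.5, "OTHER": 0.4}
PAIRWISE: Dict[FrozenSet[str], float] = {
    frozenset({"CRX"}): -0.7,          # CRX-CRX pair (homotypic); the set collapses to one item
    frozenset({"CRX", "OTHER"}): 0.6,  # CRX-OTHER pair (heterotypic)
}


def predicted_activity(sites: List[str]) -> float:
    """Sum independent site effects plus an interaction term for every pair of sites."""
    score = sum(INDEPENDENT[s] for s in sites)
    for a, b in combinations(sites, 2):
        score += PAIRWISE.get(frozenset({a, b}), 0.0)
    return score


for cre in (["CRX"], ["CRX"] * 3, ["CRX", "CRX", "OTHER"]):
    print(cre, round(predicted_activity(cre), 2))
# ['CRX'] 0.5
# ['CRX', 'CRX', 'CRX'] -0.6   (below basal: behaves as a silencer)
# ['CRX', 'CRX', 'OTHER'] 1.9
```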
Affiliation(s)
- Kaiser J. Loell
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Ryan Z. Friedman
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Connie A. Myers
- Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Joseph C. Corbo
- Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Barak A. Cohen
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- Michael A. White
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, Missouri, United States of America
7
Martyn GE, Montgomery MT, Jones H, Guo K, Doughty BR, Linder J, Chen Z, Cochran K, Lawrence KA, Munson G, Pampari A, Fulco CP, Kelley DR, Lander ES, Kundaje A, Engreitz JM. Rewriting regulatory DNA to dissect and reprogram gene expression. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.20.572268. [PMID: 38187584 PMCID: PMC10769263 DOI: 10.1101/2023.12.20.572268] [Indexed: 01/09/2024]
Abstract
Regulatory DNA sequences within enhancers and promoters bind transcription factors to encode cell type-specific patterns of gene expression. However, the regulatory effects and programmability of such DNA sequences remain difficult to map or predict because we have lacked scalable methods to precisely edit regulatory DNA and quantify the effects in an endogenous genomic context. Here we present an approach to measure the quantitative effects of hundreds of designed DNA sequence variants on gene expression, by combining pooled CRISPR prime editing with RNA fluorescence in situ hybridization and cell sorting (Variant-FlowFISH). We apply this method to mutagenize and rewrite regulatory DNA sequences in an enhancer and the promoter of PPIF in two immune cell lines. Of 672 variant-cell type pairs, we identify 497 that affect PPIF expression. These variants appear to act through a variety of mechanisms including disruption or optimization of existing transcription factor binding sites, as well as creation of de novo sites. Disrupting a single endogenous transcription factor binding site often led to large changes in expression (up to -40% in the enhancer, and -50% in the promoter). The same variant often had different effects across cell types and states, demonstrating a highly tunable regulatory landscape. We use these data to benchmark performance of sequence-based predictive models of gene regulation, and find that certain types of variants are not accurately predicted by existing models. Finally, we computationally design 185 small sequence variants (≤10 bp) and optimize them for specific effects on expression in silico. 84% of these rationally designed edits showed the intended direction of effect, and some had dramatic effects on expression (-100% to +202%). 
Variant-FlowFISH thus provides a powerful tool to map the effects of variants and transcription factor binding sites on gene expression, test and improve computational models of gene regulation, and reprogram regulatory DNA.
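The statement that 84% of designed edits "showed the intended direction of effect" describes a directional-concordance rate: the fraction of edits whose measured change in expression has the same sign as the intended change. The Python sketch below uses made-up values, since the study's per-edit data are not reproduced here. Counting zero effects as misses is an assumption.

```python
from typing import Sequence


def percent_change(edited: float, unedited: float) -> float:
    """Expression change of an edited sequence relative to the unedited one, in percent."""
    return 100.0 * (edited - unedited) / unedited


def direction_concordance(intended: Sequence[float], observed: Sequence[float]) -> float:
    """Fraction of edits whose observed effect has the same non-zero sign as intended."""
    if len(intended) != len(observed):
        raise ValueError("length mismatch")
    hits = sum(1 for i, o in zip(intended, observed) if i * o > 0)
    return hits / len(intended)


# Hypothetical data: intended direction (+1 up, -1 down) and measured expression.
unedited = 100.0
measured = [140.0, 60.0, 105.0, 0.0, 90.0]
intended = [+1, -1, +1, -1, +1]
observed = [percent_change(m, unedited) for m in measured]
print(observed)                                   # [40.0, -40.0, 5.0, -100.0, -10.0]
print(direction_concordance(intended, observed))  # 0.8
```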
Affiliation(s)
- Gabriella E Martyn
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- Michael T Montgomery
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- Hank Jones
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- Katherine Guo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- Benjamin R Doughty
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Ziwei Chen
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Kelly Cochran
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Kathryn A Lawrence
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Glen Munson
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anusri Pampari
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Charles P Fulco
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Present Address: Sanofi, Cambridge, MA, USA
- Eric S Lander
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Anshul Kundaje
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Jesse M Engreitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanford Cardiovascular Institute, Stanford University, Stanford, CA, USA
8
Cai YM, Witham S, Patron NJ. Tuning Plant Promoters Using a Simple Split Luciferase Method to Assess Transcription Factor-DNA Interactions. ACS Synth Biol 2023; 12:3482-3486. [PMID: 37856867 PMCID: PMC10661027 DOI: 10.1021/acssynbio.3c00094] [Received: 02/10/2023] [Indexed: 10/21/2023]
Abstract
Sequence features, including the affinity of binding motifs for their cognate transcription factors, are important contributors to promoter behavior. The ability to predictably recode affinity enables the development of synthetic promoters with varying levels of response to known cellular signals. Here we describe a luminescence-based microplate assay for comparing the interactions of transcription factors with short DNA probes. We then demonstrate how these data can be used to design synthetic plant promoters of varying strengths that respond to the same transcription factor.
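One simple way to turn plate-reader luminescence into a relative binding score for each DNA probe is to subtract a no-probe background and scale to a reference probe. Scores ranked this way could then guide which binding-site variant to place in a promoter to reach a given strength. The Python sketch below assumes that normalization scheme, averaging over replicates, and hypothetical probe names and readings. It is not the published analysis.

```python
from statistics import mean
from typing import Dict, List


def relative_binding(readings: Dict[str, List[float]], background: List[float],
                     reference: str) -> Dict[str, float]:
    """Background-subtract mean luminescence per probe and scale so the reference probe = 1.0."""
    bg = mean(background)
    signal = {probe: mean(values) - bg for probe, values in readings.items()}
    ref = signal[reference]
    return {probe: s / ref for probe, s in signal.items()}


# Hypothetical luminescence (arbitrary units), three technical replicates each.
readings = {
    "consensus": [1100.0, 1000.0, 900.0],
    "variant_1": [600.0, 550.0, 500.0],
    "variant_2": [150.0, 140.0, 130.0],
}
scores = relative_binding(readings, background=[100.0, 100.0, 100.0], reference="consensus")
for probe, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(probe, round(score, 2))
# consensus 1.0
# variant_1 0.5
# variant_2 0.04
```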
Affiliation(s)
- Y.-M. Cai
- Engineering Biology, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, U.K.
- S. Witham
- Engineering Biology, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, U.K.
- N. J. Patron
- Engineering Biology, Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, U.K.
9
Zhang P, Wang H, Xu H, Wei L, Liu L, Hu Z, Wang X. Deep flanking sequence engineering for efficient promoter design using DeepSEED. Nat Commun 2023; 14:6309. [PMID: 37813854 PMCID: PMC10562447 DOI: 10.1038/s41467-023-41899-y] [Received: 04/22/2023] [Accepted: 09/20/2023] [Indexed: 10/11/2023]
Abstract
Designing promoters with desirable properties is essential in synthetic biology. Human experts are skilled at identifying strong explicit patterns in small samples, while deep learning models excel at detecting implicit weak patterns in large datasets. Biologists have described the sequence patterns of promoters via transcription factor binding sites (TFBSs). However, the flanking sequences of cis-regulatory elements have long been overlooked and are often chosen arbitrarily in promoter design. To address this limitation, we introduce DeepSEED, an AI-aided framework that efficiently designs synthetic promoters by combining expert knowledge with deep learning techniques. DeepSEED has demonstrated success in improving the properties of Escherichia coli constitutive, IPTG-inducible, and mammalian cell doxycycline (Dox)-inducible promoters. Furthermore, our results show that DeepSEED captures the implicit features in flanking sequences, such as k-mer frequencies and DNA shape features, which are crucial for determining promoter properties.
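k-mer frequencies, one of the flanking-sequence features highlighted in the abstract, are straightforward to compute. The Python sketch below returns the normalized frequency of every k-mer in a flanking sequence. The choice of k, skipping windows that contain non-ACGT characters, and the example sequence are assumptions; this is not DeepSEED's feature pipeline.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int = 2) -> Dict[str, float]:
    """Normalized frequency of every ACGT k-mer in seq (overlapping windows, stride 1).

    Windows containing characters other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    valid = [w for w in windows if set(w) <= set("ACGT")]
    counts = Counter(valid)
    total = len(valid) or 1  # avoid division by zero when no valid window exists
    return {"".join(p): counts["".join(p)] / total for p in product("ACGT", repeat=k)}


flank = "ATATGCAT"  # hypothetical flanking sequence
freqs = kmer_frequencies(flank, k=2)
print({kmer: round(f, 3) for kmer, f in freqs.items() if f > 0})
```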
Affiliation(s)
- Pengcheng Zhang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
- Haochen Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
- Hanwen Xu
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
- Lei Wei
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
- Liyang Liu
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
- Zhirui Hu
- Center for Statistical Science, Tsinghua University, Beijing, China
- Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China.
10
Liu Y, Wang Z, Yuan H, Zhu G, Zhang Y. HEAP: a task adaptive-based explainable deep learning framework for enhancer activity prediction. Brief Bioinform 2023; 24:bbad286. [PMID: 37539835 DOI: 10.1093/bib/bbad286] [Received: 05/15/2023] [Revised: 07/05/2023] [Accepted: 07/21/2023] [Indexed: 08/05/2023]
Abstract
Enhancers are crucial cis-regulatory elements that control gene expression in a cell-type-specific manner. Despite extensive genetic and computational studies, accurately predicting enhancer activity in different cell types remains a challenge, and the grammar of enhancers is still poorly understood. Here, we present HEAP (high-resolution enhancer activity prediction), an explainable deep learning framework for predicting enhancers and exploring enhancer grammar. The framework includes three modules that use grammar-based reasoning for enhancer prediction. The algorithm can incorporate DNA sequences and epigenetic modifications to obtain better accuracy. We use a novel two-step multi-task learning method, task adaptive parameter sharing (TAPS), to efficiently predict enhancers in different cell types. We first train a shared model with all cell-type datasets. Then we adapt to specific tasks by adding several task-specific subset layers. Experiments demonstrate that HEAP outperforms published methods and show the effectiveness of TAPS, especially for cell types with limited training samples. Notably, the explainable framework HEAP uses post-hoc interpretation to provide insights into the prediction mechanisms from three perspectives (data, model architecture and algorithm), leading to a better understanding of model decisions and enhancer grammar. We anticipate that HEAP will be a valuable tool for gaining insight into the complex mechanisms of enhancer activity.
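The two-step scheme described above can be shown with a deliberately tiny example: first fit a model shared across all cell types, then freeze it and adapt it to each cell type with a small task-specific component. In this Python sketch the "shared model" is a one-feature least-squares line fit on pooled data, and each "task-specific layer" is a per-task scale and offset fit on the frozen shared output. This is a simplified stand-in with hypothetical data, not HEAP's TAPS implementation, which adapts subsets of layers in a deep network.

```python
from typing import Callable, Dict, List, Tuple

Pairs = List[Tuple[float, float]]  # (input feature, target) pairs


def fit_line(pairs: Pairs) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_two_step(tasks: Dict[str, Pairs]) -> Callable[[str, float], float]:
    # Step 1: fit one shared model on the data from all tasks pooled together.
    pooled = [p for pairs in tasks.values() for p in pairs]
    w, b = fit_line(pooled)

    def shared(x: float) -> float:
        return w * x + b

    # Step 2: keep the shared model frozen and fit a small head per task on its output.
    heads = {name: fit_line([(shared(x), y) for x, y in pairs])
             for name, pairs in tasks.items()}

    def predict(task: str, x: float) -> float:
        scale, offset = heads[task]
        return scale * shared(x) + offset

    return predict


# Hypothetical data for two "cell types"; cell_B has fewer samples.
tasks = {
    "cell_A": [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],  # follows y = 2x + 1
    "cell_B": [(0.0, 0.0), (1.0, 1.0)],              # follows y = x
}
predict = train_two_step(tasks)
print(round(predict("cell_A", 3.0), 6), round(predict("cell_B", 3.0), 6))  # 7.0 3.0
```

Because both stages are linear here, each head can only rescale and shift the shared prediction. Task-specific layers in a deep network have far more flexibility.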
Affiliation(s)
- Yuhang Liu
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Zixuan Wang
  - College of Electronics and Information Engineering, Sichuan University, 610065, Chengdu, China
- Hao Yuan
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Guiquan Zhu
  - West China Hospital of Stomatology, Sichuan University, 610041, Chengdu, China
- Yongqing Zhang
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
11
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Cheng Xu
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Lu Bai
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
  - Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA
12
Friedman RZ, Ramu A, Lichtarge S, Myers CA, Granas DM, Gause M, Corbo JC, Cohen BA, White MA. Active learning of enhancer and silencer regulatory grammar in photoreceptors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.21.554146. [PMID: 37662358 PMCID: PMC10473580 DOI: 10.1101/2023.08.21.554146] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/05/2023]
Abstract
Cis-regulatory elements (CREs) direct gene expression in health and disease, and models that can accurately predict their activities from DNA sequences are crucial for biomedicine. Deep learning represents one emerging strategy to model the regulatory grammar that relates CRE sequence to function. However, these models require training data on a scale that exceeds the number of CREs in the genome. We address this problem using active machine learning to iteratively train models on multiple rounds of synthetic DNA sequences assayed in live mammalian retinas. During each round of training the model actively selects sequence perturbations to assay, thereby efficiently generating informative training data. We iteratively trained a model that predicts the activities of sequences containing binding motifs for the photoreceptor transcription factor Cone-rod homeobox (CRX) using an order of magnitude less training data than current approaches. The model's internal confidence estimates of its predictions are reliable guides for designing sequences with high activity. The model correctly identified critical sequence differences between active and inactive sequences with nearly identical transcription factor binding sites, and revealed order and spacing preferences for combinations of motifs. Our results establish active learning as an effective method to train accurate deep learning models of cis-regulatory function after exhausting naturally occurring training examples in the genome.
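The abstract does not state how the model chooses which perturbations to assay. One widely used active learning strategy, assumed here purely for illustration, is uncertainty sampling: an ensemble of models scores every candidate sequence, and the candidates on which the ensemble disagrees most are sent to the next assay round. The function `select_for_assay` and its input format are hypothetical.

```python
from statistics import pvariance


def select_for_assay(ensemble_preds, k):
    """Pick the k candidates the ensemble is least sure about.

    ensemble_preds: one inner list per candidate sequence, holding each
    ensemble member's predicted activity (hypothetical input format).
    Uncertainty is scored as the population variance across members.
    """
    scores = [pvariance(preds) for preds in ensemble_preds]
    # Sort candidate indices from most to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy example: three candidate sequences scored by a four-member ensemble.
preds = [
    [1.0, 1.1, 0.9, 1.0],  # members agree: low uncertainty
    [0.2, 1.5, 0.8, 2.0],  # members disagree: high uncertainty
    [0.5, 0.7, 0.6, 0.4],
]
print(select_for_assay(preds, 1))  # [1]
```

In a full loop, the selected sequences would be assayed, added to the training set, and the ensemble retrained before the next round.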
Affiliation(s)
- Ryan Z. Friedman
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Avinash Ramu
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Sara Lichtarge
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Connie A. Myers
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, 63110
- David M. Granas
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Maria Gause
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, 63110
- Joseph C. Corbo
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, 63110
- Barak A. Cohen
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Michael A. White
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
13
Cain B, Webb J, Yuan Z, Cheung D, Lim HW, Kovall R, Weirauch MT, Gebelein B. Prediction of cooperative homeodomain DNA binding sites from high-throughput-SELEX data. Nucleic Acids Res 2023; 51:6055-6072. [PMID: 37114997 PMCID: PMC10325903 DOI: 10.1093/nar/gkad318] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/18/2022] [Revised: 04/12/2023] [Accepted: 04/25/2023] [Indexed: 04/29/2023]
Abstract
Homeodomain proteins constitute one of the largest families of metazoan transcription factors. Genetic studies have demonstrated that homeodomain proteins regulate many developmental processes. Yet, biochemical data reveal that most bind highly similar DNA sequences. Defining how homeodomain proteins achieve DNA binding specificity has therefore been a long-standing goal. Here, we developed a novel computational approach to predict cooperative dimeric binding of homeodomain proteins using High-Throughput (HT) SELEX data. Importantly, we found that 15 of 88 homeodomain factors form cooperative homodimer complexes on DNA sites with precise spacing requirements. Approximately one third of the paired-like homeodomain proteins cooperatively bind palindromic sequences spaced 3 bp apart, whereas other homeodomain proteins cooperatively bind sites with distinct orientation and spacing requirements. Combining structural models of a paired-like factor with our cooperativity predictions identified key amino acid differences that help differentiate between cooperative and non-cooperative factors. Finally, we confirmed predicted cooperative dimer sites in vivo using available genomic data for a subset of factors. These findings demonstrate how HT-SELEX data can be computationally mined to predict cooperativity. In addition, the binding site spacing requirements of select homeodomain proteins provide a mechanism by which seemingly similar AT-rich DNA sequences can preferentially recruit specific homeodomain factors.
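A palindromic homodimer site with a fixed spacer can be made concrete with a small sequence scanner. This is a sketch only: the TAAT half-site is an assumption, chosen because homeodomains commonly bind AT-rich, TAAT-containing sequences, while the abstract itself specifies only palindromic sites spaced 3 bp apart. The function name `find_spaced_palindromes` is hypothetical, and sites with other orientations (tandem or everted) would need their own patterns.

```python
import re


def find_spaced_palindromes(seq, half_site="TAAT", spacer=3):
    """Return 0-based start positions of inverted-repeat sites of the form
    half_site + `spacer` arbitrary bases + reverse complement of half_site.

    The default TAAT half-site is an illustrative assumption.
    """
    complement = str.maketrans("ACGT", "TGCA")
    rc_half = half_site.translate(complement)[::-1]
    # Lookahead lets overlapping sites be reported. An inverted repeat is its
    # own reverse complement, so scanning one strand also covers the other.
    pattern = re.compile(f"(?={half_site}[ACGT]{{{spacer}}}{rc_half})")
    return [m.start() for m in pattern.finditer(seq.upper())]


print(find_spaced_palindromes("GGTAATCCGATTAGG"))  # [2]
print(find_spaced_palindromes("TAATCGATTA"))       # [] (2 bp spacer)
```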
Affiliation(s)
- Brittany Cain
  - Department of Biomedical Engineering, University of Cincinnati, Cincinnati, OH 45221, USA
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Ave, MLC 7007, Cincinnati, OH 45229, USA
- Jordan Webb
  - Department of Molecular Genetics, Biochemistry and Microbiology, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
- Zhenyu Yuan
  - Department of Molecular Genetics, Biochemistry and Microbiology, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
- David Cheung
  - Graduate Program in Molecular and Developmental Biology, Cincinnati Children's Hospital Research Foundation, Cincinnati, OH 45229, USA
- Hee-Woong Lim
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Rhett A Kovall
  - Department of Molecular Genetics, Biochemistry and Microbiology, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
- Matthew T Weirauch
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
  - Divisions of Human Genetics, Biomedical Informatics and Developmental Biology, Center for Autoimmune Genomics and Etiology (CAGE), Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Brian Gebelein
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Ave, MLC 7007, Cincinnati, OH 45229, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
14
Shi FY, Wang Y, Huang D, Liang Y, Liang N, Chen XW, Gao G. Computational Assessment of the Expression-modulating Potential for Non-coding Variants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:662-673. [PMID: 34890839 PMCID: PMC10787178 DOI: 10.1016/j.gpb.2021.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 10/13/2021] [Accepted: 11/01/2021] [Indexed: 06/13/2023]
Abstract
Large-scale genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) studies have identified multiple non-coding variants associated with genetic diseases by affecting gene expression. However, pinpointing causal variants effectively and efficiently remains a serious challenge. Here, we developed CARMEN, a novel algorithm to identify functional non-coding expression-modulating variants. Multiple evaluations demonstrated CARMEN's superior performance over state-of-the-art tools. Applying CARMEN to GWAS and eQTL datasets further pinpointed several causal variants other than the reported lead single-nucleotide polymorphisms (SNPs). CARMEN scales well with the massive datasets, and is available online as a web server at http://carmen.gao-lab.org.
Affiliation(s)
- Fang-Yuan Shi
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Yu Wang
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Dong Huang
  - State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
- Yu Liang
  - Human Aging Research Institute, School of Life Science, Nanchang University, Nanchang 330031, China
- Nan Liang
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Xiao-Wei Chen
  - State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Ge Gao
  - State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
15
Georgakopoulos-Soares I, Deng C, Agarwal V, Chan CSY, Zhao J, Inoue F, Ahituv N. Transcription factor binding site orientation and order are major drivers of gene regulatory activity. Nat Commun 2023; 14:2333. [PMID: 37087538 PMCID: PMC10122648 DOI: 10.1038/s41467-023-37960-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/04/2022] [Accepted: 04/06/2023] [Indexed: 04/24/2023]
Abstract
The gene regulatory code and grammar remain largely unknown, precluding our ability to link phenotype to genotype in regulatory sequences. Here, using a massively parallel reporter assay (MPRA) of 209,440 sequences, we examine all possible pair and triplet combinations, permutations and orientations of eighteen liver-associated transcription factor binding sites (TFBS). We find that TFBS orientation and order have a major effect on gene regulatory activity. Corroborating these results with genomic analyses, we find clear human promoter TFBS orientation biases and similar TFBS orientation and order transcriptional effects in an MPRA that tested 164,307 liver candidate regulatory elements. Additionally, by adding TFBS orientation to a model that predicts expression from sequence we improve performance by 7.7%. Collectively, our results show that TFBS orientation and order have a significant effect on gene regulatory activity and need to be considered when analyzing the functional effect of variants on the activity of these sequences.
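To show how quickly a design space of this kind grows, the sketch below enumerates ordered pairs and triplets of 18 sites, each placed in either orientation, giving 18^n × 2^n arrangements for n sites. The site names are placeholders, and allowing the same site to appear more than once is an assumption. The abstract does not break down how the 209,440 assayed sequences were composed, so the counts printed here are not meant to reproduce that number.

```python
from itertools import product


def arrangements(sites, n):
    """Yield every ordered arrangement of n sites (repeats allowed), with each
    site placed in forward (+) or reverse (-) orientation."""
    for order in product(sites, repeat=n):
        for strands in product("+-", repeat=n):
            yield tuple(zip(order, strands))


sites = [f"TF{i}" for i in range(1, 19)]  # 18 placeholder site names
n_pairs = sum(1 for _ in arrangements(sites, 2))
n_triplets = sum(1 for _ in arrangements(sites, 3))
print(n_pairs, n_triplets)  # 1296 46656
```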
Affiliation(s)
- Ilias Georgakopoulos-Soares
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Chengyu Deng
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Vikram Agarwal
  - mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA, USA
- Candace S Y Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Jingjing Zhao
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Fumitaka Inoue
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
16
Caratti G, Stifel U, Caratti B, Jamil AJM, Chung KJ, Kiehntopf M, Gräler MH, Blüher M, Rauch A, Tuckermann JP. Glucocorticoid activation of anti-inflammatory macrophages protects against insulin resistance. Nat Commun 2023; 14:2271. [PMID: 37080971 PMCID: PMC10119112 DOI: 10.1038/s41467-023-37831-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/19/2022] [Accepted: 04/01/2023] [Indexed: 04/22/2023]
Abstract
Insulin resistance (IR) during obesity is linked to adipose tissue macrophage (ATM)-driven inflammation of adipose tissue. Whether anti-inflammatory glucocorticoids (GCs) at physiological levels modulate IR is unclear. Here, we report that deletion of the GC receptor (GR) in myeloid cells, including macrophages in mice, aggravates obesity-related IR by enhancing adipose tissue inflammation due to decreased anti-inflammatory ATM leading to exaggerated adipose tissue lipolysis and severe hepatic steatosis. In contrast, GR deletion in Kupffer cells alone does not alter IR. Co-culture experiments show that the absence of GR in macrophages directly causes reduced phospho-AKT and glucose uptake in adipocytes, suggesting an important function of GR in ATM. GR-deficient macrophages are refractory to alternative ATM-inducing IL-4 signaling, due to reduced STAT6 chromatin loading and diminished anti-inflammatory enhancer activation. We demonstrate that GR has an important function in macrophages during obesity by limiting adipose tissue inflammation and lipolysis to promote insulin sensitivity.
Affiliation(s)
- Giorgio Caratti
  - Institute of Comparative Molecular Endocrinology, University of Ulm, Ulm, Germany
  - NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, OX3 9DU, UK
  - Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, OX3 7LE, UK
- Ulrich Stifel
  - Institute of Comparative Molecular Endocrinology, University of Ulm, Ulm, Germany
- Bozhena Caratti
  - Institute of Comparative Molecular Endocrinology, University of Ulm, Ulm, Germany
- Ali J M Jamil
  - Molecular Endocrinology & Stem Cell Research Unit, Department of Endocrinology and Metabolism, Odense University Hospital, Odense, Denmark
  - Department of Clinical Research, University of Southern Denmark, Odense, Denmark
- Kyoung-Jin Chung
  - Institute for Clinical Chemistry and Laboratory Medicine, University Hospital and Faculty of Medicine, Technical University Dresden, Dresden, Germany
- Michael Kiehntopf
  - SG Sepsis Research Clinic for Anesthesiology and Intensive Care, Jena University Hospital, Jena, Germany
- Markus H Gräler
  - Department of Anesthesiology and Intensive Care Medicine, Jena University Hospital, Jena, Germany
  - Center for Molecular Biomedicine (CMB), Jena University Hospital, Jena, Germany
  - Center for Sepsis Control and Care (CSCC), Jena University Hospital, Jena, Germany
- Matthias Blüher
  - Department of Endocrinology and Nephrology, University of Leipzig, Leipzig, Germany
- Alexander Rauch
  - Molecular Endocrinology & Stem Cell Research Unit, Department of Endocrinology and Metabolism, Odense University Hospital, Odense, Denmark
  - Department of Clinical Research, University of Southern Denmark, Odense, Denmark
  - Steno Diabetes Center Odense, Odense, Denmark
- Jan P Tuckermann
  - Institute of Comparative Molecular Endocrinology, University of Ulm, Ulm, Germany
17
Zheng Y, VanDusen NJ. Massively Parallel Reporter Assays for High-Throughput In Vivo Analysis of Cis-Regulatory Elements. J Cardiovasc Dev Dis 2023; 10:jcdd10040144. [PMID: 37103023 PMCID: PMC10146671 DOI: 10.3390/jcdd10040144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 03/24/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023]
Abstract
The rapid improvement of descriptive genomic technologies has fueled a dramatic increase in hypothesized connections between cardiovascular gene expression and phenotypes. However, in vivo testing of these hypotheses has predominantly been relegated to slow, expensive, and linear generation of genetically modified mice. In the study of genomic cis-regulatory elements, generation of mice featuring transgenic reporters or cis-regulatory element knockout remains the standard approach. While the data obtained is of high quality, the approach is insufficient to keep pace with candidate identification and therefore results in biases introduced during the selection of candidates for validation. However, recent advances across a range of disciplines are converging to enable functional genomic assays that can be conducted in a high-throughput manner. Here, we review one such method, massively parallel reporter assays (MPRAs), in which the activities of thousands of candidate genomic regulatory elements are simultaneously assessed via the next-generation sequencing of a barcoded reporter transcript. We discuss best practices for MPRA design and use, with a focus on practical considerations, and review how this emerging technology has been successfully deployed in vivo. Finally, we discuss how MPRAs are likely to evolve and be used in future cardiovascular research.
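A common way to summarize barcoded MPRA data, assumed here rather than drawn from this review, is to normalize RNA and DNA barcode counts for sequencing depth, sum them per candidate element, and report the log2 RNA/DNA ratio as that element's activity. The function and variable names below are hypothetical, and published pipelines typically add replicate handling and statistical testing that this sketch omits.

```python
import math
from collections import defaultdict


def element_activity(barcode_to_element, dna_counts, rna_counts, pseudocount=1.0):
    """Per-element activity = log2(summed RNA CPM / summed DNA CPM).

    barcode_to_element maps each barcode to the candidate element it tags;
    dna_counts and rna_counts map barcodes to raw sequencing read counts.
    """
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())
    dna_cpm = defaultdict(float)
    rna_cpm = defaultdict(float)
    for barcode, element in barcode_to_element.items():
        # Counts per million reads correct for library sequencing depth.
        dna_cpm[element] += dna_counts.get(barcode, 0) / dna_total * 1e6
        rna_cpm[element] += rna_counts.get(barcode, 0) / rna_total * 1e6
    # The pseudocount avoids division by zero and taking the log of zero.
    return {
        element: math.log2((rna_cpm[element] + pseudocount) / (dna_cpm[element] + pseudocount))
        for element in dna_cpm
    }


barcode_to_element = {"bc1": "enhA", "bc2": "enhA", "bc3": "ctrl", "bc4": "ctrl"}
dna = {"bc1": 100, "bc2": 100, "bc3": 100, "bc4": 100}
rna = {"bc1": 300, "bc2": 300, "bc3": 100, "bc4": 100}
activity = element_activity(barcode_to_element, dna, rna)
print({el: round(a, 2) for el, a in activity.items()})  # {'enhA': 0.58, 'ctrl': -1.0}
```

Summing several barcodes per element before taking the ratio damps the noise of any single barcode; the positive value for `enhA` and negative value for `ctrl` reflect RNA over- and under-representation relative to the DNA input.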
18
Abstract
Efforts to decrease the adverse effects of nuclear receptor (NR) drugs have yielded experimental agonists that produce better outcomes in mice. Some of these agonists have been shown to cause different, not just less intense, on-target transcriptomic effects; however, a structural explanation for such agonist-specific effects remains unknown. Here, we show that partial agonists of the NR peroxisome proliferator-activated receptor γ (PPARγ), which induce better outcomes in mice compared to clinically utilized type II diabetes PPARγ-binding drugs thiazolidinediones (TZDs), also favor a different group of coactivator peptides than the TZDs. We find that PPARγ full agonists can also be biased relative to each other in terms of coactivator peptide binding. We find differences in coactivator-PPARγ bonding between the coactivator subgroups which allow agonists to favor one group of coactivator peptides over another, including differential bonding to a C-terminal residue of helix 4. Analysis of all available NR-coactivator structures indicates that such differential helix 4 bonding persists across other NR-coactivator complexes, providing a general structural mechanism of biased agonism for many NRs. Further work will be necessary to determine if such bias translates into altered coactivator occupancy and physiology in cells.
19
Mansisidor AR, Risca VI. Chromatin accessibility: methods, mechanisms, and biological insights. Nucleus 2022; 13:236-276. [PMID: 36404679 PMCID: PMC9683059 DOI: 10.1080/19491034.2022.2143106] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/31/2022] [Revised: 10/23/2022] [Accepted: 10/30/2022] [Indexed: 11/22/2022]
Abstract
Access to DNA is a prerequisite to the execution of essential cellular processes that include transcription, replication, chromosomal segregation, and DNA repair. How the proteins that regulate these processes function in the context of chromatin and its dynamic architectures is an intensive field of study. Over the past decade, genome-wide assays and new imaging approaches have enabled a greater understanding of how access to the genome is regulated by nucleosomes and associated proteins. Additional mechanisms that may control DNA accessibility in vivo include chromatin compaction and phase separation - processes that are beginning to be understood. Here, we review the ongoing development of accessibility measurements, we summarize the different molecular and structural mechanisms that shape the accessibility landscape, and we detail the many important biological functions that are linked to chromatin accessibility.
Affiliation(s)
- Andrés R. Mansisidor
  - Laboratory of Genome Architecture and Dynamics, The Rockefeller University, New York, NY
- Viviana I. Risca
  - Laboratory of Genome Architecture and Dynamics, The Rockefeller University, New York, NY
20
Saha S, Spinelli L, Castro Mondragon JA, Kervadec A, Lynott M, Kremmer L, Roder L, Krifa S, Torres M, Brun C, Vogler G, Bodmer R, Colas AR, Ocorr K, Perrin L. Genetic architecture of natural variation of cardiac performance from flies to humans. eLife 2022; 11:82459. [DOI: 10.7554/elife.82459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Accepted: 10/25/2022] [Indexed: 11/17/2022]
Abstract
Deciphering the genetic architecture of human cardiac disorders is of fundamental importance but their underlying complexity is a major hurdle. We investigated the natural variation of cardiac performance in the sequenced inbred lines of the Drosophila Genetic Reference Panel (DGRP). Genome-wide association studies (GWAS) identified genetic networks associated with natural variation of cardiac traits which were used to gain insights as to the molecular and cellular processes affected. Non-coding variants that we identified were used to map potential regulatory non-coding regions, which in turn were employed to predict transcription factors (TFs) binding sites. Cognate TFs, many of which themselves bear polymorphisms associated with variations of cardiac performance, were also validated by heart-specific knockdown. Additionally, we showed that the natural variations associated with variability in cardiac performance affect a set of genes overlapping those associated with average traits but through different variants in the same genes. Furthermore, we showed that phenotypic variability was also associated with natural variation of gene regulatory networks. More importantly, we documented correlations between genes associated with cardiac phenotypes in both flies and humans, which supports a conserved genetic architecture regulating adult cardiac function from arthropods to mammals. Specifically, roles for PAX9 and EGR2 in the regulation of the cardiac rhythm were established in both models, illustrating that the characteristics of natural variations in cardiac function identified in Drosophila can accelerate discovery in humans.
Affiliation(s)
- Saswati Saha
  - Aix-Marseille University, INSERM, TAGC, Turing Center for Living systems
- Lionel Spinelli
  - Aix-Marseille University, INSERM, TAGC, Turing Center for Living systems
- Anaïs Kervadec
  - Development, Aging and Regeneration Program, Sanford Burnham Prebys Medical Discovery Institute
- Michaela Lynott
  - Development, Aging and Regeneration Program, Sanford Burnham Prebys Medical Discovery Institute
- Laurent Kremmer
  - Aix-Marseille University, INSERM, TAGC, Turing Center for Living systems
- Laurence Roder
  - Aix-Marseille University, INSERM, TAGC, Turing Center for Living systems
- Sallouha Krifa
  - Aix-Marseille University, INSERM, TAGC, Turing Center for Living systems
- Magali Torres
  - Aix-Marseille University, INSERM, TAGC, Turing Center for Living systems
- Christine Brun
  - Aix-Marseille University, INSERM, TAGC, Turing Center for Living systems
  - CNRS
- Georg Vogler
  - Development, Aging and Regeneration Program, Sanford Burnham Prebys Medical Discovery Institute
- Rolf Bodmer
  - Development, Aging and Regeneration Program, Sanford Burnham Prebys Medical Discovery Institute
- Alexandre R Colas
  - Development, Aging and Regeneration Program, Sanford Burnham Prebys Medical Discovery Institute
- Karen Ocorr
  - Development, Aging and Regeneration Program, Sanford Burnham Prebys Medical Discovery Institute
- Laurent Perrin
  - Aix-Marseille University, INSERM, TAGC, Turing Center for Living systems
  - CNRS
21
Chen Y, Cattoglio C, Dailey GM, Zhu Q, Tjian R, Darzacq X. Mechanisms governing target search and binding dynamics of hypoxia-inducible factors. eLife 2022; 11:e75064. [PMID: 36322456 PMCID: PMC9681212 DOI: 10.7554/elife.75064] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/30/2021] [Accepted: 11/01/2022] [Indexed: 11/07/2022]
Abstract
Transcription factors (TFs) are classically attributed a modular construction, containing well-structured sequence-specific DNA-binding domains (DBDs) paired with disordered activation domains (ADs) responsible for protein-protein interactions targeting co-factors or the core transcription initiation machinery. However, this simple division of labor model struggles to explain why TFs with identical DNA-binding sequence specificity determined in vitro exhibit distinct binding profiles in vivo. The family of hypoxia-inducible factors (HIFs) offers a stark example: aberrantly expressed in several cancer types, HIF-1α and HIF-2α subunit isoforms recognize the same DNA motif in vitro - the hypoxia response element (HRE) - but only share a subset of their target genes in vivo, while eliciting contrasting effects on cancer development and progression under certain circumstances. To probe the mechanisms mediating isoform-specific gene regulation, we used live-cell single particle tracking (SPT) to investigate HIF nuclear dynamics and how they change upon genetic perturbation or drug treatment. We found that HIF-α subunits and their dimerization partner HIF-1β exhibit distinct diffusion and binding characteristics that are exquisitely sensitive to concentration and subunit stoichiometry. Using domain-swap variants, mutations, and a HIF-2α specific inhibitor, we found that although the DBD and dimerization domains are important, another main determinant of chromatin binding and diffusion behavior is the AD-containing intrinsically disordered region (IDR). Using Cut&Run and RNA-seq as orthogonal genomic approaches, we also confirmed IDR-dependent binding and activation of a specific subset of HIF target genes. These findings reveal a previously unappreciated role of IDRs in regulating the TF search and binding process that contribute to functional target site selectivity on chromatin.
Affiliation(s)
- Yu Chen
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, United States
  - Li Ka Shing Center for Biomedical & Health Sciences, University of California, Berkeley, Berkeley, United States
- Claudia Cattoglio
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, United States
  - Li Ka Shing Center for Biomedical & Health Sciences, University of California, Berkeley, Berkeley, United States
- Gina M Dailey
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
  - Li Ka Shing Center for Biomedical & Health Sciences, University of California, Berkeley, Berkeley, United States
- Qiulin Zhu
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
  - Li Ka Shing Center for Biomedical & Health Sciences, University of California, Berkeley, Berkeley, United States
- Robert Tjian
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
  - Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, United States
  - Li Ka Shing Center for Biomedical & Health Sciences, University of California, Berkeley, Berkeley, United States
- Xavier Darzacq
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
  - Li Ka Shing Center for Biomedical & Health Sciences, University of California, Berkeley, Berkeley, United States
22
Cooper YA, Guo Q, Geschwind DH. Multiplexed functional genomic assays to decipher the noncoding genome. Hum Mol Genet 2022; 31:R84-R96. [PMID: 36057282 PMCID: PMC9585676 DOI: 10.1093/hmg/ddac194] [Received: 07/02/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022]
Abstract
Linkage disequilibrium and the incomplete regulatory annotation of the noncoding genome complicate the identification of functional noncoding genetic variants and their causal association with disease. Current computational methods for variant prioritization have limited predictive value, necessitating the application of highly parallelized experimental assays to efficiently identify functional noncoding variation. Here, we summarize two distinct approaches, massively parallel reporter assays and CRISPR-based pooled screens, and describe their flexible implementation to characterize human noncoding genetic variation at unprecedented scale. Each approach provides unique advantages and limitations, highlighting the importance of multimodal methodological integration. These multiplexed assays of variant effects are undoubtedly poised to play a key role in the experimental characterization of noncoding genetic risk, informing our understanding of the underlying mechanisms of disease-associated loci and the development of more robust predictive classification algorithms.
Affiliation(s)
- Yonatan A Cooper
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Medical Scientist Training Program, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Qiuyu Guo
  - Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Daniel H Geschwind
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Program in Neurogenetics, Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
  - Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, CA, USA
  - Institute of Precision Health, University of California Los Angeles, Los Angeles, CA, USA
23
Mikl M, Eletto D, Nijim M, Lee M, Lafzi A, Mhamedi F, David O, Sain SB, Handler K, Moor AE. A massively parallel reporter assay reveals focused and broadly encoded RNA localization signals in neurons. Nucleic Acids Res 2022; 50:10643-10664. [PMID: 36156153 PMCID: PMC9561380 DOI: 10.1093/nar/gkac806] [Received: 12/30/2021] [Revised: 08/24/2022] [Accepted: 09/08/2022]
Abstract
Asymmetric subcellular mRNA localization allows spatial regulation of gene expression and functional compartmentalization. In neurons, localization of specific mRNAs to neurites is essential for cellular functioning. However, it is largely unknown how transcript sorting works in a sequence-specific manner. Here, we combined subcellular transcriptomics and massively parallel reporter assays and tested ∼50 000 sequences for their ability to localize to neurites. Mapping the localization potential of >300 genes revealed two ways neurite targeting can be achieved: focused localization motifs and broadly encoded localization potential. We characterized the interplay between RNA stability and localization and identified motifs able to bias localization towards neurite or soma as well as the trans-acting factors required for their action. Based on our data, we devised machine learning models that were able to predict the localization behavior of novel reporter sequences. Testing this predictor on native mRNA sequencing data showed good agreement between predicted and observed localization potential, suggesting that the rules uncovered by our MPRA also apply to the localization of native full-length transcripts.
Affiliation(s)
- Martin Mikl
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Department of Human Biology, University of Haifa, Haifa, Israel
- Davide Eletto
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Malak Nijim
  - Department of Human Biology, University of Haifa, Haifa, Israel
- Minkyoung Lee
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Atefeh Lafzi
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Farah Mhamedi
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Orit David
  - Department of Human Biology, University of Haifa, Haifa, Israel
- Simona Baghai Sain
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Kristina Handler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Andreas E Moor
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
24
Yang MG, Ling E, Cowley CJ, Greenberg ME, Vierbuchen T. Characterization of sequence determinants of enhancer function using natural genetic variation. eLife 2022; 11:76500. [PMID: 36043696 PMCID: PMC9662815 DOI: 10.7554/elife.76500] [Received: 12/18/2021] [Accepted: 08/30/2022]
Abstract
Sequence variation in enhancers that control cell-type-specific gene transcription contributes significantly to phenotypic variation within human populations. However, it remains difficult to predict precisely the effect of any given sequence variant on enhancer function due to the complexity of DNA sequence motifs that determine transcription factor (TF) binding to enhancers in their native genomic context. Using F1-hybrid cells derived from crosses between distantly related inbred strains of mice, we identified thousands of enhancers with allele-specific TF binding and/or activity. We find that genetic variants located within the central region of enhancers are most likely to alter TF binding and enhancer activity. We observe that the AP-1 family of TFs (Fos/Jun) is frequently required for binding of TEAD TFs and for enhancer function. However, many sequence variants outside of core motifs for AP-1 and TEAD also impact enhancer function, including sequences flanking core TF motifs and AP-1 half sites. Taken together, these data represent one of the most comprehensive assessments of allele-specific TF binding and enhancer function to date and reveal how sequence changes at enhancers alter their function across evolutionary timescales.
Affiliation(s)
- Marty G Yang
  - Department of Neurobiology, Harvard Medical School, Boston, United States
  - Program in Neuroscience, Harvard Medical School, Boston, United States
- Emi Ling
  - Department of Neurobiology, Harvard Medical School, Boston, United States
- Thomas Vierbuchen
  - Developmental Biology Program, Sloan Kettering Institute for Cancer Research, New York, United States
  - Center for Stem Cell Biology, Sloan Kettering Institute for Cancer Research, New York, United States
25
Song W, Ovcharenko I. Heterogeneity of enhancers embodies shared and representative functional groups underlying developmental and cell type-specific gene regulation. Gene 2022; 834:146640. [PMID: 35680026 PMCID: PMC9235925 DOI: 10.1016/j.gene.2022.146640] [Received: 01/10/2022] [Revised: 04/20/2022] [Accepted: 06/02/2022]
Abstract
While enhancers in a particular tissue coordinately fulfill regulatory functions, these functions are heterogeneous in nature and comprise multiple enhancer subclasses and the associated regulatory mechanisms. In this work, we used multiple cell lines to identify enhancer subclasses linked to development, differentiation, and cellular identity. We found that enhancer functional heterogeneity during development encompasses subclasses of ubiquitous functions (11%), development-specific regulatory activity (62%), and chromatin interactions (12%). In differentiated cell lines, ubiquitous enhancers (10%) stay active across multiple cell lines. They are accompanied by a large enhancer subclass (ranging from 33% to 63%) with functions specific to the corresponding lineage. The remaining enhancers (27-40%) establish regulatory chromatin structure and facilitate interactions of cell type-specific enhancers with their target promoters. In addition to specialized functions of cell type-specific enhancers, we show that proper accounting of enhancer heterogeneity leads to a 10% increase in accuracy of enhancer classification, which significantly improves the modeling of enhancers and identification of underlying regulatory mechanisms. In summary, our observations suggest that although cell type-specific enhancers are heterogeneous and coordinate different regulatory programs, enhancers from different cell lines maintain common categories of functional groups across developmental and differentiation stages, indicating a higher-order rule followed by enhancer-gene regulation.
Affiliation(s)
- Wei Song
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Ivan Ovcharenko
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
26
Pizzollo J, Zintel TM, Babbitt CC. Differentially active and conserved neural enhancers define two forms of adaptive non-coding evolution in humans. Genome Biol Evol 2022; 14:6648393. [PMID: 35866592 PMCID: PMC9348619 DOI: 10.1093/gbe/evac108] [Accepted: 07/11/2022]
Abstract
The human and chimpanzee genomes are strikingly similar, but our neural phenotypes are very different. Many of these differences are likely driven by changes in gene expression, and some of those changes may have been adaptive during human evolution. Yet, the relative contributions of positive selection on regulatory regions or other functional regulatory changes are unclear. Where are these changes located throughout the human genome? Are functional regulatory changes near genes or are they in distal enhancer regions? In this study, we experimentally combined human and chimpanzee cis-regulatory elements (CREs) that either (1) showed signs of accelerated evolution in humans or (2) have been shown to be active in the human brain. Using a massively parallel reporter assay, we tested the ability of orthologous human and chimpanzee CREs to activate transcription in induced pluripotent stem-cell-derived neural progenitor cells and neurons. With this assay, we identified 179 CREs with differential activity between human and chimpanzee; in contrast, we found 722 CREs with signs of positive selection in humans. CREs under selection and CREs with differential activity differ strikingly in level of expression, size, and genomic location. We found a subset of 69 CREs in loci with genetic variants associated with neuropsychiatric diseases, which underscores the importance of regulatory activity in these loci for proper neural development and function. Combining CREs that either experienced recent selection in humans or are functional brain enhancers presents a novel way of studying the evolution of noncoding elements that contribute to human neural phenotypes.
Affiliation(s)
- Jason Pizzollo
  - Molecular and Cellular Biology Graduate Program, University of Massachusetts Amherst, Amherst, MA 01003, USA
  - Department of Biology, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Trisha M Zintel
  - Molecular and Cellular Biology Graduate Program, University of Massachusetts Amherst, Amherst, MA 01003, USA
  - Department of Biology, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Courtney C Babbitt
  - Department of Biology, University of Massachusetts Amherst, Amherst, MA 01003, USA
27
Isbel L, Grand RS, Schübeler D. Generating specificity in genome regulation through transcription factor sensitivity to chromatin. Nat Rev Genet 2022; 23:728-740. [PMID: 35831531 DOI: 10.1038/s41576-022-00512-6] [Accepted: 05/30/2022]
Abstract
Cell type-specific gene expression relies on transcription factors (TFs) binding DNA sequence motifs embedded in chromatin. Understanding how motifs are accessed in chromatin is crucial to comprehend differential transcriptional responses and the phenotypic impact of sequence variation. Chromatin obstacles to TF binding range from DNA methylation to restriction of DNA access by nucleosomes depending on their position, composition and modification. In vivo and in vitro approaches now enable the study of TF binding in chromatin at unprecedented resolution. Emerging insights suggest that TFs vary in their ability to navigate chromatin states. However, it remains challenging to link binding and transcriptional outcomes to molecular characteristics of TFs or the local chromatin substrate. Here, we discuss our current understanding of how TFs access DNA in chromatin and novel techniques and directions towards a better understanding of this critical step in genome regulation.
Affiliation(s)
- Luke Isbel
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Ralph S Grand
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - Zentrum für Molekulare Biologie der Universität Heidelberg, Heidelberg, Germany
- Dirk Schübeler
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - Faculty of Sciences, University of Basel, Basel, Switzerland
28
DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 2022; 54:613-624. [PMID: 35551305 DOI: 10.1038/s41588-022-01048-5] [Received: 09/20/2021] [Accepted: 03/08/2022]
Abstract
Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood, and de novo enhancer design has been challenging. Here, we built a deep-learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally nonequivalent instances of the same TF motif that are determined by motif-flanking sequence and intermotif distances. We validated these rules experimentally and demonstrated that they can be generalized to humans by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo.
29
Genetic predisposition to papillary thyroid carcinoma is mediated by a long non-coding RNA TINCR enhancer polymorphism. Int Immunopharmacol 2022; 109:108796. [PMID: 35489191 DOI: 10.1016/j.intimp.2022.108796] [Received: 11/08/2021] [Revised: 03/31/2022] [Accepted: 04/20/2022]
Abstract
Single nucleotide polymorphisms (SNPs) in enhancer regions have been shown to contribute to altered enhancer activity, aberrant gene expression, and cancer susceptibility. In this study, we aimed to examine the association between an SNP, rs8101923, within terminal differentiation-induced non-coding RNA (TINCR) and the risk of papillary thyroid carcinoma (PTC). Blood samples from 559 patients with PTC and 445 healthy individuals were collected. The SNP rs8101923 was genotyped using a polymerase chain reaction-restriction fragment length polymorphism assay. The impact of rs8101923 on TINCR expression and enhancer activity was evaluated by quantitative real-time PCR and dual-luciferase reporter assays. The binding of AP-2α to the TINCR enhancer was determined by chromatin immunoprecipitation. The rs8101923 G allele was significantly associated with a higher risk of PTC (adjusted OR = 1.37; 95% CI: 1.15-1.64). Mechanistically, rs8101923 was associated with increased transcription levels and enhancer activity (P < 0.05). The transcription factor AP-2α binds to the TINCR enhancer region containing the rs8101923 locus and promotes cell proliferation in PTC. These findings suggest that rs8101923 is a risk factor in the pathogenesis of PTC and provide evidence for the mechanism by which the rs8101923 risk allele predisposes to PTC.
30
Jimeno-Martín A, Sousa E, Brocal-Ruiz R, Daroqui N, Maicas M, Flames N. Joint actions of diverse transcription factor families establish neuron-type identities and promote enhancer selectivity. Genome Res 2022; 32:459-473. [PMID: 35074859 PMCID: PMC8896470 DOI: 10.1101/gr.275623.121] [Received: 04/08/2021] [Accepted: 01/19/2022]
Abstract
To systematically investigate the complexity of neuron specification regulatory networks, we performed an RNA interference (RNAi) screen against all 875 transcription factors (TFs) encoded in the Caenorhabditis elegans genome and searched for defects in nine different neuron types of the monoaminergic (MA) superclass and two cholinergic motoneurons. We identified 91 candidate TFs required for correct generation of these neuron types, of which 28 were confirmed by mutant analysis. We found that correct reporter expression in each individual neuron type requires at least nine different TFs. Individual neuron types do not usually share the TFs involved in their specification, but they share a common pattern of TFs belonging to the five most common TF families: homeodomain (HD), basic helix-loop-helix (bHLH), zinc finger (ZF), basic leucine zipper domain (bZIP), and nuclear hormone receptors (NHR). HD TF members are overrepresented, supporting a key role for this family in the establishment of neuronal identities. These five TF families are also prevalent when considering mutant alleles with previously reported neuronal phenotypes in C. elegans, Drosophila, and mouse. In addition, we studied terminal differentiation complexity, focusing on the dopaminergic terminal regulatory program. We found two HD TFs (UNC-62 and VAB-3) that work together with known dopaminergic terminal selectors (AST-1, CEH-43, CEH-20). Combined TF binding sites for these five TFs constitute a cis-regulatory signature enriched in the regulatory regions of dopaminergic effector genes. Our results provide new insights into neuron-type regulatory programs in C. elegans that could help better understand neuron specification and the evolution of neuron types.
Affiliation(s)
- Angela Jimeno-Martín
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
- Erick Sousa
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
- Rebeca Brocal-Ruiz
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
- Noemi Daroqui
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
- Miren Maicas
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
- Nuria Flames
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, 46010, Spain
31
Abstract
DNA can determine where and when genes are expressed, but the full set of sequence determinants that control gene expression is unknown. Here, we measured the transcriptional activity of DNA sequences that represent an ~100 times larger sequence space than the human genome using massively parallel reporter assays (MPRAs). Machine learning models revealed that transcription factors (TFs) generally act in an additive manner with weak grammar and that most enhancers increase expression from a promoter by a mechanism that does not appear to involve specific TF–TF interactions. The enhancers themselves can be classified into three types: classical, closed chromatin and chromatin dependent. We also show that few TFs are strongly active in a cell, with most activities being similar between cell types. Individual TFs can have multiple gene regulatory activities, including chromatin opening and enhancing, promoting and determining transcription start site (TSS) activity, consistent with the view that the TF binding motif is the key atomic unit of gene expression. Analysis of massively parallel reporter assays measuring the transcriptional activity of DNA sequences indicates that most transcription factor (TF) activity is additive and does not rely on specific TF–TF interactions. Individual TFs can have different gene regulatory activities.
32
Maderazo D, Flegg JA, Algama M, Ramialison M, Keith J. Detection and identification of cis-regulatory elements using change-point and classification algorithms. BMC Genomics 2022; 23:78. [PMID: 35078412 PMCID: PMC8790847 DOI: 10.1186/s12864-021-08190-0] [Received: 04/12/2021] [Accepted: 11/19/2021]
Abstract
BACKGROUND: Transcriptional regulation is primarily mediated by the binding of factors to non-coding regions in DNA. Identification of these binding regions enhances understanding of tissue formation and potentially facilitates the development of gene therapies. However, successful identification of binding regions is made difficult by the lack of a universal biological code for their characterisation.
RESULTS: We extend an alignment-based method, changept, and identify clusters of biological significance, through ontology and de novo motif analysis. Further, we apply a Bayesian method to estimate and combine binary classifiers on the clusters we identify to produce a better performing composite.
CONCLUSIONS: The analysis we describe provides a computational method for identification of conserved binding sites in the human genome and facilitates an alternative interrogation of combinations of existing data sets with alignment data.
Affiliation(s)
- Dominic Maderazo
  - School of Mathematics and Statistics, The University of Melbourne, Melbourne, 3010, VIC, Australia
- Jennifer A Flegg
  - School of Mathematics and Statistics, The University of Melbourne, Melbourne, 3010, VIC, Australia
- Manjula Algama
  - School of Mathematics, Monash University, Melbourne, 3800, VIC, Australia
- Mirana Ramialison
  - Australian Regenerative Medicine Institute, Monash University, Melbourne, 3800, VIC, Australia
- Jonathan Keith
  - School of Mathematics, Monash University, Melbourne, 3800, VIC, Australia
33
Shen Z, Li RZ, Prohaska TA, Hoeksema MA, Spann NJ, Tao J, Fonseca GJ, Le T, Stolze LK, Sakai M, Romanoski CE, Glass CK. Systematic analysis of naturally occurring insertions and deletions that alter transcription factor spacing identifies tolerant and sensitive transcription factor pairs. eLife 2022; 11:70878. [PMID: 35049498 PMCID: PMC8809895 DOI: 10.7554/elife.70878] [Received: 06/01/2021] [Accepted: 01/12/2022]
Abstract
Regulation of gene expression requires the combinatorial binding of sequence-specific transcription factors (TFs) at promoters and enhancers. Prior studies showed that alterations in the spacing between TF binding sites can influence promoter and enhancer activity. However, the relative importance of TF spacing alterations resulting from naturally occurring insertions and deletions (InDels) has not been systematically analyzed. To address this question, we first characterized the genome-wide spacing relationships of 73 TFs in human K562 cells as determined by ChIP-seq (chromatin immunoprecipitation sequencing). We found a dominant pattern of a relaxed range of spacing between collaborative factors, including 45 TFs exclusively exhibiting relaxed spacing with their binding partners. Next, we exploited millions of InDels provided by genetically diverse mouse strains and human individuals to investigate the effects of altered spacing on TF binding and local histone acetylation. These analyses suggested that spacing alterations resulting from naturally occurring InDels are generally tolerated in comparison to genetic variants directly affecting TF binding sites. To experimentally validate this prediction, we introduced synthetic spacing alterations between PU.1 and C/EBPβ binding sites at six endogenous genomic loci in a macrophage cell line. Remarkably, collaborative binding of PU.1 and C/EBPβ at these locations tolerated changes in spacing ranging from a 5 bp increase to a >30 bp decrease. Collectively, these findings have implications for understanding mechanisms underlying enhancer selection and for the interpretation of non-coding genetic variation.
Affiliation(s)
- Zeyang Shen
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, United States
- Rick Z Li
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, United States
- Thomas A Prohaska
  - Department of Medicine, University of California, San Diego, La Jolla, United States
- Marten A Hoeksema
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, United States
- Nathan J Spann
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, United States
- Jenhan Tao
  - Department of Cellular and Molecular Medicine, University of California, San Diego, San Diego, United States
- Gregory J Fonseca
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, United States
- Thomas Le
  - Division of Biological Sciences, University of California, San Diego, La Jolla, United States
- Lindsey K Stolze
  - Department of Cellular and Molecular Medicine, University of Arizona, Tucson, United States
- Mashito Sakai
  - Department of Biochemistry and Molecular Biology, Nippon Medical School, Tokyo, Japan
- Casey E Romanoski
  - Department of Cellular and Molecular Medicine, University of Arizona, Tucson, United States
- Christopher K Glass
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, United States
34
Bakoulis S, Krautz R, Alcaraz N, Salvatore M, Andersson R. OUP accepted manuscript. Nucleic Acids Res 2022; 50:2111-2127. [PMID: 35166831 PMCID: PMC8887488 DOI: 10.1093/nar/gkac088] [Received: 06/21/2021] [Revised: 01/22/2022] [Accepted: 01/27/2022]
Affiliation(s)
- Nicolas Alcaraz
  - The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
  - Novo Nordisk Foundation Center for Protein Research (CPR), University of Copenhagen, 2200 Copenhagen, Denmark
- Marco Salvatore
  - The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Robin Andersson
  - To whom correspondence should be addressed. Tel: +45 35330245
35
Waters CT, Gisselbrecht SS, Sytnikova YA, Cafarelli TM, Hill DE, Bulyk ML. Quantitative-enhancer-FACS-seq (QeFS) reveals epistatic interactions among motifs within transcriptional enhancers in developing Drosophila tissue. Genome Biol 2021; 22:348. [PMID: 34930411 PMCID: PMC8686523 DOI: 10.1186/s13059-021-02574-x] [Received: 10/30/2020] [Accepted: 12/10/2021]
Abstract
Understanding the contributions of transcription factor DNA binding sites to transcriptional enhancers is a significant challenge. We developed Quantitative enhancer-FACS-Seq for highly parallel quantification of enhancer activities from a genomically integrated reporter in Drosophila melanogaster embryos. We investigate the contributions of the DNA binding motifs of four poorly characterized TFs to the activities of twelve embryonic mesodermal enhancers. We measure quantitative changes in enhancer activity and discover a range of epistatic interactions among the motifs, both synergistic and alleviating. We find that understanding the regulatory consequences of TF binding motifs requires that they be investigated in combination across enhancer contexts.
Affiliation(s)
- Colin T Waters
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, 02138, USA
- Stephen S Gisselbrecht
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Yuliya A Sytnikova
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Tiziana M Cafarelli
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- David E Hill
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Martha L Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, 02138, USA
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
36
Sousa E, Flames N. Transcriptional regulation of neuronal identity. Eur J Neurosci 2021; 55:645-660. [PMID: 34862697 PMCID: PMC9306894 DOI: 10.1111/ejn.15551] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/02/2021] [Revised: 11/22/2021] [Accepted: 11/23/2021] [Indexed: 11/29/2022]
Abstract
Neuronal diversity is an intrinsic feature of the nervous system. Transcription factors (TFs) are key regulators in the establishment of different neuronal identities; how are the actions of different TFs coordinated to orchestrate this diversity? Are there common features shared among the different neuron types of an organism, or even among different animal groups? In this review, we provide a brief overview of common traits emerging in the transcriptional regulation of neuron type diversification, with a special focus on the comparison between mouse and Caenorhabditis elegans model systems. In the first part, we describe general concepts of neuronal identity and the transcriptional regulation of gene expression. In the second part, TFs are classified into different categories according to their key roles at specific steps along the protracted process of neuronal specification and differentiation. The same TF categories can be identified in both mammals and nematodes. Importantly, TFs are highly pleiotropic: depending on the neuron type or the developmental stage, the same TF can fulfil functions belonging to different categories. Finally, we describe the key role of transcriptional repression at all steps controlling neuronal diversity and propose that the acquisition of neuronal identity could be considered a metastable process.
Affiliation(s)
- Erick Sousa
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, Spain
- Nuria Flames
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, Spain
37
Waymack R, Gad M, Wunderlich Z. Molecular competition can shape enhancer activity in the Drosophila embryo. iScience 2021; 24:103034. [PMID: 34568782 PMCID: PMC8449247 DOI: 10.1016/j.isci.2021.103034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/07/2021] [Revised: 07/27/2021] [Accepted: 08/20/2021] [Indexed: 01/12/2023]
Abstract
Transgenic reporters allow the measurement of regulatory DNA activity in vivo and consequently have long been useful tools for studying enhancers. Despite their utility, few studies have investigated the effects these reporters may have on the expression of other genes. Understanding these effects is required to accurately interpret reporter data and characterize gene regulatory mechanisms. By measuring the expression of Kruppel (Kr) enhancer reporters in live Drosophila embryos, we find that reporters inhibit one another’s expression and that of a nearby endogenous gene. Using synthetic transcription factor (TF) binding site arrays, we present evidence that competition for TFs is partially responsible for the observed transcriptional inhibition. We develop a simple thermodynamic model that predicts competition of the measured magnitude specifically when TF binding is restricted to distinct nuclear subregions. Our findings underline an unexpected role of the non-homogeneous nature of the nucleus in regulating gene expression.
Highlights
- Live tracking of transcription reveals competition between transgenic reporters
- Transgenic reporters can also depress the expression of a neighboring gene
- Expression inhibition is in part because of competition for transcription factors (TFs)
- Competition is predicted with a model that restricts TFs to sub-nuclear “hubs”
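The competition idea can be illustrated with a toy mass-balance calculation: a fixed pool of TF molecules is shared among identical binding sites, so adding sites lowers the free TF and therefore the occupancy of every site. This is a minimal sketch under stated assumptions (independent, identical sites with a single dissociation constant expressed in the same units as the TF pool; no cooperativity); it is not the authors' thermodynamic model, and reading a small local pool as a sub-nuclear hub is only an illustrative interpretation.

```python
import math


def free_tf(total_tf: float, n_sites: float, kd: float) -> float:
    """Solve total_tf = free + n_sites * free / (free + kd) for the free TF amount.

    Rearranged: free**2 + (kd + n_sites - total_tf) * free - total_tf * kd = 0.
    Returns the non-negative root, using a cancellation-free form when b >= 0.
    """
    if kd <= 0 or total_tf < 0 or n_sites < 0:
        raise ValueError("kd must be positive; total_tf and n_sites non-negative")
    b = kd + n_sites - total_tf
    disc = math.sqrt(b * b + 4.0 * total_tf * kd)
    if b >= 0:
        return 2.0 * total_tf * kd / (b + disc)
    return (-b + disc) / 2.0


def site_occupancy(total_tf: float, n_sites: float, kd: float) -> float:
    """Fraction of time one site is bound, used here as a stand-in for reporter output."""
    f = free_tf(total_tf, n_sites, kd)
    return f / (f + kd)


# Relative occupancy after adding 20 competing sites to a small local pool
# versus a large, well-mixed pool (all numbers are hypothetical).
small_pool = site_occupancy(20, 20, 10) / site_occupancy(20, 0, 10)
large_pool = site_occupancy(2000, 20, 10) / site_occupancy(2000, 0, 10)
print(small_pool, large_pool)
```

In this toy setting the same 20 extra sites cut occupancy by about a quarter when the pool is small, but barely change it when the pool is large, which is the qualitative pattern that restricting TFs to subregions is meant to capture.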
Affiliation(s)
- Rachel Waymack
  - Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
- Mario Gad
  - Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
- Zeba Wunderlich
  - Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
  - Department of Biology, Boston University, 610 Commonwealth Ave., Boston, MA 02215, USA
  - Biological Design Center, Boston University, 610 Commonwealth Avenue, Boston, MA 02215, USA
38
Molecular and Cellular Insights into the Development of Uterine Fibroids. Int J Mol Sci 2021; 22:ijms22168483. [PMID: 34445194 PMCID: PMC8395213 DOI: 10.3390/ijms22168483] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 06/29/2021] [Revised: 08/03/2021] [Accepted: 08/05/2021] [Indexed: 12/12/2022]
Abstract
Uterine leiomyomas represent the most common benign gynecologic tumor. These hormone-dependent smooth-muscle formations occur with an estimated prevalence of ~70% among women of reproductive age and cause symptoms including pain, abnormal uterine bleeding, infertility, and recurrent abortion. Despite the prevalence and public health impact of uterine leiomyomas, available treatments remain limited. Among the potential causes of leiomyomas, early hormonal exposure during periods of development may result in developmental reprogramming via epigenetic changes that persist in adulthood, leading to disease onset or progression. Recent developments in unbiased high-throughput sequencing technology enable powerful approaches to detect driver mutations, yielding new insights into the genomic instability of leiomyomas. Current data also suggest that each leiomyoma originates from the clonal expansion of a single transformed somatic stem cell of the myometrium. In this review, we propose an integrated cellular and molecular view of the origins of leiomyomas, as well as paradigm-shifting studies that will lead to better understanding and the future development of non-surgical treatments for these highly frequent tumors.
39
Weidemüller P, Kholmatov M, Petsalaki E, Zaugg JB. Transcription factors: Bridge between cell signaling and gene regulation. Proteomics 2021; 21:e2000034. [PMID: 34314098 DOI: 10.1002/pmic.202000034] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 03/19/2021] [Revised: 07/05/2021] [Accepted: 07/16/2021] [Indexed: 01/17/2023]
Abstract
Transcription factors (TFs) are key regulators of intrinsic cellular processes, such as differentiation and development, and of the cellular response to external perturbation through signaling pathways. In this review we focus on the role of TFs as a link between signaling pathways and gene regulation. Cell signaling tends to result in the modulation of a set of TFs that then lead to changes in the cell's transcriptional program. We highlight the molecular layers at which TF activity can be measured and the associated technical and conceptual challenges. These layers include post-translational modifications (PTMs) of the TF, regulation of TF binding to DNA through chromatin accessibility and epigenetics, and expression of target genes. We highlight that a large number of TFs are understudied in both signaling and gene regulation studies, and that our knowledge about known TF targets has a strong literature bias. We argue that TFs serve as a perfect bridge between the fields of gene regulation and signaling, and that separating these fields hinders our understanding of cell functions. Multi-omics approaches that measure multiple dimensions of TF activity are ideally suited to study the interplay of cell signaling and gene regulation using TFs as the anchor to link the two fields.
Affiliation(s)
- Paula Weidemüller
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Maksim Kholmatov
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, 69117, Germany
- Evangelia Petsalaki
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Judith B Zaugg
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, 69117, Germany
40
Deregulation of Transcriptional Enhancers in Cancer. Cancers (Basel) 2021; 13:cancers13143532. [PMID: 34298745 PMCID: PMC8303223 DOI: 10.3390/cancers13143532] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/03/2021] [Revised: 06/29/2021] [Accepted: 07/08/2021] [Indexed: 12/14/2022]
Abstract
Simple Summary
One of the major challenges in cancer treatment is the dynamic adaptation of tumor cells to cancer therapies. In this regard, tumor cells can modify their response to environmental cues without altering their DNA sequence. This cell plasticity enables cells to undergo morphological and functional changes, for example, during tumour metastasis or when acquiring resistance to cancer therapies. Central to cell plasticity are the dynamic changes in gene expression that are controlled by a set of molecular switches called enhancers. Enhancers are DNA elements that determine when, where and to what extent genes should be switched on and off. Thus, defects in enhancer function can disrupt the gene expression program and can lead to tumour formation. Here, we review how enhancers control the activity of cancer-associated genes and how defects in these regulatory elements contribute to cell plasticity in cancer. Understanding enhancer (de)regulation can provide new strategies for modulating cell plasticity in tumour cells and can open new research avenues for cancer therapy.
Abstract
Epigenetic regulation can shape a cell’s identity through reversible modifications of chromatin that ultimately control gene expression in response to internal and external cues. In this review, we first discuss the concept of cell plasticity in cancer, a process that is directly controlled by epigenetic mechanisms, with a particular focus on transcriptional enhancers as the cornerstone of epigenetic regulation. In the second part, we discuss mechanisms of enhancer deregulation in adult stem cells and in epithelial-to-mesenchymal transition (EMT), two paradigms of cell plasticity that depend on epigenetic regulation and serve as major sources of tumour heterogeneity. Finally, we review how genetic variations at enhancers and their epigenetic modifiers contribute to tumourigenesis, and we highlight examples of cancer drugs that target epigenetic modifications at enhancers.
41
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022]
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
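As a minimal illustration of the sequence representation such models start from, the sketch below one-hot encodes DNA and slides a single weight matrix along it, which is what one first-layer convolutional filter (or a position weight matrix) computes. The hand-set "TATA" filter is a hypothetical example; trained models learn these weights from data rather than having them written by hand.

```python
BASES = "ACGT"


def one_hot(seq: str) -> list[list[float]]:
    """Encode DNA as one row per base; ambiguous bases such as N become all-zero rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def scan(seq: str, weights: list[list[float]]) -> list[float]:
    """Slide a (width x 4) weight matrix along the sequence, valid positions only."""
    x = one_hot(seq)
    width = len(weights)
    scores = []
    for start in range(len(x) - width + 1):
        score = sum(
            x[start + j][k] * weights[j][k] for j in range(width) for k in range(4)
        )
        scores.append(score)
    return scores


# Hypothetical hand-set filter that rewards each position matching "TATA".
tata_filter = one_hot("TATA")
print(scan("GGTATAGG", tata_filter))  # [2.0, 0.0, 4.0, 0.0, 2.0]
```

A deep network stacks many such filters and learns their weights jointly, so interpreting a trained first layer often amounts to reading filters like this one back as motifs.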
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Mariia Kokina
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Garcia
  - School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
42
Hoeksema MA, Shen Z, Holtman IR, Zheng A, Spann NJ, Cobo I, Gymrek M, Glass CK. Mechanisms underlying divergent responses of genetically distinct macrophages to IL-4. Sci Adv 2021; 7(25):eabf9808. [PMID: 34134993 PMCID: PMC8208725 DOI: 10.1126/sciadv.abf9808] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 12/02/2020] [Accepted: 04/29/2021] [Indexed: 05/24/2023]
Abstract
Mechanisms by which noncoding genetic variation influences gene expression remain only partially understood but are considered to be major determinants of phenotypic diversity and disease risk. Here, we evaluated effects of >50 million single-nucleotide polymorphisms and short insertions/deletions provided by five inbred strains of mice on the responses of macrophages to interleukin-4 (IL-4), a cytokine that plays pleiotropic roles in immunity and tissue homeostasis. Of >600 genes induced >2-fold by IL-4 across the five strains, only 26 genes reached this threshold in all strains. By applying deep learning and motif mutation analyses to epigenetic data for macrophages from each strain, we identified the dominant combinations of lineage-determining and signal-dependent transcription factors driving IL-4 enhancer activation. These studies further revealed mechanisms by which noncoding genetic variation influences absolute levels of enhancer activity and their dynamic responses to IL-4, thereby contributing to strain-differential patterns of gene expression and phenotypic diversity.
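The contrast between "more than 600 genes induced more than 2-fold across the five strains" and "only 26 in all strains" is the difference between the union and the intersection of the per-strain induced-gene sets. A minimal sketch with made-up fold changes; the gene and strain names are placeholders, not data from the study.

```python
def induced(fold_changes: dict[str, float], threshold: float = 2.0) -> set[str]:
    """Genes whose induction exceeds the fold-change threshold in one strain."""
    return {gene for gene, fc in fold_changes.items() if fc > threshold}


def induced_any_and_all(per_strain: dict[str, dict[str, float]], threshold: float = 2.0):
    """Return (genes induced in at least one strain, genes induced in every strain)."""
    sets = [induced(fc, threshold) for fc in per_strain.values()]
    return set.union(*sets), set.intersection(*sets)


# Hypothetical fold changes for three strains.
per_strain = {
    "strain_1": {"geneA": 5.0, "geneB": 2.5, "geneC": 1.2},
    "strain_2": {"geneA": 3.1, "geneB": 1.1, "geneC": 2.2},
    "strain_3": {"geneA": 2.7, "geneB": 0.9, "geneC": 1.0},
}
any_strain, all_strains = induced_any_and_all(per_strain)
print(sorted(any_strain), sorted(all_strains))  # ['geneA', 'geneB', 'geneC'] ['geneA']
```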
Affiliation(s)
- Marten A Hoeksema
  - Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Zeyang Shen
  - Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Inge R Holtman
  - Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
  - Section Molecular Neurobiology, Department of Biomedical Sciences of Cells and Systems, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- An Zheng
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Nathan J Spann
  - Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Isidoro Cobo
  - Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Melissa Gymrek
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Christopher K Glass
  - Department of Cellular and Molecular Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
43
Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022]
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells, and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluation of the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, and all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and the ROIs associated with them, calculates the normalized expression level for each BC and the averaged value for each ROI, and provides a graphical visualization of the processed data.
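The normalization described above can be sketched in a few lines. Barcodes seen with more than one ROI in the mapping reads are discarded. Each remaining barcode's RNA count fraction is divided by its plasmid DNA count fraction, and the per-barcode values are averaged per ROI. This is a minimal illustration under assumptions: counts are already tallied per barcode, library-size scaling uses all barcodes, and `min_dna` is a hypothetical filter. It is not MPRAdecoder's code, which also handles sequencing errors and mutant variants.

```python
from collections import defaultdict
from statistics import mean


def unambiguous_barcodes(mapping_pairs):
    """Keep barcodes observed with exactly one ROI in the mapping reads.

    mapping_pairs: iterable of (barcode, roi) tuples, one per mapping read.
    """
    rois_per_bc = defaultdict(set)
    for barcode, roi in mapping_pairs:
        rois_per_bc[barcode].add(roi)
    return {bc: next(iter(rois)) for bc, rois in rois_per_bc.items() if len(rois) == 1}


def roi_activity(rna_counts, dna_counts, bc_to_roi, min_dna=1):
    """Per-BC expression = (RNA fraction) / (plasmid DNA fraction), averaged per ROI.

    Library sizes are taken over all barcodes in each count table; barcodes with
    fewer than min_dna plasmid reads (a hypothetical filter) are skipped.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    per_roi = defaultdict(list)
    for barcode, roi in bc_to_roi.items():
        dna = dna_counts.get(barcode, 0)
        if dna < min_dna:
            continue
        rna = rna_counts.get(barcode, 0)
        per_roi[roi].append((rna / rna_total) / (dna / dna_total))
    return {roi: mean(values) for roi, values in per_roi.items()}


mapping = [("AAA", "r1"), ("AAA", "r1"), ("CCC", "r1"),
           ("GGG", "r2"), ("GGG", "r3"), ("TTT", "r2")]
bc_to_roi = unambiguous_barcodes(mapping)  # GGG is dropped: it maps to both r2 and r3
rna = {"AAA": 20, "CCC": 10, "TTT": 10, "GGG": 60}
dna = {"AAA": 10, "CCC": 10, "TTT": 20, "GGG": 10}
print(roi_activity(rna, dna, bc_to_roi))  # {'r1': 0.75, 'r2': 0.25}
```

Dividing fractions rather than raw counts keeps the ratio comparable across RNA and DNA libraries sequenced to different depths; many pipelines additionally report the result on a log2 scale.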
Affiliation(s)
- Anna E Letiagina
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Evgeniya S Omelina
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anton V Ivankin
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Alexey V Pindyurin
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
44
Lewis EMA, Kaushik K, Sandoval LA, Antony I, Dietmann S, Kroll KL. Epigenetic regulation during human cortical development: Seq-ing answers from the brain to the organoid. Neurochem Int 2021; 147:105039. [PMID: 33915225 PMCID: PMC8387070 DOI: 10.1016/j.neuint.2021.105039] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/24/2020] [Revised: 03/23/2021] [Accepted: 03/27/2021] [Indexed: 01/22/2023]
Abstract
Epigenetic regulation plays an important role in controlling gene expression during complex processes, such as development of the human brain. Mutations in genes encoding chromatin-modifying proteins and in the non-protein-coding sequences of the genome can potentially alter transcription factor binding or chromatin accessibility. Such mutations can frequently cause neurodevelopmental disorders; therefore, understanding how epigenetic regulation shapes brain development is of particular interest. While epigenetic regulation of neural development has been extensively studied in murine models, significant species-specific differences in both the genome sequence and in brain development necessitate human models. However, access to human fetal material is limited, and these tissues cannot be grown or experimentally manipulated ex vivo. Therefore, models that recapitulate particular aspects of human fetal brain development, such as the in vitro differentiation of human pluripotent stem cells (hPSCs), are instrumental for studying the epigenetic regulation of human neural development. Here, we examine recent studies that have defined changes in the epigenomic landscape during fetal brain development. We compare these studies with analogous data derived by in vitro differentiation of hPSCs into specific neuronal cell types or into three-dimensional cerebral organoids. Such comparisons can be informative regarding which aspects of fetal brain development are faithfully recapitulated by in vitro differentiation models and provide a foundation for using experimentally tractable in vitro models of human brain development to study neural gene regulation and the basis of its disruption in neurodevelopmental disorders.
Affiliation(s)
- Emily M A Lewis
  - Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO, 63110, USA
- Komal Kaushik
  - Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO, 63110, USA
- Luke A Sandoval
  - Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO, 63110, USA
- Irene Antony
  - Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO, 63110, USA
- Sabine Dietmann
  - Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO, 63110, USA
- Kristen L Kroll
  - Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO, 63110, USA
45
Mishra B, Athar M, Mukhtar MS. Transcriptional circuitry atlas of genetic diverse unstimulated murine and human macrophages define disparity in population-wide innate immunity. Sci Rep 2021; 11:7373. [PMID: 33795737 PMCID: PMC8016976 DOI: 10.1038/s41598-021-86742-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/09/2020] [Accepted: 03/12/2021] [Indexed: 02/07/2023]
Abstract
Macrophages are ubiquitous custodians of tissues that play a decisive role in maintaining cellular homeostasis through regulatory immune responses. Within tissues, macrophages form an extremely heterogeneous population whose varied functions are orchestrated through regulatory responses, and this heterogeneity can be further exacerbated in diverse genetic backgrounds. Gene regulatory networks (GRNs) offer a comprehensive understanding of cellular regulatory behavior by unfolding the transcription factors (TFs) and their regulated target genes. RNA-Seq coupled with ATAC-Seq has revolutionized the study of the regulatory landscape through gene expression modeling. Here, we employ an integrative multi-omics, systems biology-based analysis and generate GRNs from unstimulated bone marrow-derived macrophages of five inbred, genetically defined murine strains, which are reported to be linked with most of the population-wide human genetic variants. Our probabilistic modeling of a basal homeostasis pan-regulatory repertoire in diverse macrophages discovered 96 TFs targeting 6279 genes, representing 468,291 interactions across the five inbred murine strains. Subsequently, we identify core and distinctive GRN sub-networks in unstimulated macrophages to describe system-wide conservation and dissimilarities, respectively, across the five murine strains. Our study concludes that discrepancies in unstimulated macrophage-specific regulatory networks not only drive basal functional plasticity within genetic backgrounds but also aid in understanding the complexity of racial disparity among the human population during stress.
Affiliation(s)
- Bharat Mishra
  - Department of Biology, University of Alabama at Birmingham, 464 Campbell Hall, 1300 University Boulevard, Alabama, 35294, USA
- Mohammad Athar
  - UAB Research Center of Excellence in Arsenicals, Department of Dermatology, School of Medicine, University of Alabama at Birmingham, Alabama, 35294, USA
- M Shahid Mukhtar
  - Department of Biology, University of Alabama at Birmingham, 464 Campbell Hall, 1300 University Boulevard, Alabama, 35294, USA
  - Nutrition Obesity Research Center, University of Alabama at Birmingham, 1675 University Blvd, Birmingham, AL, 35294, USA
  - Department of Surgery, University of Alabama at Birmingham, 1808 7th Ave S, Birmingham, AL, 35294, USA
46
Singh G, Mullany S, Moorthy SD, Zhang R, Mehdi T, Tian R, Duncan AG, Moses AM, Mitchell JA. A flexible repertoire of transcription factor binding sites and a diversity threshold determines enhancer activity in embryonic stem cells. Genome Res 2021; 31:564-575. [PMID: 33712417 PMCID: PMC8015845 DOI: 10.1101/gr.272468.120] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 10/01/2020] [Accepted: 02/19/2021] [Indexed: 12/28/2022]
Abstract
Transcriptional enhancers are critical for development and phenotype evolution and are often mutated in disease contexts; however, even in well-studied cell types, the sequence code conferring enhancer activity remains unknown. To examine the enhancer regulatory code for pluripotent stem cells, we identified genomic regions with conserved binding of multiple transcription factors in mouse and human embryonic stem cells (ESCs). Examination of these regions revealed that they contain on average 12.6 conserved transcription factor binding site (TFBS) sequences. Enriched TFBSs are a diverse repertoire of 70 different sequences representing the binding sequences of both known and novel ESC regulators. Using a diverse set of TFBSs from this repertoire was sufficient to construct short synthetic enhancers with activity comparable to native enhancers. Site-directed mutagenesis of conserved TFBSs in endogenous enhancers or TFBS deletion from synthetic sequences revealed a requirement for 10 or more different TFBSs. Furthermore, specific TFBSs, including the POU5F1:SOX2 comotif, are dispensable, despite cobinding the POU5F1 (also known as OCT4), SOX2, and NANOG master regulators of pluripotency. These findings reveal that a TFBS sequence diversity threshold overrides the need for optimized regulatory grammar and individual TFBSs that recruit specific master regulators.
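The diversity-threshold idea can be illustrated by counting how many different binding-site types occur in a sequence, on either strand, and comparing that count with a threshold (10 in the study). This is a toy sketch: the exact-match site strings and names are hypothetical, whereas real analyses use position weight matrices and conservation, and a presence count alone is not a validated predictor of activity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def distinct_sites(seq: str, sites: dict[str, str]) -> set[str]:
    """Names of site types present at least once, on either strand (exact match)."""
    seq = seq.upper()
    return {
        name
        for name, site in sites.items()
        if site in seq or reverse_complement(site) in seq
    }


def meets_diversity_threshold(seq: str, sites: dict[str, str], threshold: int = 10) -> bool:
    """True when at least `threshold` different site types occur in the sequence."""
    return len(distinct_sites(seq, sites)) >= threshold


# Hypothetical site types; site_2 is present only as its reverse complement.
toy_sites = {"site_1": "ACGTAA", "site_2": "TTTGCA", "site_3": "CCCCCC"}
enhancer = "GGACGTAAGGTGCAAAGG"
print(sorted(distinct_sites(enhancer, toy_sites)))  # ['site_1', 'site_2']
print(meets_diversity_threshold(enhancer, toy_sites))  # False with the default threshold of 10
```

Counting distinct types rather than total occurrences reflects the abstract's point that diversity of sites, not repeated copies of one master-regulator motif, is what the threshold concerns.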
Affiliation(s)
- Gurdeep Singh
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Shanelle Mullany
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Sakthi D Moorthy
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Richard Zhang
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Tahmid Mehdi
  - Department of Computer Science, University of Toronto, Toronto, M5S 2E4, Canada
- Ruxiao Tian
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Andrew G Duncan
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Alan M Moses
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
  - Department of Computer Science, University of Toronto, Toronto, M5S 2E4, Canada
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 3B3, Canada
- Jennifer A Mitchell
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
47
Jindal GA, Farley EK. Enhancer grammar in development, evolution, and disease: dependencies and interplay. Dev Cell 2021; 56:575-587. [PMID: 33689769 PMCID: PMC8462829 DOI: 10.1016/j.devcel.2021.02.016] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 10/13/2020] [Revised: 02/15/2021] [Accepted: 02/16/2021] [Indexed: 12/19/2022]
Abstract
Each language has standard books describing that language's grammatical rules. Biologists have searched for similar, albeit more complex, principles relating enhancer sequence to gene expression. Here, we review the literature on enhancer grammar. We introduce dependency grammar, a model where enhancers encode information based on dependencies between enhancer features shaped by mechanistic, evolutionary, and biological constraints. Classifying enhancers based on the types of dependencies may identify unifying principles relating enhancer sequence to gene expression. Such rules would allow us to read the instructions for development within genomes and pinpoint causal enhancer variants underlying disease and evolutionary changes.
Affiliation(s)
- Granton A Jindal
- Division of Cardiology, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA
- Emma K Farley
- Division of Cardiology, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA

48
Shen Z, Hoeksema MA, Ouyang Z, Benner C, Glass CK. MAGGIE: leveraging genetic variation to identify DNA sequence motifs mediating transcription factor binding and function. Bioinformatics 2020; 36:i84-i92. [PMID: 32657363 PMCID: PMC7355228 DOI: 10.1093/bioinformatics/btaa476] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/31/2022]
Abstract
MOTIVATION: Genetic variation in regulatory elements can alter transcription factor (TF) binding by mutating a TF binding motif, which in turn may affect the activity of the regulatory elements. However, it is unclear which motifs are prone to impact transcriptional regulation if mutated. Current motif analysis tools either prioritize TFs based on motif enrichment without linking to a function or are limited in their applications due to the assumption of linearity between motifs and their functional effects.
RESULTS: We present MAGGIE (Motif Alteration Genome-wide to Globally Investigate Elements), a novel method for identifying motifs mediating TF binding and function. By leveraging measurements from diverse genotypes, MAGGIE uses a statistical approach to link mutations of a motif to changes in an epigenomic feature without assuming a linear relationship. We benchmark MAGGIE across various applications using both simulated and biological datasets and demonstrate improved sensitivity and specificity compared with state-of-the-art motif analysis approaches. We use MAGGIE to gain novel insights into the divergent functions of distinct NF-κB factors in pro-inflammatory macrophages, revealing the association of p65-p50 co-binding with transcriptional activation and the association of p50 binding lacking p65 with transcriptional repression.
AVAILABILITY AND IMPLEMENTATION: The Python package for MAGGIE is freely available at https://github.com/zeyang-shen/maggie. The accession number for the NF-κB ChIP-seq data generated for this study is Gene Expression Omnibus: GSE144070.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zeyang Shen
- Department of Cellular and Molecular Medicine, School of Medicine; Department of Bioengineering, Jacobs School of Engineering
- Zhengyu Ouyang
- Department of Cellular and Molecular Medicine, School of Medicine
- Christopher Benner
- Department of Medicine, School of Medicine, University of California, San Diego, CA 92093, USA
- Christopher K Glass
- Department of Cellular and Molecular Medicine, School of Medicine; Department of Medicine, School of Medicine, University of California, San Diego, CA 92093, USA

49
Mobility connects: transposable elements wire new transcriptional networks by transferring transcription factor binding motifs. Biochem Soc Trans 2020; 48:1005-1017. [PMID: 32573687 PMCID: PMC7329337 DOI: 10.1042/bst20190937] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 04/22/2020] [Revised: 06/01/2020] [Accepted: 06/03/2020] [Indexed: 12/28/2022]
Abstract
Transposable elements (TEs) constitute major fractions of plant genomes. Their potential mobility gives them the capacity to cause major genome rearrangements. These effects are potentially deleterious and have driven the evolution of epigenetic mechanisms that suppress TE activity. However, beyond their deleterious effects, TE insertions can be neutral or even advantageous for the host, leading to long-term retention of TEs in the host genome. Indeed, TEs are increasingly recognized as major drivers of evolutionary novelty by regulating the expression of nearby genes. TEs frequently contain transcription factor binding motifs, capture additional motifs during transposition, and spread these motifs through the genome as they transpose. Thus, TEs drive the evolution and diversification of gene regulatory networks by recruiting lineage-specific targets under the regulatory control of specific transcription factors. This process can explain the rapid and repeated evolution of developmental novelties, such as C4 photosynthesis and a wide spectrum of stress responses in plants. It also underpins the convergent evolution of embryo-nourishing tissues: the placenta in mammals and the endosperm in flowering plants. Furthermore, the gene regulatory network underlying flower development has been largely reshaped by TE-mediated recruitment of regulatory elements, some of which have been preserved across long evolutionary timescales. In this review, we highlight the potential role of TEs as evolutionary toolkits in plants by showcasing examples of TE-mediated evolutionary novelties.

50
Alizada A, Khyzha N, Wang L, Antounians L, Chen X, Khor M, Liang M, Rathnakumar K, Weirauch MT, Medina-Rivera A, Fish JE, Wilson MD. Conserved regulatory logic at accessible and inaccessible chromatin during the acute inflammatory response in mammals. Nat Commun 2021; 12:567. [PMID: 33495464 PMCID: PMC7835376 DOI: 10.1038/s41467-020-20765-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/02/2020] [Accepted: 12/18/2020] [Indexed: 12/18/2022]
Abstract
The regulatory elements controlling gene expression during acute inflammation are not fully elucidated. Here we report the identification of a set of NF-κB-bound elements and common chromatin landscapes underlying the acute inflammatory response across cell types and mammalian species. Using primary vascular endothelial cells (human/mouse/bovine) treated with the pro-inflammatory cytokine Tumor Necrosis Factor-α, we identify extensive (~30%) conserved orthologous binding of NF-κB to accessible, as well as nucleosome-occluded, chromatin. Regions with the highest NF-κB occupancy pre-stimulation show dramatic increases in NF-κB binding and chromatin accessibility post-stimulation. These 'pre-bound' regions are typically conserved (~56%), contain multiple NF-κB motifs, are utilized by diverse cell types, and overlap rare non-coding mutations and common genetic variation associated with both inflammatory and cardiovascular phenotypes. Genetic ablation of conserved, 'pre-bound' NF-κB regions within the super-enhancer associated with the chemokine-encoding CCL2 gene and elsewhere supports the functional relevance of these elements.
Affiliation(s)
- Azad Alizada
- Hospital for Sick Children, Genetics and Genome Biology, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Nadiya Khyzha
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada
- University Health Network, Toronto General Hospital Research Institute, Toronto, Canada
- Liangxi Wang
- Hospital for Sick Children, Genetics and Genome Biology, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Lina Antounians
- Hospital for Sick Children, Genetics and Genome Biology, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Xiaoting Chen
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital, Cincinnati, OH, USA
- Melvin Khor
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada
- University Health Network, Toronto General Hospital Research Institute, Toronto, Canada
- Minggao Liang
- Hospital for Sick Children, Genetics and Genome Biology, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Kumaragurubaran Rathnakumar
- Hospital for Sick Children, Genetics and Genome Biology, Toronto, Canada
- University Health Network, Toronto General Hospital Research Institute, Toronto, Canada
- Matthew T Weirauch
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital, Cincinnati, OH, USA
- Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Division of Developmental Biology, Cincinnati Children's Hospital, Cincinnati, OH, USA
- Alejandra Medina-Rivera
- Hospital for Sick Children, Genetics and Genome Biology, Toronto, Canada
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Mexico
- Jason E Fish
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada
- University Health Network, Toronto General Hospital Research Institute, Toronto, Canada
- University Health Network, Peter Munk Cardiac Centre, Toronto, Canada
- Michael D Wilson
- Hospital for Sick Children, Genetics and Genome Biology, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Canada