1
Xiang G, He X, Giardine BM, Isaac KJ, Taylor DJ, McCoy RC, Jansen C, Keller CA, Wixom AQ, Cockburn A, Miller A, Qi Q, He Y, Li Y, Lichtenberg J, Heuston EF, Anderson SM, Luan J, Vermunt MW, Yue F, Sauria ME, Schatz MC, Taylor J, Göttgens B, Hughes JR, Higgs DR, Weiss MJ, Cheng Y, Blobel GA, Bodine DM, Zhang Y, Li Q, Mahony S, Hardison RC. Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes. bioRxiv 2024:2023.04.02.535219. [PMID: 37066352 PMCID: PMC10103973 DOI: 10.1101/2023.04.02.535219] [Indexed: 06/19/2023]
Abstract
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state Regulatory Potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbored distinctive transcription factor binding motifs that were similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we showed that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.
Affiliation(s)
- Guanjue Xiang
  - Bioinformatics and Genomics Graduate Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02215
- Xi He
  - Bioinformatics and Genomics Graduate Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802
- Belinda M. Giardine
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802
- Kathryn J. Isaac
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218
- Dylan J. Taylor
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218
- Rajiv C. McCoy
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218
- Camden Jansen
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802
- Cheryl A. Keller
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802
- Alexander Q. Wixom
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802
- April Cockburn
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802
- Amber Miller
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802
- Qian Qi
  - Department of Hematology, St. Jude Children’s Research Hospital, Memphis, TN 38105
- Yanghua He
  - Department of Hematology, St. Jude Children’s Research Hospital, Memphis, TN 38105
  - Department of Human Nutrition, Food and Animal Sciences, University of Hawaiʻi at Mānoa, Honolulu, HI 96822, USA
- Yichao Li
  - Department of Hematology, St. Jude Children’s Research Hospital, Memphis, TN 38105
- Jens Lichtenberg
  - Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, MD 20892
- Elisabeth F. Heuston
  - Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, MD 20892
- Stacie M. Anderson
  - Flow Cytometry Core, National Human Genome Research Institute, Bethesda, MD 20892
- Jing Luan
  - Department of Pediatrics, Children’s Hospital of Philadelphia, and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Marit W. Vermunt
  - Department of Pediatrics, Children’s Hospital of Philadelphia, and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Feng Yue
  - Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Evanston, IL 60611
- Michael E.G. Sauria
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218
- Michael C. Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218
- James Taylor
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218
- Berthold Göttgens
  - Wellcome and MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Jim R. Hughes
  - MRC Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
- Douglas R. Higgs
  - MRC Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
- Mitchell J. Weiss
  - Department of Hematology, St. Jude Children’s Research Hospital, Memphis, TN 38105
- Yong Cheng
  - Department of Hematology, St. Jude Children’s Research Hospital, Memphis, TN 38105
- Gerd A. Blobel
  - Department of Pediatrics, Children’s Hospital of Philadelphia, and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- David M. Bodine
  - Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, MD 20892
- Yu Zhang
  - Department of Statistics, The Pennsylvania State University, University Park, PA 16802
- Qunhua Li
  - Department of Statistics, The Pennsylvania State University, University Park, PA 16802
  - Center for Computational Biology and Bioinformatics, Genome Sciences Institute, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802
- Shaun Mahony
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802
  - Center for Computational Biology and Bioinformatics, Genome Sciences Institute, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802
- Ross C. Hardison
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802
  - Center for Computational Biology and Bioinformatics, Genome Sciences Institute, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802
2
Kwak IY, Kim BC, Lee J, Kang T, Garry DJ, Zhang J, Gong W. Proformer: a hybrid macaron transformer model predicts expression values from promoter sequences. BMC Bioinformatics 2024; 25:81. [PMID: 38378442 PMCID: PMC10877777 DOI: 10.1186/s12859-024-05645-5] [Received: 05/24/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024] Open
Abstract
The breakthrough high-throughput measurement of the cis-regulatory activity of millions of randomly generated promoters provides an unprecedented opportunity to systematically decode the cis-regulatory logic that determines the expression values. We developed an end-to-end transformer encoder architecture named Proformer to predict the expression values from DNA sequences. Proformer used a Macaron-like Transformer encoder architecture, where two half-step feed-forward (FFN) layers were placed at the beginning and the end of each encoder block, and a separable 1D convolution layer was inserted after the first FFN layer and in front of the multi-head attention layer. The sliding k-mers from one-hot encoded sequences were mapped onto a continuous embedding, combined with the learned positional embedding and strand embedding (forward strand vs. reverse complemented strand) as the sequence input. Moreover, Proformer introduced multiple expression heads with mask filling to prevent the transformer models from collapsing when training on a relatively small amount of data. We empirically determined that this design had significantly better performance than conventional designs, such as using a global pooling layer as the output layer for the regression task. These analyses support the notion that Proformer provides a novel method of learning and enhances our understanding of how cis-regulatory sequences determine the expression values.
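The input encoding described in this abstract (sliding k-mers plus a forward versus reverse-complement strand label) can be illustrated with a minimal sketch. This is not the Proformer code: the function names, the choice of k, and the vocabulary ordering are all assumptions made for illustration, and the integer ids would still need to be looked up in a learned embedding table.

```python
# Illustrative sketch only: names, k, and vocabulary order are assumptions,
# not taken from the Proformer implementation.
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmer_token_ids(seq, k=3):
    """Slide a window of width k over seq and map each k-mer to an integer id."""
    # Vocabulary: all 4**k k-mers in lexicographic order over "ACGT".
    vocab = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    return [vocab[seq[i:i + k]] for i in range(len(seq) - k + 1)]


def strand_inputs(seq, k=3):
    """Return (token_ids, strand_id) for the forward (0) and reverse-complement (1) strand."""
    return [
        (kmer_token_ids(seq, k), 0),
        (kmer_token_ids(reverse_complement(seq), k), 1),
    ]
```

In a model, each token id would index a k-mer embedding that is summed with a positional embedding and with the embedding selected by the strand id.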
Affiliation(s)
- Il-Youp Kwak
  - Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea
- Byeong-Chan Kim
  - Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea
- Juhyun Lee
  - Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea
- Taein Kang
  - Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea
- Daniel J Garry
  - Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA
  - Stem Cell Institute, University of Minnesota, Minneapolis, MN, 55455, USA
  - Paul and Sheila Wellstone Muscular Dystrophy Center, University of Minnesota, Minneapolis, MN, 55455, USA
- Jianyi Zhang
  - Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Wuming Gong
  - Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA
3
Russo M, Piccolo V, Polizzese D, Prosperini E, Borriero C, Polletti S, Bedin F, Marenda M, Michieletto D, Mandana GM, Rodighiero S, Cuomo A, Natoli G. Restrictor synergizes with Symplekin and PNUTS to terminate extragenic transcription. Genes Dev 2023; 37:1017-1040. [PMID: 38092518 PMCID: PMC10760643 DOI: 10.1101/gad.351057.123] [Received: 08/08/2023] [Accepted: 11/29/2023] [Indexed: 12/28/2023]
Abstract
Transcription termination pathways mitigate the detrimental consequences of unscheduled promiscuous initiation occurring at hundreds of thousands of genomic cis-regulatory elements. The Restrictor complex, composed of the Pol II-interacting protein WDR82 and the RNA-binding protein ZC3H4, suppresses processive transcription at thousands of extragenic sites in mammalian genomes. Restrictor-driven termination does not involve nascent RNA cleavage, and its interplay with other termination machineries is unclear. Here we show that efficient termination at Restrictor-controlled extragenic transcription units involves the recruitment of the protein phosphatase 1 (PP1) regulatory subunit PNUTS, a negative regulator of the SPT5 elongation factor, and Symplekin, a protein associated with RNA cleavage complexes but also involved in cleavage-independent and phosphatase-dependent termination of noncoding RNAs in yeast. PNUTS and Symplekin act synergistically with, but independently from, Restrictor to dampen processive extragenic transcription. Moreover, the presence of limiting nuclear levels of Symplekin imposes a competition for its recruitment among multiple transcription termination machineries, resulting in mutual regulatory interactions. Hence, by synergizing with Restrictor, Symplekin and PNUTS enable efficient termination of processive, long-range extragenic transcription.
Affiliation(s)
- Marta Russo
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Viviana Piccolo
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Danilo Polizzese
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Elena Prosperini
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Carolina Borriero
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Sara Polletti
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Fabio Bedin
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Mattia Marenda
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Davide Michieletto
  - School of Physics and Astronomy, University of Edinburgh, Edinburgh EH9 3FD, United Kingdom
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom
- Gaurav Madappa Mandana
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Simona Rodighiero
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Alessandro Cuomo
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
- Gioacchino Natoli
  - Department of Experimental Oncology, European Institute of Oncology (IEO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Milan I-20139, Italy
4
Flint J, Heffel MG, Chen Z, Mefford J, Marcus E, Chen PB, Ernst J, Luo C. Single-cell methylation analysis of brain tissue prioritizes mutations that alter transcription. Cell Genom 2023; 3:100454. [PMID: 38116123 PMCID: PMC10726494 DOI: 10.1016/j.xgen.2023.100454] [Received: 07/10/2023] [Revised: 09/08/2023] [Accepted: 11/06/2023] [Indexed: 12/21/2023]
Abstract
Relating genetic variants to behavior remains a fundamental challenge. To assess the utility of DNA methylation marks in discovering causative variants, we examined their relationship to genetic variation by generating single-nucleus methylomes from the hippocampus of eight inbred mouse strains. At CpG sequence densities under 40 CpG/Kb, cells compensate for loss of methylated sites by methylating additional sites to maintain methylation levels. At higher CpG sequence densities, the exact location of a methylated site becomes more important, suggesting that variants affecting methylation will have a greater effect when occurring in higher CpG densities than in lower. We found this to be true for a variant's effect on transcript abundance, indicating that candidate variants can be prioritized based on CpG sequence density. Our findings imply that DNA methylation influences the likelihood that mutations occur at specific sites in the genome, supporting the view that the distribution of mutations is not random.
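The 40 CpG/kb threshold mentioned in this abstract can be made concrete with a small sketch that counts CpG dinucleotides in a sequence and expresses them per kilobase. The function names and the whole-sequence window are assumptions for illustration; the abstract does not state how the authors windowed or counted CpG density.

```python
# Illustrative sketch only: the windowing scheme and names are assumptions.

def cpg_per_kb(seq):
    """Count CpG dinucleotides on the given strand, scaled to sites per 1,000 bp."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n_cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return 1000.0 * n_cpg / len(seq)


def density_class(seq, threshold=40.0):
    """Label a region 'high' or 'low' relative to a CpG/kb threshold (40 in the abstract)."""
    return "high" if cpg_per_kb(seq) >= threshold else "low"
```

Under the prioritization idea in the abstract, variants falling in regions labelled "high" would be the ones expected to have larger effects on methylation and transcript abundance.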
Affiliation(s)
- Jonathan Flint
  - Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Matthew G. Heffel
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Zeyuan Chen
  - Department of Computer Science, Samueli School of Engineering, University of California Los Angeles, Los Angeles, CA, USA
- Joel Mefford
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Emilie Marcus
  - Department of Biological Chemistry, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Patrick B. Chen
  - Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, Los Angeles, CA, USA
- Jason Ernst
  - Department of Computer Science, Samueli School of Engineering, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Biological Chemistry, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Chongyuan Luo
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
5
Li Y, Ju F, Chen Z, Qu Y, Xia H, He L, Wu L, Zhu J, Shao B, Deng P. CREaTor: zero-shot cis-regulatory pattern modeling with attention mechanisms. Genome Biol 2023; 24:266. [PMID: 37996959 PMCID: PMC10666311 DOI: 10.1186/s13059-023-03103-8] [Received: 05/15/2023] [Accepted: 11/03/2023] [Indexed: 11/25/2023] Open
Abstract
Linking cis-regulatory sequences to target genes has been a long-standing challenge. In this study, we introduce CREaTor, an attention-based deep neural network designed to model cis-regulatory patterns for genomic elements up to 2 Mb from target genes. Coupled with a training strategy that predicts gene expression from flanking candidate cis-regulatory elements (cCREs), CREaTor can model cell type-specific cis-regulatory patterns in new cell types without prior knowledge of cCRE-gene interactions or additional training. The zero-shot modeling capability, combined with the use of only RNA-seq and ChIP-seq data, allows for the ready generalization of CREaTor to a broad range of cell types.
Affiliation(s)
- Yongge Li
  - Microsoft Research AI4Science, Beijing, China
  - School of Medicine, Tsinghua University, Beijing, China
- Fusong Ju
  - Microsoft Research AI4Science, Beijing, China
- Zhiyuan Chen
  - Microsoft Research AI4Science, Beijing, China
  - School of Computing, Australian National University, Canberra, Australia
- Yiming Qu
  - Microsoft Research AI4Science, Beijing, China
  - School of Life Sciences, Tsinghua University, Beijing, China
- Liang He
  - Microsoft Research AI4Science, Beijing, China
- Lijun Wu
  - Microsoft Research AI4Science, Beijing, China
- Jianwei Zhu
  - Microsoft Research AI4Science, Beijing, China
- Bin Shao
  - Microsoft Research AI4Science, Beijing, China
- Pan Deng
  - Microsoft Research AI4Science, Beijing, China
6
Zhao J, Baltoumas FA, Konnaris MA, Mouratidis I, Liu Z, Sims J, Agarwal V, Pavlopoulos GA, Georgakopoulos-Soares I, Ahituv N. MPRAbase: A Massively Parallel Reporter Assay Database. bioRxiv 2023:2023.11.19.567742. [PMID: 38045264 PMCID: PMC10690217 DOI: 10.1101/2023.11.19.567742] [Indexed: 12/05/2023]
Abstract
Massively parallel reporter assays (MPRAs) represent a set of high-throughput technologies that measure the functional effects of thousands of sequences/variants on gene regulatory activity. There are several different variations of MPRA technology and they are used for numerous applications, including regulatory element discovery, variant effect measurement, saturation mutagenesis, synthetic regulatory element generation or characterization of evolutionary gene regulatory differences. Despite their many designs and uses, there is no comprehensive database that incorporates the results of these experiments. To address this, we developed MPRAbase, a manually curated database that currently harbors 129 experiments, encompassing 17,718,677 elements tested across 35 cell types and 4 organisms. The MPRAbase web interface (http://www.mprabase.com) serves as a centralized user-friendly repository to download existing MPRA data for independent analysis and is designed with the ability to allow researchers to share their published data for rapid dissemination to the community.
7
Trauernicht M, Rastogi C, Manzo S, Bussemaker H, van Steensel B. Optimisation of TP53 reporters by systematic dissection of synthetic TP53 response elements. Nucleic Acids Res 2023; 51:9690-9702. [PMID: 37650627 PMCID: PMC10570033 DOI: 10.1093/nar/gkad718] [Received: 02/23/2023] [Revised: 07/24/2023] [Accepted: 08/22/2023] [Indexed: 09/01/2023] Open
Abstract
TP53 is a transcription factor that controls multiple cellular processes, including cell cycle arrest, DNA repair and apoptosis. The relation between TP53 binding site architecture and transcriptional output is still not fully understood. Here, we systematically examined in three different cell lines the effects of binding site affinity and copy number on TP53-dependent transcriptional output, and also probed the impact of spacer length and sequence between adjacent binding sites, and of core promoter identity. Paradoxically, we found that high-affinity TP53 binding sites are less potent than medium-affinity sites. TP53 achieves supra-additive transcriptional activation through optimally spaced adjacent binding sites, suggesting a cooperative mechanism. Optimally spaced adjacent binding sites have a ∼10-bp periodicity, suggesting a role for spatial orientation along the DNA double helix. We leveraged these insights to construct a log-linear model that explains activity from sequence features, and to identify new highly active and sensitive TP53 reporters.
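As a rough illustration of what a log-linear model of reporter activity looks like, the sketch below predicts activity as the exponential of a weighted sum of sequence features, so that each feature acts multiplicatively. The feature names, the cosine encoding of the roughly 10-bp spacing periodicity, and the 10.0 bp period are assumptions for illustration; the abstract does not give the authors' actual features or fitted coefficients.

```python
# Illustrative sketch only: features, coefficients, and the periodicity
# encoding are hypothetical, not the published model.
import math


def predict_activity(features, coefficients, intercept=0.0):
    """Log-linear model: log(activity) = intercept + sum(coefficient * feature)."""
    log_activity = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return math.exp(log_activity)


def helical_phase(spacer_bp, period_bp=10.0):
    """Encode spacer length as a periodic feature with an assumed period in bp."""
    return math.cos(2.0 * math.pi * spacer_bp / period_bp)
```

A fitted model would estimate the coefficients from measured reporter activities; here they are simply passed in by the caller, for example `predict_activity({"phase": helical_phase(20)}, {"phase": 0.5})` with an arbitrary placeholder weight.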
Affiliation(s)
- Max Trauernicht
  - Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
  - Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Chaitanya Rastogi
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Stefano G Manzo
  - Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
  - Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
  - Department of Biosciences, University of Milan “La Statale”, 20133 Milan, Italy
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Bas van Steensel
  - Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
  - Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
8
Antontseva EV, Degtyareva AO, Korbolina EE, Damarov IS, Merkulova TI. Human-genome single nucleotide polymorphisms affecting transcription factor binding and their role in pathogenesis. Vavilovskii Zhurnal Genet Selektsii 2023; 27:662-675. [PMID: 37965371 PMCID: PMC10641029 DOI: 10.18699/vjgb-23-77] [Received: 01/17/2023] [Revised: 03/24/2023] [Accepted: 03/30/2023] [Indexed: 11/16/2023] Open
Abstract
Single nucleotide polymorphisms (SNPs) are the most common type of variation in the human genome. The vast majority of SNPs identified in the human genome do not have any effect on the phenotype; however, some can lead to changes in the function of a gene or the level of its expression. Most SNPs associated with certain traits or pathologies are mapped to regulatory regions of the genome and affect gene expression by changing transcription factor binding sites. In recent decades, substantial effort has been invested in searching for such regulatory SNPs (rSNPs) and understanding the mechanisms by which they lead to phenotypic differences, primarily to individual differences in susceptibility to diseases and in sensitivity to drugs. The development of the NGS (next-generation sequencing) technology has contributed not only to the identification of a huge number of SNPs and to the search for their association (genome-wide association studies, GWASs) with certain diseases or phenotypic manifestations, but also to the development of more productive approaches to their functional annotation. It should be noted that the presence of an association does not allow one to identify a functional, truly disease-associated DNA sequence variant among multiple marker SNPs that are detected due to linkage disequilibrium. Moreover, determination of associations of genetic variants with a disease does not provide information about the functionality of these variants, which is necessary to elucidate the molecular mechanisms of the development of pathology and to design effective methods for its treatment and prevention. In this regard, the functional analysis of SNPs annotated in the GWAS catalog, both at the genome-wide level and at the level of individual SNPs, became especially relevant in recent years. A genome-wide search for potential rSNPs is possible without any prior knowledge of their association with a trait. 
Thus, mapping expression quantitative trait loci (eQTLs) makes it possible to identify an SNP for which - among transcriptomes of homozygotes and heterozygotes for its various alleles - there are differences in the expression level of certain genes, which can be located at various distances from the SNP. To predict rSNPs, approaches based on searches for allele-specific events in RNA-seq, ChIP-seq, DNase-seq, ATAC-seq, MPRA, and other data are also used. Nonetheless, for a more complete functional annotation of such rSNPs, it is necessary to establish their association with a trait, in particular, with a predisposition to a certain pathology or sensitivity to drugs. Thus, approaches to finding SNPs important for the development of a trait can be categorized into two groups: (1) starting from data on an association of SNPs with a certain trait, (2) starting from the determination of allele-specific changes at the molecular level (in a transcriptome or regulome). Only comprehensive use of strategically different approaches can considerably enrich our knowledge about the role of genetic determinants in the molecular mechanisms of trait formation, including predisposition to multifactorial diseases.
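The eQTL idea summarized in this abstract, comparing expression across homozygotes and heterozygotes for a SNP, is commonly illustrated as a regression of expression on allele dosage (0, 1, or 2 copies of the alternative allele). The sketch below computes only the least-squares slope; it is a simplified illustration under that additive-model assumption, with hypothetical names, and omits covariates, normalization, and significance testing that real eQTL pipelines use.

```python
# Illustrative sketch only: an additive dosage model is assumed; no covariates
# or significance testing are included.

def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1, or 2)."""
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("no genotype variation among samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx
```

A slope far from zero suggests that expression of the gene changes with each additional copy of the allele, which is the signal an eQTL scan looks for before any formal testing.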
Affiliation(s)
- E V Antontseva
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- A O Degtyareva
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- E E Korbolina
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- I S Damarov
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- T I Merkulova
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
9
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Cheng Xu
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Lu Bai
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
  - Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA
10
Guzman C, Duttke S, Zhu Y, De Arruda Saldanha C, Downes N, Benner C, Heinz S. Combining TSS-MPRA and sensitive TSS profile dissimilarity scoring to study the sequence determinants of transcription initiation. Nucleic Acids Res 2023; 51:e80. [PMID: 37403796 PMCID: PMC10450201 DOI: 10.1093/nar/gkad562] [Received: 05/13/2022] [Revised: 06/13/2023] [Accepted: 06/20/2023] [Indexed: 07/06/2023] Open
Abstract
Cis-regulatory elements (CREs) can be classified by the shapes of their transcription start site (TSS) profiles, which are indicative of distinct regulatory mechanisms. Massively parallel reporter assays (MPRAs) are increasingly being used to study CRE regulatory mechanisms, yet the degree to which MPRAs replicate individual endogenous TSS profiles has not been determined. Here, we present a new low-input MPRA protocol (TSS-MPRA) that enables measuring TSS profiles of episomal reporters as well as after lentiviral reporter chromatinization. To sensitively compare MPRA and endogenous TSS profiles, we developed a novel dissimilarity scoring algorithm (WIP score) that outperforms the frequently used earth mover's distance on experimental data. Using TSS-MPRA and WIP scoring on 500 unique reporter inserts, we found that short (153 bp) MPRA promoter inserts replicate the endogenous TSS patterns of ∼60% of promoters. Lentiviral reporter chromatinization did not improve fidelity of TSS-MPRA initiation patterns, and increasing insert size frequently led to activation of extraneous TSS in the MPRA that are not active in vivo. We discuss the implications of our findings, which highlight important caveats when using MPRAs to study transcription mechanisms. Finally, we illustrate how TSS-MPRA and WIP scoring can provide novel insights into the impact of transcription factor motif mutations and genetic variants on TSS patterns and transcription levels.
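As a rough illustration of the baseline metric named in this abstract, the earth mover's distance (EMD) between two TSS profiles on the same coordinate grid can be computed from their cumulative distributions. This is a minimal sketch only: the function name `tss_emd` and the toy profiles are hypothetical, and the WIP score itself is not reproduced here because the abstract does not define it.

```python
import numpy as np

def tss_emd(profile_a, profile_b):
    """1-D earth mover's distance between two TSS profiles.

    Both profiles are read counts per position on the same grid
    (unit spacing assumed). Each is normalized to sum to 1, and the
    EMD is the summed absolute difference of the cumulative sums.
    """
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("profiles must cover the same positions")
    a = a / a.sum()
    b = b / b.sum()
    return float(np.abs(np.cumsum(a) - np.cumsum(b)).sum())

# Toy profiles (hypothetical data): a sharp TSS, the same TSS shifted
# by one position, and a broad, dispersed initiation pattern.
sharp = [0, 0, 10, 0, 0]
shifted = [0, 0, 0, 10, 0]
broad = [2, 2, 2, 2, 2]

print(tss_emd(sharp, shifted))
print(tss_emd(sharp, broad))
```

In this toy case the one-position shift and the broad profile receive distances of similar size, which hints at why a purely transport-based distance may be a blunt tool for distinguishing profile shapes.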
Affiliation(s)
- Carlos Guzman
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
  - Department of Bioengineering, Graduate Program in Bioinformatics & Systems Biology, U.C. San Diego, La Jolla, CA 92093, USA
- Sascha Duttke
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Yixin Zhu
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Camila De Arruda Saldanha
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Nicholas L Downes
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Christopher Benner
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
- Sven Heinz
  - Department of Medicine, Division of Endocrinology, U.C. San Diego School of Medicine, La Jolla, CA 92093, USA
11
FitzPatrick VD, Leemans C, van Arensbergen J, van Steensel B, Bussemaker H. Defining the fine structure of promoter activity on a genome-wide scale with CISSECTOR. Nucleic Acids Res 2023; 51:5499-5511. [PMID: 37013986 PMCID: PMC10287907 DOI: 10.1093/nar/gkad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 03/08/2023] [Accepted: 03/22/2023] [Indexed: 04/05/2023]
Abstract
Classic promoter mutagenesis strategies can be used to study how proximal promoter regions regulate the expression of particular genes of interest. This is a laborious process, in which the smallest sub-region of the promoter still capable of recapitulating expression in an ectopic setting is first identified, followed by targeted mutation of putative transcription factor binding sites. Massively parallel reporter assays such as survey of regulatory elements (SuRE) provide an alternative way to study millions of promoter fragments in parallel. Here we show how a generalized linear model (GLM) can be used to transform genome-scale SuRE data into a high-resolution genomic track that quantifies the contribution of local sequence to promoter activity. This coefficient track helps identify regulatory elements and can be used to predict promoter activity of any sub-region in the genome. It thus allows in silico dissection of any promoter in the human genome to be performed. We developed a web application, available at cissector.nki.nl, that lets researchers easily perform this analysis as a starting point for their research into any promoter of interest.
Affiliation(s)
- Vincent D FitzPatrick
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Christ Leemans
  - Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Joris van Arensbergen
  - Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Bas van Steensel
  - Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
  - Department of Cell Biology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
12
Hussain S, Sadouni N, van Essen D, Dao LTM, Ferré Q, Charbonnier G, Torres M, Gallardo F, Lecellier CH, Sexton T, Saccani S, Spicuglia S. Short tandem repeats are important contributors to silencer elements in T cells. Nucleic Acids Res 2023; 51:4845-4866. [PMID: 36929452 PMCID: PMC10250210 DOI: 10.1093/nar/gkad187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 02/26/2023] [Accepted: 03/15/2023] [Indexed: 03/18/2023]
Abstract
The action of cis-regulatory elements with either activation or repression functions underpins the precise regulation of gene expression during normal development and cell differentiation. Gene activation by the combined activities of promoters and distal enhancers has been extensively studied in normal and pathological contexts. In sharp contrast, gene repression by cis-acting silencers, defined as genetic elements that negatively regulate gene transcription in a position-independent fashion, is less well understood. Here, we repurpose the STARR-seq approach as a novel high-throughput reporter strategy to quantitatively assess silencer activity in mammals. We assessed silencer activity from DNase I hypersensitive sites in a mouse T cell line. Identified silencers were associated with either repressive or active chromatin marks and enriched for binding motifs of known transcriptional repressors. CRISPR-mediated genomic deletions validated the repressive function of distinct silencers involved in the repression of non-T cell genes and genes regulated during T cell differentiation. Finally, we unravel an association of silencer activity with short tandem repeats, highlighting the role of repetitive elements in silencer activity. Our results provide a general strategy for genome-wide identification and characterization of silencer elements.
Affiliation(s)
- Saadat Hussain
  - Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
  - Equipe Labélisée Ligue Contre le Cancer, Marseille, France
- Nori Sadouni
  - Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
  - Equipe Labélisée Ligue Contre le Cancer, Marseille, France
- Dominic van Essen
  - Institute for Research on Cancer and Ageing, IRCAN, 06107 Nice, France
- Lan T M Dao
  - Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
  - Equipe Labélisée Ligue Contre le Cancer, Marseille, France
- Quentin Ferré
  - Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
  - Equipe Labélisée Ligue Contre le Cancer, Marseille, France
- Guillaume Charbonnier
  - Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
  - Equipe Labélisée Ligue Contre le Cancer, Marseille, France
- Magali Torres
  - Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
  - Equipe Labélisée Ligue Contre le Cancer, Marseille, France
- Frederic Gallardo
  - Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
  - Equipe Labélisée Ligue Contre le Cancer, Marseille, France
- Charles-Henri Lecellier
  - Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
  - LIRMM, University of Montpellier, CNRS, Montpellier, France
- Tom Sexton
  - Institut de Génétique et de Biologie Moléculaire et Cellulaire – IGBMC (CNRS UMR 7104, INSERM U1258, Université de Strasbourg), 67404 Illkirch, France
- Simona Saccani
  - Institute for Research on Cancer and Ageing, IRCAN, 06107 Nice, France
- Salvatore Spicuglia
  - Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
  - Equipe Labélisée Ligue Contre le Cancer, Marseille, France
13
Fabo T, Khavari P. Functional characterization of human genomic variation linked to polygenic diseases. Trends Genet 2023; 39:462-490. [PMID: 36997428 PMCID: PMC11025698 DOI: 10.1016/j.tig.2023.02.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/01/2022] [Revised: 02/22/2023] [Accepted: 02/23/2023] [Indexed: 03/30/2023]
Abstract
The burden of human disease lies predominantly in polygenic diseases. Since the early 2000s, genome-wide association studies (GWAS) have identified genetic variants and loci associated with complex traits. These have ranged from variants in coding sequences to mutations in regulatory regions, such as promoters and enhancers, as well as mutations affecting mediators of mRNA stability and other downstream regulators, such as 5' and 3'-untranslated regions (UTRs), long noncoding RNA (lncRNA), and miRNA. Recent research advances in genetics have utilized a combination of computational techniques, high-throughput in vitro and in vivo screening modalities, and precise genome editing to impute the function of diverse classes of genetic variants identified through GWAS. In this review, we highlight the vastness of genomic variants associated with polygenic disease risk and address recent advances in how genetic tools can be used to functionally characterize them.
Affiliation(s)
- Tania Fabo
  - Program in Epithelial Biology, Stanford University, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
  - Graduate Program in Genetics, Stanford University, Stanford, CA, USA
  - Stanford University School of Medicine, Stanford University, Stanford, CA, USA
- Paul Khavari
  - Program in Epithelial Biology, Stanford University, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
  - Graduate Program in Genetics, Stanford University, Stanford, CA, USA
  - Stanford University School of Medicine, Stanford University, Stanford, CA, USA
  - Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
14
Schofield JA, Hahn S. Broad compatibility between yeast UAS elements and core promoters and identification of promoter elements that determine cofactor specificity. Cell Rep 2023; 42:112387. [PMID: 37058407 PMCID: PMC10567116 DOI: 10.1016/j.celrep.2023.112387] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/15/2022] [Revised: 01/30/2023] [Accepted: 03/28/2023] [Indexed: 04/15/2023]
Abstract
Three classes of yeast protein-coding genes are distinguished by their dependence on the transcription cofactors TFIID, SAGA, and Mediator (MED) Tail, but whether this dependence is determined by the core promoter, upstream activating sequences (UASs), or other gene features is unclear. Also unclear is whether UASs can broadly activate transcription from the different promoter classes. Here, we measure transcription and cofactor specificity for thousands of UAS-core promoter combinations and find that most UASs broadly activate promoters regardless of regulatory class, while few display strong promoter specificity. However, matching UASs and promoters from the same gene class is generally important for optimal expression. We find that sensitivity to rapid depletion of MED Tail or SAGA is dependent on the identity of both UAS and core promoter, while dependence on TFIID localizes to only the promoter. Finally, our results suggest the role of TATA and TATA-like promoter sequences in MED Tail function.
Affiliation(s)
- Jeremy A Schofield
  - Basic Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Avenue N, Seattle, WA 98105, USA
- Steven Hahn
  - Basic Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Avenue N, Seattle, WA 98105, USA
15
Ren N, Dai S, Ma S, Yang F. Strategies for activity analysis of single nucleotide polymorphisms associated with human diseases. Clin Genet 2023; 103:392-400. [PMID: 36527336 DOI: 10.1111/cge.14282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 12/10/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Genome-wide association studies (GWAS) have identified a large number of single nucleotide polymorphism (SNP) sites associated with human diseases. In the annotation of human diseases, especially cancers, SNPs, as an important component of genetic factors, have gained increasing attention. Given that most SNPs are located in non-coding regions, the functional verification of these SNPs is a great challenge. The key to functional annotation of risk SNPs is to screen for SNPs with regulatory activity among thousands of disease-associated SNPs. In this review, we systematically summarize the characteristics and functional roles of SNP sites, discuss in detail three parallel reporter screening strategies classified by barcode tag, and recommend common in silico strategies that supplement the annotation of SNP sites with epigenetic activity analysis and prediction of target genes and trans-acting factors. We hope that this review will contribute to this active research field by providing robust activity analysis strategies that can facilitate the translation of GWAS results into personalized diagnosis and prevention measures for human diseases.
Affiliation(s)
- Naixia Ren
  - School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
- Shangkun Dai
  - School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
- Shumin Ma
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
- Fengtang Yang
  - School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
16
Agarwal V, Inoue F, Schubach M, Martin BK, Dash PM, Zhang Z, Sohota A, Noble WS, Yardimci GG, Kircher M, Shendure J, Ahituv N. Massively parallel characterization of transcriptional regulatory elements in three diverse human cell types. bioRxiv 2023:2023.03.05.531189. [PMID: 36945371 PMCID: PMC10028905 DOI: 10.1101/2023.03.05.531189] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 03/11/2023]
Abstract
The human genome contains millions of candidate cis-regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing a nearly comprehensive set of all annotated CREs among three cell types (HepG2, K562, and WTC11), finding 41.7% to be functional. By testing sequences in both orientations, we find promoters to have significant strand orientation effects. We also observe that their 200 nucleotide cores function as non-cell-type-specific 'on switches' providing similar expression levels to their associated gene. In contrast, enhancers have weaker orientation effects, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict CRE function with high accuracy and delineate regulatory motifs. Testing an additional lentiMPRA library encompassing 60,000 CREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an exhaustive catalog of functional CREs in three widely used cell lines, and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
Affiliation(s)
- Vikram Agarwal
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - mRNA Center of Excellence, Sanofi Pasteur Inc., Waltham, MA 02451, USA
- Fumitaka Inoue
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Max Schubach
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10178, Berlin, Germany
- Beth K. Martin
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Pyaree Mohan Dash
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10178, Berlin, Germany
- Zicong Zhang
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Ajuni Sohota
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Galip Gürkan Yardimci
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
  - Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, USA
- Martin Kircher
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10178, Berlin, Germany
  - Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, Seattle, WA 98195, USA
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA
  - Allen Center for Cell Lineage Tracing, University of Washington, Seattle, WA 98195, USA
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94158, USA
17
Reddy AJ, Herschl MH, Kolli S, Lu AX, Geng X, Kumar A, Hsu PD, Levine S, Ioannidis NM. Pretraining strategies for effective promoter-driven gene expression prediction. bioRxiv 2023:2023.02.24.529941. [PMID: 36909524 PMCID: PMC10002662 DOI: 10.1101/2023.02.24.529941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2023]
Abstract
Advances in gene delivery technologies are enabling rapid progress in molecular medicine, but require precise expression of genetic cargo in desired cell types, which is predominantly achieved via a regulatory DNA sequence called a promoter; however, only a handful of cell type-specific promoters are known. Efficiently designing compact promoter sequences with a high density of regulatory information by leveraging machine learning models would therefore be broadly impactful for fundamental research and direct therapeutic applications. However, models of expression from such compact promoter sequences are lacking, despite the recent success of deep learning in modelling expression from endogenous regulatory sequences. Despite the lack of large datasets measuring promoter-driven expression in many cell types, data from a few well-studied cell types or from endogenous gene expression may provide relevant information for transfer learning, which has not yet been explored in this setting. Here, we evaluate a variety of pretraining tasks and transfer strategies for modelling cell type-specific expression from compact promoters and demonstrate the effectiveness of pretraining on existing promoter-driven expression datasets from other cell types. Our approach is broadly applicable for modelling promoter-driven expression in any data-limited cell type of interest, and will enable the use of model-based optimization techniques for promoter design for gene delivery applications. Our code and data are available at https://github.com/anikethjr/promoter_models.
18
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023]
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Affiliation(s)
- Irene Gallego Romero
  - Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia
  - School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia
  - The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia
  - Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia
- Amanda J. Lea
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240, USA
  - Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240, USA
  - Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240, USA
  - Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
19
Romanov SE, Laktionov PP. Application of massive parallel reporter analysis in biotechnology and medicine. Journal of Clinical Practice 2023. [DOI: 10.17816/clinpract115063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
The development and functioning of an organism relies on tissue-specific gene programs. Genome regulatory elements play a key role in the regulation of such programs, and disruptions in their function can lead to the development of various pathologies, including cancers, malformations and autoimmune diseases. The emergence of high-throughput genomic studies has led to massively parallel reporter analysis (MPRA) methods, which allow the functional verification and identification of regulatory elements on a genome-wide scale. Initially MPRA was used as a tool to investigate fundamental aspects of epigenetics, but the approach also has great potential for clinical and practical biotechnology. Currently, MPRA is used for validation of clinically significant mutations, identification of tissue-specific regulatory elements, search for the most promising loci for transgene integration, and is an indispensable tool for creating highly efficient expression systems, the range of application of which extends from approaches for protein development and design of next-generation therapeutic antibody superproducers to gene therapy. In this review, the main principles and areas of practical application of high-throughput reporter assays will be discussed.
20
Cooper YA, Guo Q, Geschwind DH. Multiplexed functional genomic assays to decipher the noncoding genome. Hum Mol Genet 2022; 31:R84-R96. [PMID: 36057282 PMCID: PMC9585676 DOI: 10.1093/hmg/ddac194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/14/2022]
Abstract
Linkage disequilibrium and the incomplete regulatory annotation of the noncoding genome complicate the identification of functional noncoding genetic variants and their causal association with disease. Current computational methods for variant prioritization have limited predictive value, necessitating the application of highly parallelized experimental assays to efficiently identify functional noncoding variation. Here, we summarize two distinct approaches, massively parallel reporter assays and CRISPR-based pooled screens, and describe their flexible implementation to characterize human noncoding genetic variation at unprecedented scale. Each approach provides unique advantages and limitations, highlighting the importance of multimodal methodological integration. These multiplexed assays of variant effects are undoubtedly poised to play a key role in the experimental characterization of noncoding genetic risk, informing our understanding of the underlying mechanisms of disease-associated loci and the development of more robust predictive classification algorithms.
Affiliation(s)
- Yonatan A Cooper
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Medical Scientist Training Program, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Qiuyu Guo
  - Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Daniel H Geschwind
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Program in Neurogenetics, Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
  - Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, CA, USA
  - Institute of Precision Health, University of California Los Angeles, Los Angeles, CA, USA
21
Yang Y, Shao Y, Chaffin TA, Lee JH, Poindexter MR, Ahkami AH, Blumwald E, Stewart CN. Performance of abiotic stress-inducible synthetic promoters in genetically engineered hybrid poplar (Populus tremula × Populus alba). Front Plant Sci 2022; 13:1011939. [PMID: 36330242 PMCID: PMC9623294 DOI: 10.3389/fpls.2022.1011939] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/04/2022] [Accepted: 09/28/2022] [Indexed: 05/27/2023]
Abstract
Abiotic stresses can cause significant damage to plants. For sustainable bioenergy crop production, it is critical to generate crops resistant to such stress. Engineering promoters to control the precise expression of stress resistance genes is a very effective way to address the problem. Here we developed stably transformed Populus tremula × Populus alba hybrid poplar (INRA 717-1B4) lines, each containing one of six synthetic drought stress-inducible promoters (SDs; SD9-1, SD9-2, SD9-3, SD13-1, SD18-1, and SD18-3) identified previously by transient transformation assays. We screened green fluorescent protein (GFP) induction in poplar under osmotic stress conditions. Of six transgenic lines containing a synthetic promoter, three lines (SD18-1, SD9-2, and SD9-3) had significant GFP expression in both salt and osmotic stress treatments. Each synthetic promoter employed heptamerized repeats of specific and short cis-regulatory elements (7 repeats of 7-8 bases). To verify whether repeats of longer sequences can improve osmotic stress responsiveness, a transgenic poplar containing a synthetic promoter built from the heptamerized entire SD9 motif (20 bases, containing all partial SD9 motifs) was generated and measured for GFP induction under osmotic stress. The heptamerized entire SD9 motif did not result in higher GFP expression than the shorter promoters consisting of heptamerized SD9-1, SD9-2, and SD9-3 (partial SD9) motifs. This result indicates that shorter synthetic promoters (~50 bp) can be used for versatile control of gene expression in transgenic poplar. These synthetic promoters will be useful tools to engineer stress-resilient bioenergy tree crops in the future.
Collapse
Affiliation(s)
- Yongil Yang
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
- Department of Plant Sciences, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
| | - Yuanhua Shao
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
- Department of Plant Sciences, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
| | - Timothy A. Chaffin
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
- Department of Plant Sciences, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
- Jun Hyung Lee
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
- Department of Plant Sciences, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Magen R. Poindexter
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
- Department of Plant Sciences, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
- Amir H. Ahkami
- Environmental Molecular Sciences Laboratory (EMSL), Pacific Northwest National Laboratory (PNNL), Richland, WA, United States
- Eduardo Blumwald
- Department of Plant Sciences, University of California, Davis, Davis, CA, United States
- C. Neal Stewart
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
- Department of Plant Sciences, University of Tennessee Institute of Agriculture, Knoxville, TN, United States
|
22
|
Bergman DT, Jones TR, Liu V, Ray J, Jagoda E, Siraj L, Kang HY, Nasser J, Kane M, Rios A, Nguyen TH, Grossman SR, Fulco CP, Lander ES, Engreitz JM. Compatibility rules of human enhancer and promoter sequences. Nature 2022; 607:176-184. [PMID: 35594906 PMCID: PMC9262863 DOI: 10.1038/s41586-022-04877-w] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 09/28/2021] [Accepted: 05/17/2022] [Indexed: 01/03/2023]
Abstract
Gene regulation in the human genome is controlled by distal enhancers that activate specific nearby promoters [1]. A proposed model for this specificity is that promoters have sequence-encoded preferences for certain enhancers, for example, mediated by interacting sets of transcription factors or cofactors [2]. This 'biochemical compatibility' model has been supported by observations at individual human promoters and by genome-wide measurements in Drosophila [3-9]. However, the degree to which human enhancers and promoters are intrinsically compatible has not yet been systematically measured, and how their activities combine to control RNA expression remains unclear. Here we designed a high-throughput reporter assay called enhancer × promoter self-transcribing active regulatory region sequencing (ExP STARR-seq) and applied it to examine the combinatorial compatibilities of 1,000 enhancer and 1,000 promoter sequences in human K562 cells. We identify simple rules for enhancer-promoter compatibility, whereby most enhancers activate all promoters by similar amounts, and intrinsic enhancer and promoter activities multiplicatively combine to determine RNA output (R² = 0.82). In addition, two classes of enhancers and promoters show subtle preferential effects. Promoters of housekeeping genes contain built-in activating motifs for factors such as GABPA and YY1, which decrease the responsiveness of promoters to distal enhancers. Promoters of variably expressed genes lack these motifs and show stronger responsiveness to enhancers. Together, this systematic assessment of enhancer-promoter compatibility suggests a multiplicative model tuned by enhancer and promoter class to control gene transcription in the human genome.
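The multiplicative rule stated in this abstract (RNA output as the product of an intrinsic enhancer activity and an intrinsic promoter activity) can be illustrated with a small sketch. The Python below is not the authors' analysis code: the function name `fit_multiplicative`, the toy matrix, and the choice of a simple row-mean/column-mean fit are all assumptions made for illustration. Because a product becomes a sum after taking logarithms, the sketch fits an additive model in log space and reports how much variance it explains.

```python
import math

def fit_multiplicative(expr):
    """Fit expr[i][j] ~ enh[i] * prom[j] for a matrix of positive values.

    expr[i][j] is a hypothetical RNA output for enhancer i paired with
    promoter j. Returns (enh, prom, r2), with r2 computed in log space.
    """
    n, m = len(expr), len(expr[0])
    logs = [[math.log(v) for v in row] for row in expr]
    grand = sum(sum(r) for r in logs) / (n * m)
    row_mean = [sum(r) / m for r in logs]
    col_mean = [sum(logs[i][j] for i in range(n)) / n for j in range(m)]
    # Additive fit in log space: log x_ij ~ row_mean_i + col_mean_j - grand
    pred = [[row_mean[i] + col_mean[j] - grand for j in range(m)]
            for i in range(n)]
    ss_res = sum((logs[i][j] - pred[i][j]) ** 2
                 for i in range(n) for j in range(m))
    ss_tot = sum((logs[i][j] - grand) ** 2
                 for i in range(n) for j in range(m))
    r2 = 1.0 - ss_res / ss_tot
    # Enhancer effects are expressed relative to the average enhancer;
    # promoter activities absorb the overall scale.
    enh = [math.exp(rm - grand) for rm in row_mean]
    prom = [math.exp(cm) for cm in col_mean]
    return enh, prom, r2

# Toy data (made up): three enhancers x two promoters with mild deviations
toy = [[3.1, 4.8], [6.2, 10.3], [11.7, 20.5]]
enh, prom, r2 = fit_multiplicative(toy)
print("enhancer effects:", [round(e, 2) for e in enh])
print("promoter activities:", [round(p, 2) for p in prom])
print("R^2 in log space:", round(r2, 3))
```

Only the products `enh[i] * prom[j]` are identifiable in such a fit; the split of the overall scale between the two factors is a convention of this sketch.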
Affiliation(s)
- Drew T Bergman
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Vincent Liu
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Judhajeet Ray
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Evelyn Jagoda
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Layla Siraj
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Biophysics Graduate Program, Harvard University, Cambridge, MA, USA
- Helen Y Kang
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- BASE Initiative, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Stanford University School of Medicine, Stanford, CA, USA
- Joseph Nasser
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael Kane
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Antonio Rios
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Tung H Nguyen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Charles P Fulco
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Bristol Myers Squibb, Cambridge, MA, USA
- Eric S Lander
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Jesse M Engreitz
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
- BASE Initiative, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Stanford University School of Medicine, Stanford, CA, USA.
|
23
|
Shukla V, Cetnarowska A, Hyldahl M, Mandrup S. Interplay between regulatory elements and chromatin topology in cellular lineage determination. Trends Genet 2022. [DOI: 10.1016/j.tig.2022.05.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/25/2022] [Revised: 05/02/2022] [Accepted: 05/12/2022] [Indexed: 11/16/2022]
|
24
|
Grishin D, Gusev A. Allelic imbalance of chromatin accessibility in cancer identifies candidate causal risk variants and their mechanisms. Nat Genet 2022; 54:837-849. [PMID: 35697866 PMCID: PMC9886437 DOI: 10.1038/s41588-022-01075-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/27/2021] [Accepted: 04/08/2022] [Indexed: 02/02/2023]
Abstract
While many germline cancer risk variants have been identified through genome-wide association studies (GWAS), the mechanisms by which these variants operate remain largely unknown. Here we used 406 cancer ATAC-Seq samples across 23 cancer types to identify 7,262 germline allele-specific accessibility QTLs (as-aQTLs). Cancer as-aQTLs had stronger enrichment for cancer risk heritability (up to 145 fold) than any other functional annotation across seven cancer GWAS. Most cancer as-aQTLs directly altered transcription factor (TF) motifs and exhibited differential TF binding and gene expression in functional screens. To connect as-aQTLs to putative risk mechanisms, we introduced the regulome-wide associations study (RWAS). RWAS identified genetically associated accessible peaks at >70% of known breast and prostate loci and discovered new risk loci in all examined cancer types. Integrating as-aQTL discovery, motif analysis and RWAS identified candidate causal regulatory elements and their probable upstream regulators. Our work establishes cancer as-aQTLs and RWAS analysis as powerful tools to study the genetic architecture of cancer risk.
Affiliation(s)
- Dennis Grishin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Alexander Gusev
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; The Eli and Edythe L. Broad Institute, Cambridge, MA, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
|
25
|
Martinez-Ara M, Comoglio F, van Arensbergen J, van Steensel B. Systematic analysis of intrinsic enhancer-promoter compatibility in the mouse genome. Mol Cell 2022; 82:2519-2531.e6. [PMID: 35594855 PMCID: PMC9278412 DOI: 10.1016/j.molcel.2022.04.009] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 10/29/2021] [Revised: 02/17/2022] [Accepted: 04/05/2022] [Indexed: 12/12/2022]
Affiliation(s)
- Miguel Martinez-Ara
- Division of Gene Regulation and Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands
- Federico Comoglio
- Division of Gene Regulation and Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands
- Joris van Arensbergen
- Division of Gene Regulation and Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands
- Bas van Steensel
- Division of Gene Regulation and Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands.
|
26
|
Sahu B, Hartonen T, Pihlajamaa P, Wei B, Dave K, Zhu F, Kaasinen E, Lidschreiber K, Lidschreiber M, Daub CO, Cramer P, Kivioja T, Taipale J. Sequence determinants of human gene regulatory elements. Nat Genet. [PMID: 35190730 PMCID: PMC8920891 DOI: 10.1038/s41588-021-01009-4] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Received: 04/12/2021] [Accepted: 12/17/2021] [Indexed: 01/02/2023]
Abstract
DNA can determine where and when genes are expressed, but the full set of sequence determinants that control gene expression is unknown. Here, we measured the transcriptional activity of DNA sequences that represent an ~100 times larger sequence space than the human genome using massively parallel reporter assays (MPRAs). Machine learning models revealed that transcription factors (TFs) generally act in an additive manner with weak grammar and that most enhancers increase expression from a promoter by a mechanism that does not appear to involve specific TF–TF interactions. The enhancers themselves can be classified into three types: classical, closed chromatin and chromatin dependent. We also show that few TFs are strongly active in a cell, with most activities being similar between cell types. Individual TFs can have multiple gene regulatory activities, including chromatin opening and enhancing, promoting and determining transcription start site (TSS) activity, consistent with the view that the TF binding motif is the key atomic unit of gene expression. Analysis of massively parallel reporter assays measuring the transcriptional activity of DNA sequences indicates that most transcription factor (TF) activity is additive and does not rely on specific TF–TF interactions. Individual TFs can have different gene regulatory activities.
|
27
|
Yao L, Liang J, Ozer A, Leung AK, Lis JT, Yu H. A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat Biotechnol 2022. [PMID: 35177836 DOI: 10.1038/s41587-022-01211-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/03/2021] [Accepted: 01/06/2022] [Indexed: 01/15/2023]
Abstract
Mounting evidence supports the idea that transcriptional patterns serve as more specific identifiers of active enhancers than histone marks; however, the optimal strategy to identify active enhancers both experimentally and computationally has not been determined. Here, we compared 13 genome-wide RNA sequencing assays in K562 cells and showed that the nuclear run-on followed by cap-selection assay (GRO/PRO-cap) has advantages in eRNA detection and active enhancer identification. We also introduced a tool, Peak Identifier for Nascent Transcript Starts (PINTS), to identify active promoters and enhancers genome-wide and pinpoint the precise location of the 5′ transcription start sites. Finally, we compiled a comprehensive enhancer candidate compendium based on the detected eRNA TSSs available in 120 cell and tissue types that can be accessed at https://pints.yulab.org. With the knowledge of the best available assays and pipelines, this large-scale annotation of candidate enhancers will pave the way for selection and characterization of their functions in a time- and labor-efficient manner in the future.
|
28
|
Qi Z, Jung C, Bandilla P, Ludwig C, Heron M, Sophie Kiesel A, Museridze M, Philippou‐Massier J, Nikolov M, Renna Max Schnepf A, Unnerstall U, Ceolin S, Mühlig B, Gompel N, Soeding J, Gaul U. Large‐scale analysis of Drosophila core promoter function using synthetic promoters. Mol Syst Biol 2022; 18:e9816. [PMID: 35156763 PMCID: PMC8842121 DOI: 10.15252/msb.20209816] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/26/2020] [Revised: 01/11/2022] [Accepted: 01/13/2022] [Indexed: 02/02/2023] Open
Abstract
The core promoter plays a central role in setting metazoan gene expression levels, but how exactly it “computes” expression remains poorly understood. To dissect its function, we carried out a comprehensive structure–function analysis in Drosophila. First, we performed a genome‐wide bioinformatic analysis, providing an improved picture of the sequence motif architecture. We then measured the activities of ~3,000 mutational variants of synthetic promoters, with and without an external stimulus (hormonal activation), at large scale and with high accuracy using robotics and a dual luciferase reporter assay. We observed a strong impact on activity of the different types of mutations, including knockout of individual sequence motifs and motif combinations, variations of motif strength, nucleosome positioning, and flanking sequences. A linear combination of the individual motif features largely accounts for the combinatorial effects on core promoter activity. These findings shed new light on the quantitative assessment of gene expression in metazoans.
Affiliation(s)
- Zhan Qi
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Christophe Jung
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Peter Bandilla
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Claudia Ludwig
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Mark Heron
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Anja Sophie Kiesel
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Mariam Museridze
- Department of Biology II, Evolutionary Biology, Ludwig‐Maximilians‐Universität München, Planegg‐Martinsried, Germany
- Julia Philippou‐Massier
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Miroslav Nikolov
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Alessio Renna Max Schnepf
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Ulrich Unnerstall
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Stefano Ceolin
- Department of Biology II, Evolutionary Biology, Ludwig‐Maximilians‐Universität München, Planegg‐Martinsried, Germany
- Bettina Mühlig
- Department of Biology II, Evolutionary Biology, Ludwig‐Maximilians‐Universität München, Planegg‐Martinsried, Germany
- Nicolas Gompel
- Department of Biology II, Evolutionary Biology, Ludwig‐Maximilians‐Universität München, Planegg‐Martinsried, Germany
- Johannes Soeding
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
- Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Ulrike Gaul
- Department of Biochemistry, Gene Center, Ludwig‐Maximilians‐Universität München, Feodor‐Lynen‐Str. 25, Munich, Germany
|
29
|
Moore JE, Zhang XO, Elhajjajy SI, Fan K, Pratt HE, Reese F, Mortazavi A, Weng Z. Integration of high-resolution promoter profiling assays reveals novel, cell type-specific transcription start sites across 115 human cell and tissue types. Genome Res 2021; 32:389-402. [PMID: 34949670 PMCID: PMC8805725 DOI: 10.1101/gr.275723.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/12/2021] [Accepted: 12/19/2021] [Indexed: 12/02/2022]
Abstract
Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated nor do they contain information on cell type–specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks are primarily proximal to GENCODE-annotated TSSs and are concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3′ ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations are supported by epigenomic and other transcriptomic data sets. To show the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI genome-wide association study (GWAS) catalog and identified new candidate GWAS genes. Overall, our work shows the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.
Affiliation(s)
- Kaili Fan
- University of Massachusetts Chan Medical School
- Zhiping Weng
- University of Massachusetts Chan Medical School
|
30
|
Yokobayashi S, Yabuta Y, Nakagawa M, Okita K, Hu B, Murase Y, Nakamura T, Bourque G, Majewski J, Yamamoto T, Saitou M. Inherent genomic properties underlie the epigenomic heterogeneity of human induced pluripotent stem cells. Cell Rep 2021; 37:109909. [PMID: 34731633 DOI: 10.1016/j.celrep.2021.109909] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/02/2021] [Revised: 07/24/2021] [Accepted: 10/08/2021] [Indexed: 01/13/2023] Open
Abstract
Human induced pluripotent stem cells (hiPSCs) show variable differentiation potential due to their epigenomic heterogeneity, whose extent/attributes remain unclear, except for well-studied elements/chromosomes such as imprints and the X chromosomes. Here, we show that seven hiPSC lines with variable germline potential exhibit substantial epigenomic heterogeneity, despite their uniform transcriptomes. Nearly a quarter of autosomal regions bear potentially differential chromatin modifications, with promoters/CpG islands for H3K27me3/H2AK119ub1 and evolutionarily young retrotransposons for H3K4me3. We identify 145 large autosomal blocks (≥100 kb) with differential H3K9me3 enrichment, many of which are lamina-associated domains (LADs) in somatic but not in embryonic stem cells. A majority of these epigenomic heterogeneities are independent of genetic variations. We identify an X chromosome state with chromosome-wide H3K9me3 that stably prevents X chromosome erosion. Importantly, the germline potential of female hiPSCs correlates with X chromosome inactivation. We propose that inherent genomic properties, including CpG density, transposons, and LADs, engender epigenomic heterogeneity in hiPSCs.
|
31
|
Patel ZM, Hughes TR. Global properties of regulatory sequences are predicted by transcription factor recognition mechanisms. Genome Biol 2021; 22:285. [PMID: 34620190 PMCID: PMC8496038 DOI: 10.1186/s13059-021-02503-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/31/2020] [Accepted: 09/16/2021] [Indexed: 01/07/2023] Open
Abstract
Background: Mammalian genomes contain millions of putative regulatory sequences, which are delineated by binding of multiple transcription factors. The degree to which spacing and orientation constraints among transcription factor binding sites contribute to the recognition and identity of regulatory sequence is an unresolved but important question that impacts our understanding of genome function and evolution. Global mechanisms that underlie phenomena including the size of regulatory sequences, their uniqueness, and their evolutionary turnover remain poorly described. Results: Here, we ask whether models incorporating different degrees of spacing and orientation constraints among transcription factor binding sites are broadly consistent with several global properties of regulatory sequence. These properties include length, sequence diversity, turnover rate, and dominance of specific TFs in regulatory site identity and cell type specification. Models with and without spacing and orientation constraints are generally consistent with all observed properties of regulatory sequence, and with regulatory sequences being fundamentally small (~1 nucleosome). Uniqueness of regulatory regions and their rapid evolutionary turnover are expected under all models examined. An intriguing issue we identify is that the complexity of eukaryotic regulatory sites must scale with the number of active transcription factors, in order to accomplish observed specificity. Conclusions: Models of transcription factor binding with or without spacing and orientation constraints predict that regulatory sequences should be fundamentally short, unique, and turn over rapidly. We posit that the existence of master regulators may be, in part, a consequence of evolutionary pressure to limit the complexity and increase evolvability of regulatory sites. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-021-02503-y.
Affiliation(s)
- Zain M Patel
- Donnelly Centre for Cellular and Biomolecular Research and Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Timothy R Hughes
- Donnelly Centre for Cellular and Biomolecular Research and Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada.
|
32
|
Jores T, Tonnies J, Wrightsman T, Buckler ES, Cuperus JT, Fields S, Queitsch C. Synthetic promoter designs enabled by a comprehensive analysis of plant core promoters. Nat Plants 2021; 7:842-855. [PMID: 34083762 PMCID: PMC10246763 DOI: 10.1038/s41477-021-00932-y] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 01/13/2021] [Accepted: 04/27/2021] [Indexed: 05/24/2023]
Abstract
Targeted engineering of plant gene expression holds great promise for ensuring food security and for producing biopharmaceuticals in plants. However, this engineering requires thorough knowledge of cis-regulatory elements to precisely control either endogenous or introduced genes. To generate this knowledge, we used a massively parallel reporter assay to measure the activity of nearly complete sets of promoters from Arabidopsis, maize and sorghum. We demonstrate that core promoter elements (notably the TATA box) as well as promoter GC content and promoter-proximal transcription factor binding sites influence promoter strength. By performing the experiments in two assay systems, leaves of the dicot tobacco and protoplasts of the monocot maize, we detect species-specific differences in the contributions of GC content and transcription factors to promoter strength. Using these observations, we built computational models to predict promoter strength in both assay systems, allowing us to design highly active promoters comparable in activity to the viral 35S minimal promoter. Our results establish a promising experimental approach to optimize native promoter elements and generate synthetic ones with desirable features.
Affiliation(s)
- Tobias Jores
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Jackson Tonnies
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Graduate Program in Biology, University of Washington, Seattle, WA, USA
- Travis Wrightsman
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY, USA
- Edward S Buckler
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY, USA
- Agricultural Research Service, United States Department of Agriculture, Ithaca, NY, USA
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, USA
- Josh T Cuperus
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Stanley Fields
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Department of Medicine, University of Washington, Seattle, WA, USA.
- Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
|
33
|
Agarwal V, Shendure J. Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks. Cell Rep 2021; 31:107663. [PMID: 32433972 DOI: 10.1016/j.celrep.2020.107663] [Citation(s) in RCA: 79] [Impact Index Per Article: 26.3] [Received: 07/25/2018] [Revised: 06/11/2019] [Accepted: 04/28/2020] [Indexed: 01/06/2023] Open
Abstract
Algorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here, we sought to apply deep convolutional neural networks toward that goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, termed Xpresso, more than doubles the accuracy of alternative sequence-based models and isolates rules as predictive as models relying on chromatin immunoprecipitation sequencing (ChIP-seq) data. Xpresso recapitulates genome-wide patterns of transcriptional activity, and its residuals can be used to quantify the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose cell-type-specific gene-expression predictions based solely on primary sequences as a grand challenge for the field.
Affiliation(s)
- Vikram Agarwal
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Calico Life Sciences LLC, South San Francisco, CA 94080, USA.
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA.
|
34
|
Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022] Open
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
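The normalization steps named in this abstract (barcode expression normalized to barcode abundance in the plasmid DNA, then averaged per ROI) can be sketched as follows. This is an illustrative Python sketch only, not the MPRAdecoder implementation: the function name `normalize_mpra`, the dictionary inputs, the `min_plasmid` threshold, and the use of library-size fractions are all assumptions. Barcodes without an unambiguous ROI assignment are simply skipped, mirroring the idea that uncertain cases are excluded.

```python
from collections import defaultdict

def normalize_mpra(cdna_counts, plasmid_counts, bc_to_roi, min_plasmid=1):
    """Return (per-barcode, per-ROI) normalized expression.

    cdna_counts / plasmid_counts: dict barcode -> read count (hypothetical
    inputs; assumes at least one read in each sample).
    bc_to_roi: dict barcode -> ROI name, holding only unambiguous pairs.
    """
    cdna_total = sum(cdna_counts.values())
    plasmid_total = sum(plasmid_counts.values())
    bc_expr = {}
    for bc, dna in plasmid_counts.items():
        # Skip barcodes that are poorly covered in the plasmid sample
        # or that lack an unambiguous ROI assignment.
        if dna < min_plasmid or bc not in bc_to_roi:
            continue
        rna_frac = cdna_counts.get(bc, 0) / cdna_total
        dna_frac = dna / plasmid_total
        bc_expr[bc] = rna_frac / dna_frac
    # Average the normalized barcode values belonging to each ROI.
    per_roi = defaultdict(list)
    for bc, value in bc_expr.items():
        per_roi[bc_to_roi[bc]].append(value)
    roi_expr = {roi: sum(vals) / len(vals) for roi, vals in per_roi.items()}
    return bc_expr, roi_expr

# Toy counts (made up) for three assigned barcodes and one unassigned one
cdna = {"AAA": 30, "CCC": 10, "GGG": 60}
plasmid = {"AAA": 10, "CCC": 10, "GGG": 20, "TTT": 10}
mapping = {"AAA": "roi1", "CCC": "roi1", "GGG": "roi2"}
bc_expr, roi_expr = normalize_mpra(cdna, plasmid, mapping)
print(bc_expr)
print(roi_expr)
```

A real pipeline would also need the upstream steps the abstract describes, such as identifying genuine barcodes and BC-ROI associations from the mapping samples, which this sketch takes as given.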
Affiliation(s)
- Anna E Letiagina
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Evgeniya S Omelina
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anton V Ivankin
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Alexey V Pindyurin
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| |
Collapse
|
35
|
Victorino J, Alvarez-Franco A, Manzanares M. Functional genomics and epigenomics of atrial fibrillation. J Mol Cell Cardiol 2021; 157:45-55. [PMID: 33887329 DOI: 10.1016/j.yjmcc.2021.04.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/01/2020] [Revised: 04/07/2021] [Accepted: 04/12/2021] [Indexed: 02/06/2023]
Abstract
Atrial fibrillation (AF) is a progressive cardiac arrhythmia that increases the risk of hospitalization and adverse cardiovascular events. Despite years of study, we still do not fully understand the molecular mechanism responsible for the disease. The recent implementation of large-scale approaches in patient samples, population studies and animal models has helped us to broaden our knowledge of the molecular drivers responsible for AF and of the mechanisms behind disease progression. Understanding the genomic and epigenomic changes that take place during chronification of AF will prove essential to design novel treatments leading to improved patient care.
Affiliation(s)
- Jesus Victorino
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
  - Departamento de Bioquímica, Facultad de Medicina, Universidad Autónoma de Madrid (UAM), Spain
- Alba Alvarez-Franco
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
- Miguel Manzanares
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
  - Centro de Biología Molecular Severo Ochoa, CSIC-UAM, Madrid, Spain
|
36
|
Mayayo-Peralta I, Prekovic S, Zwart W. Estrogen Receptor on the move: Cistromic plasticity and its implications in breast cancer. Mol Aspects Med 2021; 78:100939. [PMID: 33358533 DOI: 10.1016/j.mam.2020.100939] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/18/2020] [Revised: 12/08/2020] [Accepted: 12/10/2020] [Indexed: 01/27/2023]
Abstract
Estrogen Receptor (ERα) is a hormone-driven transcription factor, critically involved in driving tumor cell proliferation in the vast majority of breast cancers (BCas). ERα binds the genome at cis-regulatory elements, dictating the expression of a large spectrum of responsive genes in 3D genomic space. While initial reports described a rather static ERα chromatin binding repertoire, we now know that ERα DNA interactions are highly versatile, altered in breast tumor development and progression, and deviate between tumors from patients with differential outcome. Multiple cellular signaling cascades are known to impinge on ERα genomic function, changing its cistrome to retarget the receptor to other regions of the genome and reprogram its impact on breast cell biology. This review describes the current state-of-the-art on which factors manipulate the ERα cistrome and how this alters the response to both endogenous and exogenous hormonal stimuli, ultimately impacting BCa cell progression and response to commonly used therapeutic interventions. Novel insights in ERα cistrome dynamics may pave the way for better patient diagnostics and the development of novel therapeutic interventions, ultimately improving cancer care and patient outcome.
|
37
|
Redondo-Antón J, Fontela MG, Notario L, Torres-Ruiz R, Rodríguez-Perales S, Lorente E, Lauzurica P. Functional Characterization of a Dual Enhancer/Promoter Regulatory Element Leading Human CD69 Expression. Front Genet 2020; 11:552949. [PMID: 33193627 PMCID: PMC7652794 DOI: 10.3389/fgene.2020.552949] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/17/2020] [Accepted: 10/07/2020] [Indexed: 11/29/2022]
Abstract
The CD69 gene encodes a C-type lectin glycoprotein with immune regulatory properties which is expressed on the cell surfaces of all activated hematopoietic cells. CD69 activation kinetics differ by developmental stage, cell lineage and activating conditions, and these differences have been attributed to the participation of complex gene regulatory networks. An evolutionarily conserved regulatory element, CNS2, located 4 kb upstream of the CD69 gene transcriptional start site, has been proposed as the major candidate governing the gene transcriptional activation program. To investigate the function of human CNS2, we studied the effect of its endogenous elimination via CRISPR-Cas9 on CD69 protein and mRNA expression levels in various immune cell lines. Even when the entire promoter region was maintained, CNS2-/- cells did not express CD69, thus indicating that CNS2 has promoter-like characteristics. However, like enhancers, inverted CNS2 sustained transcription, although at diminished levels, thereby suggesting that it has dual promoter and enhancer functions. Episomal luciferase assays further suggested that both functions are combined within the CNS2 regulatory element. In addition, CNS2 directs its own bidirectional transcription into two different enhancer-derived RNA molecules (eRNAs) which are transcribed from two independent transcriptional start sites in opposite directions. This eRNA transcription is dependent on only the enhancer sequence itself, because in the absence of the CD69 promoter, sufficient RNA polymerase II levels are maintained at CNS2 to drive eRNA expression. Here, we describe a regulatory element with overlapping promoter and enhancer functions, which is essential for CD69 gene transcriptional regulation.
Affiliation(s)
- Jennifer Redondo-Antón
  - Immune Gene Regulation and Antigen Presentation Group, National Center for Microbiology, Institute of Health Carlos III (ISCIII), Madrid, Spain
- M G Fontela
  - Immune Gene Regulation and Antigen Presentation Group, National Center for Microbiology, Institute of Health Carlos III (ISCIII), Madrid, Spain
- Laura Notario
  - Immune Gene Regulation and Antigen Presentation Group, National Center for Microbiology, Institute of Health Carlos III (ISCIII), Madrid, Spain
- Raúl Torres-Ruiz
  - Molecular Cytogenetics and Genome Editing Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Sandra Rodríguez-Perales
  - Molecular Cytogenetics and Genome Editing Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Elena Lorente
  - Immune Gene Regulation and Antigen Presentation Group, National Center for Microbiology, Institute of Health Carlos III (ISCIII), Madrid, Spain
- Pilar Lauzurica
  - Immune Gene Regulation and Antigen Presentation Group, National Center for Microbiology, Institute of Health Carlos III (ISCIII), Madrid, Spain
|
38
|
Klein JC, Agarwal V, Inoue F, Keith A, Martin B, Kircher M, Ahituv N, Shendure J. A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nat Methods 2020; 17:1083-1091. [PMID: 33046894 PMCID: PMC7727316 DOI: 10.1038/s41592-020-0965-y] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 07/08/2020] [Accepted: 08/27/2020] [Indexed: 01/02/2023]
Abstract
Massively parallel reporter assays (MPRAs) functionally screen thousands of sequences for regulatory activity in parallel. To date, there are limited studies that systematically compare differences in MPRA design. Here, we screen a library of 2,440 candidate liver enhancers and controls for regulatory activity in HepG2 cells using nine different MPRA designs. We identify subtle but significant differences that correlate with epigenetic and sequence-level features, as well as differences in dynamic range and reproducibility. We also validate that enhancer activity is largely independent of orientation, at least for our library and designs. Finally, we assemble and test the same enhancers as 192-mers, 354-mers and 678-mers and observe sizable differences. This work provides a framework for the experimental design of high-throughput reporter assays, suggesting that the extended sequence context of tested elements and, to a lesser degree, the precise assay influence MPRA results.
Affiliation(s)
- Jason C Klein
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Vikram Agarwal
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Calico Life Sciences LLC, South San Francisco, CA, USA
- Fumitaka Inoue
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Aidan Keith
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Beth Martin
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Martin Kircher
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Berlin Institute of Health (BIH), Berlin, Germany
  - Charité - Universitätsmedizin Berlin, Berlin, Germany
- Nadav Ahituv
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
|
39
|
Vo Ngoc L, Huang CY, Cassidy CJ, Medrano C, Kadonaga JT. Identification of the human DPR core promoter element using machine learning. Nature 2020; 585:459-63. [PMID: 32908305 DOI: 10.1038/s41586-020-2689-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 11/27/2019] [Accepted: 06/16/2020] [Indexed: 01/31/2023]
Abstract
The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to transcription initiation [1-5], but the downstream core promoter in humans has been difficult to decipher [1-3]. Here, we analyze the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants that are each of known transcriptional strength. We then analyzed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These studies revealed that the DPR is a functionally important core promoter element that is widely used in human promoters. Importantly, there appears to be a duality between the DPR and TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.
|
40
|
Ntini E, Marsico A. Functional impacts of non-coding RNA processing on enhancer activity and target gene expression. J Mol Cell Biol 2020; 11:868-879. [PMID: 31169884 PMCID: PMC6884709 DOI: 10.1093/jmcb/mjz047] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/14/2019] [Revised: 04/03/2019] [Accepted: 04/04/2019] [Indexed: 01/06/2023]
Abstract
Tight regulation of gene expression is orchestrated by enhancers. Through recent research advancements, it is becoming clear that enhancers are not solely distal regulatory elements harboring transcription factor binding sites and decorated with specific histone marks, but they rather display signatures of active transcription, showing distinct degrees of transcription unit organization. Thereby, a substantial fraction of enhancers give rise to different species of non-coding RNA transcripts with an unprecedented range of potential functions. In this review, we bring together data from recent studies indicating that non-coding RNA transcription from active enhancers, as well as enhancer-produced long non-coding RNA transcripts, may modulate or define the functional regulatory potential of the cognate enhancer. In addition, we summarize supporting evidence that RNA processing of the enhancer-associated long non-coding RNA transcripts may constitute an additional layer of regulation of enhancer activity, which contributes to the control and final outcome of enhancer-targeted gene expression.
Affiliation(s)
- Evgenia Ntini
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Free University Berlin, Berlin, Germany
- Annalisa Marsico
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Free University Berlin, Berlin, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, München, Germany
|
41
|
Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T, Davis CA, Dobin A, Kaul R, Halow J, Van Nostrand EL, Freese P, Gorkin DU, Shen Y, He Y, Mackiewicz M, Pauli-Behn F, Williams BA, Mortazavi A, Keller CA, Zhang XO, Elhajjajy SI, Huey J, Dickel DE, Snetkova V, Wei X, Wang X, Rivera-Mulia JC, Rozowsky J, Zhang J, Chhetri SB, Zhang J, Victorsen A, White KP, Visel A, Yeo GW, Burge CB, Lécuyer E, Gilbert DM, Dekker J, Rinn J, Mendenhall EM, Ecker JR, Kellis M, Klein RJ, Noble WS, Kundaje A, Guigó R, Farnham PJ, Cherry JM, Myers RM, Ren B, Graveley BR, Gerstein MB, Pennacchio LA, Snyder MP, Bernstein BE, Wold B, Hardison RC, Gingeras TR, Stamatoyannopoulos JA, Weng Z. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 2020; 583:699-710. [PMID: 32728249 PMCID: PMC7410828 DOI: 10.1038/s41586-020-2493-4] [Citation(s) in RCA: 879] [Impact Index Per Article: 219.8] [Received: 08/26/2017] [Accepted: 05/27/2020] [Indexed: 12/13/2022]
Abstract
The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE [1] and Roadmap Epigenomics [2] data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9% and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.
Affiliation(s)
- Jill E Moore
  - University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
- Michael J Purcaro
  - University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
- Henry E Pratt
  - University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
- Noam Shoresh
  - The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Jessika Adrian
  - Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Trupti Kawli
  - Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Carrie A Davis
  - Cold Spring Harbor Laboratory, Functional Genomics, Cold Spring Harbor, NY, USA
- Alexander Dobin
  - Cold Spring Harbor Laboratory, Functional Genomics, Cold Spring Harbor, NY, USA
- Rajinder Kaul
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
  - Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA
- Jessica Halow
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Eric L Van Nostrand
  - Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, Stem Cell Program, Sanford Consortium for Regenerative Medicine, University of California, San Diego, La Jolla, CA, USA
- Peter Freese
  - Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- David U Gorkin
  - Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
  - Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
- Yin Shen
  - Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
  - Institute for Human Genetics, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Yupeng He
  - Genomics Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
- Mark Mackiewicz
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Brian A Williams
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Ali Mortazavi
  - Department of Developmental and Cell Biology, University of California Irvine, Irvine, CA, USA
- Cheryl A Keller
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Xiao-Ou Zhang
  - University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
- Shaimae I Elhajjajy
  - University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
- Jack Huey
  - University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
- Diane E Dickel
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Valentina Snetkova
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Xintao Wei
  - Department of Genetics and Genome Sciences, Institute for Systems Genomics, UConn Health, Farmington, CT, USA
- Xiaofeng Wang
  - Département de Biochimie et Médecine Moléculaire, Université de Montréal, Montréal, Quebec, Canada
  - Division of Experimental Medicine, McGill University, Montreal, Quebec, Canada
  - Institut de Recherches Cliniques de Montréal (IRCM), Montréal, Quebec, Canada
- Juan Carlos Rivera-Mulia
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Medical School, Minneapolis, MN, USA
- Surya B Chhetri
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
  - Biological Sciences, University of Alabama in Huntsville, Huntsville, AL, USA
- Jialing Zhang
  - Department of Genetics, School of Medicine, Yale University, New Haven, CT, USA
- Alec Victorsen
  - Department of Human Genetics, Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL, USA
- Axel Visel
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - School of Natural Sciences, University of California, Merced, Merced, CA, USA
- Gene W Yeo
  - Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, Stem Cell Program, Sanford Consortium for Regenerative Medicine, University of California, San Diego, La Jolla, CA, USA
- Christopher B Burge
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Eric Lécuyer
  - Département de Biochimie et Médecine Moléculaire, Université de Montréal, Montréal, Quebec, Canada
  - Division of Experimental Medicine, McGill University, Montreal, Quebec, Canada
  - Institut de Recherches Cliniques de Montréal (IRCM), Montréal, Quebec, Canada
- David M Gilbert
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Job Dekker
  - HHMI and Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA, USA
- John Rinn
  - University of Colorado Boulder, Boulder, CO, USA
- Eric M Mendenhall
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
  - Biological Sciences, University of Alabama in Huntsville, Huntsville, AL, USA
- Joseph R Ecker
  - Genomics Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
  - Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA, USA
- Manolis Kellis
  - The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Robert J Klein
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- William S Noble
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Anshul Kundaje
  - Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Roderic Guigó
  - Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology and Universitat Pompeu Fabra, Barcelona, Spain
- Peggy J Farnham
  - Department of Biochemistry and Molecular Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- J Michael Cherry
  - Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Richard M Myers
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Bing Ren
  - Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
  - Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
- Brenton R Graveley
  - Department of Genetics and Genome Sciences, Institute for Systems Genomics, UConn Health, Farmington, CT, USA
- Len A Pennacchio
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Comparative Biochemistry Program, University of California, Berkeley, CA, USA
- Michael P Snyder
  - Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
  - Cardiovascular Institute, Stanford School of Medicine, Stanford, CA, USA
- Bradley E Bernstein
  - Broad Institute and Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Barbara Wold
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Ross C Hardison
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Thomas R Gingeras
  - Cold Spring Harbor Laboratory, Functional Genomics, Cold Spring Harbor, NY, USA
- John A Stamatoyannopoulos
  - Altius Institute for Biomedical Sciences, Seattle, WA, USA
  - Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Zhiping Weng
  - University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
  - Department of Thoracic Surgery, Clinical Translational Research Center, Shanghai Pulmonary Hospital, The School of Life Sciences and Technology, Tongji University, Shanghai, China
  - Bioinformatics Program, Boston University, Boston, MA, USA
|
42
|
Morgan RA, Ma F, Unti MJ, Brown D, Ayoub PG, Tam C, Lathrop L, Aleshe B, Kurita R, Nakamura Y, Senadheera S, Wong RL, Hollis RP, Pellegrini M, Kohn DB. Creating New β-Globin-Expressing Lentiviral Vectors by High-Resolution Mapping of Locus Control Region Enhancer Sequences. Mol Ther Methods Clin Dev 2020; 17:999-1013. [PMID: 32426415 PMCID: PMC7225380 DOI: 10.1016/j.omtm.2020.04.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/08/2020] [Accepted: 04/13/2020] [Indexed: 12/18/2022]
Abstract
Hematopoietic stem cell gene therapy is a promising approach for treating disorders of the hematopoietic system. Identifying combinations of cis-regulatory elements that do not impede packaging or transduction efficiency when included in lentiviral vectors has proven challenging. In this study, we deploy LV-MPRA (lentiviral vector-based, massively parallel reporter assay), an approach that simultaneously analyzes thousands of synthetic DNA fragments in parallel to identify sequence-intrinsic and lineage-specific enhancer function at near-base-pair resolution. We demonstrate the power of LV-MPRA in elucidating the boundaries of previously unknown intrinsic enhancer sequences of the human β-globin locus control region. Our approach facilitated the rapid assembly of novel therapeutic βAS3-globin lentiviral vectors harboring strong lineage-specific recombinant control elements capable of correcting a mouse model of sickle cell disease. LV-MPRA can be used to map any genomic locus for enhancer activity and facilitates the rapid development of therapeutic vectors for treating disorders of the hematopoietic system or other specific tissues and cell types.
Affiliation(s)
- Richard A. Morgan
  - Charles R. Drew University of Medicine and Science, Los Angeles, CA 90059, USA
  - Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Feiyang Ma
  - Molecular Biology Institute Interdepartmental Doctoral Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Mildred J. Unti
  - Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Devin Brown
  - Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Paul George Ayoub
  - Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Curtis Tam
  - Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Lindsay Lathrop
  - Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Bamidele Aleshe
  - Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Ryo Kurita
  - Cell Engineering Division, RIKEN BioResource Center, Tsukuba, Ibaraki, Japan
- Yukio Nakamura
  - Cell Engineering Division, RIKEN BioResource Center, Tsukuba, Ibaraki, Japan
- Shantha Senadheera
  - Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Ryan L. Wong
  - Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Roger P. Hollis
  - Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Matteo Pellegrini
  - Molecular Biology Institute Interdepartmental Doctoral Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Donald B. Kohn
  - Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - The Eli & Edythe Broad Center of Regenerative Medicine & Stem Cell Research, University of California, Los Angeles, Los Angeles, CA, USA
|
43
|
Abstract
The human gene catalogue is essentially complete, but we lack an equivalently vetted inventory of bona fide human enhancers. Hundreds of thousands of candidate enhancers have been nominated via biochemical annotations; however, only a handful of these have been validated and confidently linked to their target genes. Here we review emerging technologies for discovering, characterizing and validating human enhancers at scale. We furthermore propose a new framework for operationally defining enhancers that accommodates the heterogeneous and complementary results that are emerging from reporter assays, biochemical measurements and CRISPR screens.
Affiliation(s)
- Molly Gasperini
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Jacob M Tome
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Allen Discovery Center for Cell Lineage, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
|
44
|
Ohnmacht J, May P, Sinkkonen L, Krüger R. Missing heritability in Parkinson's disease: the emerging role of non-coding genetic variation. J Neural Transm (Vienna) 2020; 127:729-748. [PMID: 32248367 PMCID: PMC7242266 DOI: 10.1007/s00702-020-02184-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 02/27/2020] [Accepted: 03/24/2020] [Indexed: 02/01/2023]
Abstract
Parkinson’s disease (PD) is a neurodegenerative disorder caused by a complex interplay of genetic and environmental factors. For the stratification of PD patients and the development of advanced clinical trials, including causative treatments, a better understanding of the underlying genetic architecture of PD is required. Despite substantial efforts, genome-wide association studies have not been able to explain most of the observed heritability. The majority of PD-associated genetic variants are located in non-coding regions of the genome. A systematic assessment of their functional role is hampered by our incomplete understanding of genotype–phenotype correlations, for example through differential regulation of gene expression. Here, the recent progress and remaining challenges for the elucidation of the role of non-coding genetic variants are reviewed with a focus on PD as a complex disease with multifactorial origins. The function of gene regulatory elements and the impact of non-coding variants on them, and the means to map these elements on a genome-wide level, will be delineated. Moreover, examples of how the integration of functional genomic annotations can serve to identify disease-associated pathways and to prioritize disease- and cell type-specific regulatory variants will be given. Finally, strategies for functional validation and considerations for suitable model systems are outlined. Together this emphasizes the contribution of rare and common genetic variants to the complex pathogenesis of PD and points to remaining challenges for the dissection of genetic complexity that may allow for better stratification, improved diagnostics and more targeted treatments for PD in the future.
Affiliation(s)
- Jochen Ohnmacht
- LCSB, University of Luxembourg, Belvaux, Luxembourg; Department of Life Sciences and Medicine (DLSM), University of Luxembourg, Belvaux, Luxembourg
- Patrick May
- LCSB, University of Luxembourg, Belvaux, Luxembourg
- Lasse Sinkkonen
- Department of Life Sciences and Medicine (DLSM), University of Luxembourg, Belvaux, Luxembourg
- Rejko Krüger
- LCSB, University of Luxembourg, Belvaux, Luxembourg; Luxembourg Institute of Health (LIH), Transversal Translational Medicine, Strassen, Luxembourg; Parkinson Research Clinic, Centre Hospitalier de Luxembourg (CHL), Luxembourg, Luxembourg.
|
45
|
Abstract
The majority of the human genome does not encode proteins. Many of these noncoding regions contain important regulatory sequences that control gene expression. To date, most studies have focused on activators such as enhancers, but regions that repress gene expression (silencers) have not been systematically studied. We have developed a system that identifies silencer regions in a genome-wide fashion on the basis of silencer-mediated transcriptional repression of caspase 9. We found that silencers are widely distributed and may function in a tissue-specific fashion. These silencers harbor unique epigenetic signatures and are associated with specific transcription factors. Silencers also act at multiple genes, and at the level of chromosomal domains and long-range interactions. Deletion of silencer regions linked to the drug transporter genes ABCC2 and ABCG2 caused chemo-resistance. Overall, our study demonstrates that tissue-specific silencing is widespread throughout the human genome and probably contributes substantially to the regulation of gene expression and human biology.
Affiliation(s)
- Baoxu Pang
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
|
46
|
Broekema RV, Bakker OB, Jonkers IH. A practical view of fine-mapping and gene prioritization in the post-genome-wide association era. Open Biol 2020; 10:190221. [PMID: 31937202 PMCID: PMC7014684 DOI: 10.1098/rsob.190221] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 09/13/2019] [Accepted: 12/05/2019] [Indexed: 12/17/2022] Open
Abstract
Over the past 15 years, genome-wide association studies (GWASs) have enabled the systematic identification of genetic loci associated with traits and diseases. However, due to resolution issues and methodological limitations, the true causal variants and genes associated with traits remain difficult to identify. In this post-GWAS era, many biological and computational fine-mapping approaches now aim to solve these issues. Here, we review fine-mapping and gene prioritization approaches that, when combined, will improve the understanding of the underlying mechanisms of complex traits and diseases. Fine-mapping of genetic variants has become increasingly sophisticated: initially, variants were simply overlapped with functional elements, but now the impact of variants on regulatory activity and direct variant-gene 3D interactions can be identified. Moreover, gene manipulation by CRISPR/Cas9, the identification of expression quantitative trait loci and the use of co-expression networks have all increased our understanding of the genes and pathways affected by GWAS loci. However, despite this progress, limitations including the lack of cell-type- and disease-specific data and the ever-increasing complexity of polygenic models of traits pose serious challenges. Indeed, the combination of fine-mapping and gene prioritization by statistical, functional and population-based strategies will be necessary to truly understand how GWAS loci contribute to complex traits and diseases.
Affiliation(s)
- I. H. Jonkers
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
|
47
|
Abstract
Monkeys are a premier model organism for neuroscience research. Activity in the central nervous systems of monkeys can be recorded and manipulated while they perform complex perceptual, motor, or cognitive tasks. Conventional techniques for manipulating neural activity in monkeys are too coarse to address many of the outstanding questions in primate neuroscience, but optogenetics holds the promise to overcome this hurdle. In this article, we review the progress that has been made in primate optogenetics over the past 5 years. We emphasize the use of gene regulatory sequences in viral vectors to target specific neuronal types, and we present data on vectors that we engineered to target parvalbumin-expressing neurons. We conclude with a discussion of the utility of optogenetics for treating sensorineural hearing loss and Parkinson's disease, areas of translational neuroscience in which monkeys provide unique leverage for basic science and medicine.
|
48
|
de Boer CG, Vaishnav ED, Sadeh R, Abeyta EL, Friedman N, Regev A. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol 2020; 38:56-65. [PMID: 31792407 DOI: 10.1038/s41587-019-0315-8] [Citation(s) in RCA: 117] [Impact Index Per Article: 23.4] [Received: 01/24/2019] [Accepted: 10/16/2019] [Indexed: 11/26/2022]
Abstract
How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity, and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation. Gene expression levels in yeast are predicted using a massive dataset on promoters with random sequences.
|
49
|
|
50
|
Perenthaler E, Yousefi S, Niggl E, Barakat TS. Beyond the Exome: The Non-coding Genome and Enhancers in Neurodevelopmental Disorders and Malformations of Cortical Development. Front Cell Neurosci 2019; 13:352. [PMID: 31417368 PMCID: PMC6685065 DOI: 10.3389/fncel.2019.00352] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 05/31/2019] [Accepted: 07/16/2019] [Indexed: 12/22/2022] Open
Abstract
The development of the human cerebral cortex is a complex and dynamic process, in which neural stem cell proliferation, neuronal migration, and post-migratory neuronal organization need to occur in a well-organized fashion. Alterations at any of these crucial stages can result in malformations of cortical development (MCDs), a group of genetically heterogeneous neurodevelopmental disorders that present with developmental delay, intellectual disability and epilepsy. Recent progress in genetic technologies, such as next generation sequencing, most often focusing on all protein-coding exons (e.g., whole exome sequencing), allowed the discovery of more than 100 genes associated with various types of MCDs. Although this has considerably increased the diagnostic yield, most MCD cases remain unexplained. As whole exome sequencing investigates only a minor part of the human genome (1–2%), it is likely that patients in whom no disease-causing mutation has been identified could harbor mutations in genomic regions beyond the exome. Even though functional annotation of non-coding regions is still lagging behind that of protein-coding genes, tremendous progress has been made in the field of gene regulation. One group of non-coding regulatory regions are enhancers, which can be distantly located upstream or downstream of genes and which can mediate temporal and tissue-specific transcriptional control via long-distance interactions with promoter regions. Although some examples exist in the literature that link alterations of enhancers to genetic disorders, a widespread appreciation of the putative roles of these sequences in MCDs is still lacking. Here, we summarize the current state of knowledge on cis-regulatory regions and discuss novel technologies such as massively parallel reporter assay systems, CRISPR-Cas9-based screens and computational approaches that help to further elucidate the emerging role of the non-coding genome in disease.
Moreover, we discuss existing literature on mutations or copy number alterations of regulatory regions involved in brain development. We foresee that the future implementation of the knowledge obtained through ongoing gene regulation studies will benefit patients and will provide an explanation to part of the missing heritability of MCDs and other genetic disorders.
Affiliation(s)
- Elena Perenthaler
- Department of Clinical Genetics, Erasmus MC - University Medical Center, Rotterdam, Netherlands
- Soheil Yousefi
- Department of Clinical Genetics, Erasmus MC - University Medical Center, Rotterdam, Netherlands
- Eva Niggl
- Department of Clinical Genetics, Erasmus MC - University Medical Center, Rotterdam, Netherlands
- Tahsin Stefan Barakat
- Department of Clinical Genetics, Erasmus MC - University Medical Center, Rotterdam, Netherlands
|