1. Barbadilla-Martínez L, Klaassen N, van Steensel B, de Ridder J. Predicting gene expression from DNA sequence using deep learning models. Nat Rev Genet 2025:10.1038/s41576-025-00841-2. [PMID: 40360798 DOI: 10.1038/s41576-025-00841-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/01/2025] [Indexed: 05/15/2025]
Abstract
Transcription of genes is regulated by DNA elements such as promoters and enhancers, whose activity is in turn controlled by many transcription factors. Owing to the highly complex combinatorial logic involved, it has been difficult to construct computational models that predict gene activity from DNA sequence. Recent advances in deep learning techniques applied to data from epigenome mapping and high-throughput reporter assays have made substantial progress towards addressing this complexity. Such models can capture the regulatory grammar with remarkable accuracy and show great promise in predicting the effects of non-coding variants, uncovering detailed molecular mechanisms of gene regulation and designing synthetic regulatory elements for biotechnology. Here, we discuss the principles of these approaches, the types of training data sets that are available and the strengths and limitations of different approaches.
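Sequence-to-expression models of the kind reviewed here typically take DNA as a one-hot encoded matrix. Their first convolutional layer then behaves like a bank of motif scanners. The following dependency-free sketch illustrates only that input stage. The function names are hypothetical, and the hand-set filter stands in for weights that a real model would learn from data.

```python
from typing import List

BASES = "ACGT"

def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as an L x 4 matrix with columns A, C, G, T.
    Ambiguous bases (e.g. N) become all-zero rows."""
    matrix = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        if base in BASES:
            row[BASES.index(base)] = 1.0
        matrix.append(row)
    return matrix

def max_filter_score(x: List[List[float]], kernel: List[List[float]]) -> float:
    """Slide a k x 4 filter along the sequence without padding (a 1D
    convolution) and return the highest score (global max pooling)."""
    k = len(kernel)
    if len(x) < k:
        raise ValueError("sequence shorter than filter")
    scores = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        scores.append(sum(w * f
                          for wrow, frow in zip(window, kernel)
                          for w, f in zip(wrow, frow)))
    return max(scores)

# A hypothetical filter equal to the one-hot encoding of TATA scores 4.0 where TATA occurs.
print(max_filter_score(one_hot("GGTATAAAGG"), one_hot("TATA")))  # 4.0
```

In a trained network, many such filters are learned jointly, and later layers combine their outputs. That combination is where the combinatorial logic described in the abstract would be captured.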
Affiliation(s)
- Lucía Barbadilla-Martínez
  - Oncode Institute, Utrecht, The Netherlands
  - Center for Molecular Medicine, UMC Utrecht, Utrecht, The Netherlands
- Noud Klaassen
  - Oncode Institute, Utrecht, The Netherlands
  - Division of Molecular Genetics, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Bas van Steensel
  - Oncode Institute, Utrecht, The Netherlands
  - Division of Molecular Genetics, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Jeroen de Ridder
  - Oncode Institute, Utrecht, The Netherlands
  - Center for Molecular Medicine, UMC Utrecht, Utrecht, The Netherlands

2. Zhang K, Wang Y, Jiang S, Li Y, Xiang P, Zhang Y, Chen Y, Chen M, Su W, Liu L, Li S. dsDAP: An efficient method for high-abundance DNA-encoded library construction in mammalian cells. Int J Biol Macromol 2025; 298:140089. [PMID: 39842606 DOI: 10.1016/j.ijbiomac.2025.140089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 01/14/2025] [Accepted: 01/17/2025] [Indexed: 01/24/2025]
Abstract
DNA-encoded libraries are invaluable tools for high-throughput screening and functional genomics studies. However, constructing high-abundance libraries in mammalian cells remains challenging. Here, we present dsDNA-assembly-PCR (dsDAP), a novel Gibson-assembly-PCR strategy for creating DNA-encoded libraries, offering improved flexibility and efficiency over previous methods. We demonstrated this approach by investigating the impact of translation initiation sequences (TIS) on protein expression in HEK293T cells. Both CRISPR-Cas9 and piggyBac systems were employed for genomic integration, allowing comparison of different integration methods. Our results confirmed the importance of specific nucleotides in the TIS region, particularly the preference for adenine at the -3 position in high-expression sequences. We also explored the effects of library dilution on genotype-phenotype correlations. This Gibson-assembly-PCR strategy overcomes limitations of existing methods, such as restriction enzyme dependencies, and provides a versatile tool for constructing high-abundance libraries in mammalian cells. Our approach has broad applications in functional genomics, drug discovery, and the study of gene regulation.
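To make the "-3 position" concrete, the following sketch reads the base three nucleotides upstream of the start codon. It then tabulates base frequencies at that position across a set of sequences. It assumes, hypothetically, that every library member places its ATG at the same fixed offset and that variants have already been split into high- and low-expression groups. The function names are illustrative and are not part of the dsDAP analysis.

```python
from collections import Counter
from typing import Dict, Iterable

def base_at_minus3(tis: str, atg_index: int) -> str:
    """Return the base at position -3 relative to the start codon.
    Convention: the A of ATG is +1 and there is no position 0, so -1 is the
    base just 5' of the ATG and -3 sits at atg_index - 3."""
    tis = tis.upper()
    if tis[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at atg_index")
    if atg_index < 3:
        raise ValueError("fewer than 3 bases upstream of the ATG")
    return tis[atg_index - 3]

def minus3_frequencies(seqs: Iterable[str], atg_index: int) -> Dict[str, float]:
    """Fraction of sequences carrying each base at -3 (hypothetical helper)."""
    counts = Counter(base_at_minus3(s, atg_index) for s in seqs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences given")
    return {base: counts[base] / total for base in "ACGT"}

# Hypothetical high-expression group, ATG at index 6 in every sequence.
high = ["GCCACCATGG", "GCCACCATGG", "GCCTCCATGG", "GCCGCCATGG"]
print(minus3_frequencies(high, 6))
```

Comparing these frequencies between the high- and low-expression groups is one simple way to express an "adenine at -3" preference numerically.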
Affiliation(s)
- Kaili Zhang
  - Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Yi Wang
  - Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Shuze Jiang
  - Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Yifan Li
  - Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Pan Xiang
  - Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Yuxuan Zhang
  - Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Yongzi Chen
  - Department of Tumor Cell Biology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Min Chen
  - Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Weijun Su
  - School of Medicine, Nankai University, Tianjin 300071, China
- Liren Liu
  - Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Shuai Li
  - Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China

3. Baniulyte G, McCann AA, Woodstock DL, Sammons MA. Crosstalk between paralogs and isoforms influences p63-dependent regulatory element activity. Nucleic Acids Res 2024; 52:13812-13831. [PMID: 39565223 DOI: 10.1093/nar/gkae1143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 10/04/2024] [Accepted: 11/01/2024] [Indexed: 11/21/2024]
Abstract
The p53 family of transcription factors (p53, p63 and p73) regulate diverse organismal processes including tumor suppression, maintenance of genome integrity and the development of skin and limbs. Crosstalk between transcription factors with highly similar DNA binding profiles, like those in the p53 family, can dramatically alter gene regulation. While p53 is primarily associated with transcriptional activation, p63 mediates both activation and repression. The specific mechanisms controlling p63-dependent gene regulatory activity are not well understood. Here, we use massively parallel reporter assays (MPRA) to investigate how local DNA sequence context influences p63-dependent transcriptional activity. Most regulatory elements with a p63 response element motif (p63RE) activate transcription, although binding of the p63 paralog, p53, drives a substantial proportion of that activity. p63RE sequence content and co-enrichment with other known activating and repressing transcription factors, including lineage-specific factors, correlate with differential p63RE-mediated activities. p63 isoforms dramatically alter transcriptional behavior, primarily shifting inactive regulatory elements towards high p63-dependent activity. Our analysis provides novel insight into how local sequence and cellular context influence p63-dependent behaviors and highlights the key, yet still understudied, role of transcription factor paralogs and isoforms in controlling gene regulatory element activity.
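A "response element motif" for the p53 family is commonly described as two 10 bp half-sites with the consensus RRRCWWGYYY (R = A/G, W = A/T, Y = C/T) separated by a 0 to 13 bp spacer. The following sketch scans for that layout with exact IUPAC matching. Treating this canonical p53-family consensus as a stand-in for p63REs is an assumption, and the study itself may define motifs differently, for example with position weight matrices. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC codes needed for the consensus below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]"}

def iupac_to_regex(motif: str) -> str:
    return "".join(IUPAC[c] for c in motif)

# Assumed p53-family half-site consensus (10 bp).
HALF_SITE = "RRRCWWGYYY"
HALF_LEN = len(HALF_SITE)
# Zero-width lookahead so that overlapping half-sites are all reported.
HALF_SITE_RE = re.compile(f"(?={iupac_to_regex(HALF_SITE)})")

def find_half_sites(seq: str) -> List[int]:
    """Start positions of exact half-site matches on the given strand."""
    return [m.start() for m in HALF_SITE_RE.finditer(seq.upper())]

def find_full_sites(seq: str, max_spacer: int = 13) -> List[Tuple[int, int]]:
    """Return (start, end) of half-site pairs separated by 0..max_spacer bp."""
    starts = find_half_sites(seq)
    hits = []
    for i in starts:
        for j in starts:
            spacer = j - (i + HALF_LEN)
            if 0 <= spacer <= max_spacer:
                hits.append((i, j + HALF_LEN))
    return hits

print(find_full_sites("GGGCATGCCCGGGCATGCCC"))  # [(0, 20)]
```

Exact matching is deliberately strict. Native response elements often deviate from the consensus at one or more positions, so a mismatch-tolerant or PWM-based score would be needed in practice.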
Affiliation(s)
- Gabriele Baniulyte
  - Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
- Abby A McCann
  - Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
- Dana L Woodstock
  - Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
- Morgan A Sammons
  - Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA

4. Korbel F, Eroshok E, Ohler U. Interpreting deep neural networks for the prediction of translation rates. BMC Genomics 2024; 25:1061. [PMID: 39522049 PMCID: PMC11549864 DOI: 10.1186/s12864-024-10925-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
BACKGROUND The 5' untranslated region of mRNA strongly impacts the rate of translation initiation. A recent convolutional neural network (CNN) model accurately quantifies the relationship between massively parallel synthetic 5' untranslated regions (5'UTRs) and translation levels. However, the underlying biological features, which drive model predictions, remain elusive. Uncovering sequence determinants predictive of translation output may allow us to develop a more detailed understanding of translation regulation at the 5'UTR. RESULTS Applying model interpretation, we extract representations of regulatory logic from CNNs trained on synthetic and human 5'UTR reporter data. We reveal a complex interplay of regulatory sequence elements, such as initiation context and upstream open reading frames (uORFs), that influences model predictions. We show that models trained on synthetic data alone do not sufficiently explain translation regulation via the 5'UTR due to differences in the frequency of regulatory motifs compared to natural 5'UTRs. CONCLUSIONS Our study demonstrates the significance of model interpretation in understanding model behavior, properties of experimental data and ultimately mRNA translation. By combining synthetic and human 5'UTR reporter data, we develop a model (OptMRL) which better captures the characteristics of human translation regulation. This approach provides a general strategy for building more successful sequence-based models of gene regulation, as it combines global sampling of random sequences with the subspace of naturally occurring sequences. Ultimately, this will enhance our understanding of 5'UTR sequences in disease and our ability to engineer translation output.
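The uORFs mentioned above can be located directly in a 5'UTR sequence. The following hand-written scanner finds every upstream ATG and follows its reading frame to the first in-frame stop codon. It illustrates the sequence feature only, not the CNN-interpretation approach used in the study. The sketch assumes a DNA alphabet (T rather than U) and ignores non-ATG start codons and initiation context. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(utr: str) -> List[Tuple[int, Optional[int]]]:
    """For every upstream ATG in a 5'UTR, return (start, end), where end is
    the index just past the first in-frame stop codon. end is None when the
    frame runs off the 3' end of the UTR, i.e. the ORF would continue into
    (overlap) the main coding sequence."""
    utr = utr.upper()
    results = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        end = None
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                break
        results.append((start, end))
    return results

print(find_uorfs("GGATGAAATAGCCATGCC"))  # [(2, 11), (13, None)]
```

Separating terminated uORFs from ORFs that overlap the main coding sequence matters because the two classes are generally expected to affect downstream initiation differently.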
Affiliation(s)
- Frederick Korbel
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Hannoversche Straße 28, Berlin, 10115, Germany
- Ekaterina Eroshok
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Hannoversche Straße 28, Berlin, 10115, Germany
  - Department of Biology, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin, 10099, Germany
- Uwe Ohler
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Hannoversche Straße 28, Berlin, 10115, Germany
  - Department of Biology, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin, 10099, Germany
  - Department of Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin, 10099, Germany

5. La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by making predictions for sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
  - Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA

6. FitzPatrick VD, Leemans C, van Arensbergen J, van Steensel B, Bussemaker HJ. Defining the fine structure of promoter activity on a genome-wide scale with CISSECTOR. Nucleic Acids Res 2023; 51:5499-5511. [PMID: 37013986 PMCID: PMC10287907 DOI: 10.1093/nar/gkad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 03/08/2023] [Accepted: 03/22/2023] [Indexed: 04/05/2023]
Abstract
Classic promoter mutagenesis strategies can be used to study how proximal promoter regions regulate the expression of particular genes of interest. This is a laborious process, in which the smallest sub-region of the promoter still capable of recapitulating expression in an ectopic setting is first identified, followed by targeted mutation of putative transcription factor binding sites. Massively parallel reporter assays such as survey of regulatory elements (SuRE) provide an alternative way to study millions of promoter fragments in parallel. Here we show how a generalized linear model (GLM) can be used to transform genome-scale SuRE data into a high-resolution genomic track that quantifies the contribution of local sequence to promoter activity. This coefficient track helps identify regulatory elements and can be used to predict promoter activity of any sub-region in the genome. It thus allows in silico dissection of any promoter in the human genome to be performed. We developed a web application, available at cissector.nki.nl, that lets researchers easily perform this analysis as a starting point for their research into any promoter of interest.
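The idea of a per-base "coefficient track" that predicts the activity of any sub-region can be illustrated with a prefix-sum lookup. This sketch assumes a log-linear model in which the log activity of a fragment is the sum of the coefficients of the bases it covers. That additive form is an illustrative assumption, not necessarily the exact GLM that CISSECTOR fits. It omits any intercept or fragment-length terms, and the class name is hypothetical.

```python
import math
from itertools import accumulate
from typing import List

class CoefficientTrack:
    """Hypothetical per-base coefficient track under an assumed log-linear
    model: log(activity) of a fragment = sum of coefficients it covers."""

    def __init__(self, coefficients: List[float]):
        # prefix[i] = sum(coefficients[0:i]); any interval sum is then O(1).
        self.prefix = [0.0] + list(accumulate(coefficients))

    def log_activity(self, start: int, end: int) -> float:
        """Summed coefficients over the half-open interval [start, end)."""
        if not 0 <= start < end <= len(self.prefix) - 1:
            raise ValueError("interval outside the track")
        return self.prefix[end] - self.prefix[start]

    def predicted_activity(self, start: int, end: int) -> float:
        return math.exp(self.log_activity(start, end))

track = CoefficientTrack([0.0, 0.5, 0.5, -1.0, 0.0])  # toy coefficients
print(track.log_activity(1, 3))        # 1.0
print(track.predicted_activity(1, 4))  # 1.0 (positive and negative bases cancel)
```

The prefix-sum layout is what makes "in silico dissection" cheap. Once the track exists, any candidate sub-region can be scored without refitting.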
Affiliation(s)
- Vincent D FitzPatrick
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Christ Leemans
  - Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Joris van Arensbergen
  - Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Bas van Steensel
  - Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
  - Department of Cell Biology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA

7. Stikker BS, Hendriks RW, Stadhouders R. Decoding the genetic and epigenetic basis of asthma. Allergy 2023; 78:940-956. [PMID: 36727912 DOI: 10.1111/all.15666] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 10/17/2022] [Revised: 01/17/2023] [Accepted: 01/30/2023] [Indexed: 02/03/2023]
Abstract
Asthma is a complex and heterogeneous chronic inflammatory disease of the airways. Alongside environmental factors, asthma susceptibility is strongly influenced by genetics. Given its high prevalence and our incomplete understanding of the mechanisms underlying disease susceptibility, asthma is frequently studied in genome-wide association studies (GWAS), which have identified thousands of genetic variants associated with asthma development. Virtually all these genetic variants reside in non-coding genomic regions, which has obscured the functional impact of asthma-associated variants and their translation into disease-relevant mechanisms. Recent advances in genomics technology and epigenetics now offer methods to link genetic variants to gene regulatory elements embedded within non-coding regions, which have started to unravel the molecular mechanisms underlying the complex (epi)genetics of asthma. Here, we provide an integrated overview of (epi)genetic variants associated with asthma, focusing on efforts to link these disease associations to biological insight into asthma pathophysiology using state-of-the-art genomics methodology. Finally, we provide a perspective as to how decoding the genetic and epigenetic basis of asthma has the potential to transform clinical management of asthma and to predict the risk of asthma development.
Affiliation(s)
- Bernard S Stikker
  - Department of Pulmonary Medicine, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
- Rudi W Hendriks
  - Department of Pulmonary Medicine, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
- Ralph Stadhouders
  - Department of Pulmonary Medicine, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
  - Department of Cell Biology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands

8. Kashkin KN, Kotova ES, Alekseenko IV, Bulanenkova SS, Akopov SB, Kopantzev EP, Nikolaev LG, Chernov IP, Didych DA. Efficient Selection of Enhancers and Promoters from MIA PaCa-2 Pancreatic Cancer Cells by ChIP-lentiMPRA. Int J Mol Sci 2022; 23:ijms232315011. [PMID: 36499347 PMCID: PMC9740945 DOI: 10.3390/ijms232315011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Revised: 11/17/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
A library of active genome regulatory elements (putative promoters and enhancers) from MIA PaCa-2 pancreatic adenocarcinoma cells was constructed using a specially designed lentiviral vector and a massively parallel reporter assay (ChIP-lentiMPRA). Chromatin immunoprecipitation with H3K27ac antibodies was used for primary enrichment of the library for regulatory elements. In total, 11,264 unique genome regions were identified, many of which are capable of enhancing the expression of the CopGFP reporter gene from the minimal CMV promoter. The regions tend to be located near promoters. Based on the proximity assay, we found an enrichment of highly expressed genes among those associated with three or more mapped distal regions (more than 2 kb from the 5'-ends of genes). This group was also significantly enriched for genes related to carcinogenesis or to MIA PaCa-2 cell identity. In contrast, genes associated with 1-2 distal regions or only with proximal regions (within 2 kb of the 5'-ends of genes) are more often related to housekeeping functions. Thus, ChIP-lentiMPRA is a useful strategy for creating libraries of regulatory elements for the study of tumor-specific gene transcription.
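The proximal/distal split described above can be sketched as follows. Each region is assigned to the nearest gene 5' end, and proximal and distal regions are counted per gene. The assignment rule (nearest 5' end on the same chromosome, measured from the region midpoint) and the treatment of exactly 2 kb as proximal are assumptions made for illustration. The study's own proximity assay may differ, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PROXIMAL_LIMIT = 2_000  # bp, the 2 kb cut-off quoted in the abstract

def count_regions_per_gene(
    regions: List[Tuple[str, int]],       # (chromosome, region midpoint)
    gene_5p: Dict[str, Tuple[str, int]],  # gene -> (chromosome, 5'-end coordinate)
) -> Dict[str, Dict[str, int]]:
    """Assign each region to the nearest gene 5' end on the same chromosome
    and count proximal (<= 2 kb) and distal (> 2 kb) regions per gene."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"proximal": 0, "distal": 0})
    for chrom, mid in regions:
        candidates = [(abs(pos - mid), gene)
                      for gene, (c, pos) in gene_5p.items() if c == chrom]
        if not candidates:
            continue  # no annotated gene on this chromosome
        dist, gene = min(candidates)
        counts[gene]["proximal" if dist <= PROXIMAL_LIMIT else "distal"] += 1
    return dict(counts)

def genes_with_many_distal(counts: Dict[str, Dict[str, int]],
                           min_distal: int = 3) -> List[str]:
    """Genes linked to at least min_distal distal regions."""
    return sorted(g for g, c in counts.items() if c["distal"] >= min_distal)

genes = {"g1": ("chr1", 10_000), "g2": ("chr1", 100_000)}  # toy annotation
regions = [("chr1", 10_500), ("chr1", 15_000), ("chr1", 20_000),
           ("chr1", 5_000), ("chr1", 99_000)]
print(genes_with_many_distal(count_regions_per_gene(regions, genes)))  # ['g1']
```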
Affiliation(s)
- Kirill Nikitich Kashkin
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Miklukho-Maklaya, 16/10, 117997 Moscow, Russia
- Elena Sergeevna Kotova
  - Laboratory of Human Molecular Genetics, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Malaya Pirogovskaya Street, 1a, 119435 Moscow, Russia
- Irina Vasilievna Alekseenko
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Miklukho-Maklaya, 16/10, 117997 Moscow, Russia
- Svetlana Sergeevna Bulanenkova
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Miklukho-Maklaya, 16/10, 117997 Moscow, Russia
- Sergey Borisovich Akopov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Miklukho-Maklaya, 16/10, 117997 Moscow, Russia
- Eugene Pavlovich Kopantzev
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Miklukho-Maklaya, 16/10, 117997 Moscow, Russia
- Lev Grigorievich Nikolaev
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Miklukho-Maklaya, 16/10, 117997 Moscow, Russia
- Igor Pavlovich Chernov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Miklukho-Maklaya, 16/10, 117997 Moscow, Russia
- Dmitry Alexandrovich Didych
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Miklukho-Maklaya, 16/10, 117997 Moscow, Russia
  - Correspondence: ; Tel.: +7-919-777-4620

9. Galouzis CC, Furlong EEM. Regulating specificity in enhancer-promoter communication. Curr Opin Cell Biol 2022; 75:102065. [PMID: 35240372 DOI: 10.1016/j.ceb.2022.01.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/20/2021] [Revised: 01/23/2022] [Accepted: 01/25/2022] [Indexed: 12/14/2022]
Abstract
Enhancers are cis-regulatory elements that can activate transcription remotely to regulate a specific pattern of a gene's expression. Genes typically have many enhancers that are often intermingled in the loci of other genes. To regulate expression, enhancers must therefore activate their correct promoter while ignoring others that may be in closer linear proximity. In this review, we discuss mechanisms by which enhancers engage with promoters, including recent findings on the role of cohesin and the Mediator complex, and how this specificity in enhancer-promoter communication is encoded. Genetic dissection of model loci, together with more recent findings from genome-wide approaches, highlights the core promoter sequence, its accessibility, cofactor-promoter preference and the surrounding genomic context as key components.
Affiliation(s)
- Eileen E M Furlong
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117, Heidelberg, Germany

10. van Breugel ME, van Leeuwen F. Epi-Decoder: Decoding the Local Proteome of a Genomic Locus by Massive Parallel Chromatin Immunoprecipitation Combined with DNA-Barcode Sequencing. Methods Mol Biol 2022; 2458:123-150. [PMID: 35103966 DOI: 10.1007/978-1-0716-2140-0_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The genome in a eukaryotic cell is packaged into chromatin and regulated by chromatin-binding and chromatin-modifying factors. Many of these factors and their complexes have been identified before, but how each genomic locus interacts with its surrounding proteins in the nucleus over time and in changing conditions remains poorly described. Measuring protein-DNA interactions at a specific locus in the genome is challenging and current techniques such as capture of a locus followed by mass spectrometry require high levels of enrichment. Epi-Decoder, a method developed in budding yeast, enables systematic decoding of the proteome of a single genomic locus of interest without the need for locus enrichment. Instead, Epi-Decoder uses massive parallel chromatin immunoprecipitation of tagged proteins combined with barcoding a genomic locus and counting of coimmunoprecipitated barcodes by DNA sequencing (TAG-ChIP-Barcode-Seq). In this scenario, DNA barcode counts serve as a quantitative readout for protein binding of each tagged protein to the barcoded locus. Epi-Decoder can be applied to determine the protein-DNA interactions at a wide range of genomic loci, such as coding genes, noncoding genes, and intergenic regions. Furthermore, Epi-Decoder provides the option to study protein-DNA interactions upon changing cellular and/or genetic conditions. In this protocol, we describe in detail how to construct Epi-Decoder libraries and how to perform an Epi-Decoder analysis.
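The statement that "DNA barcode counts serve as a quantitative readout" can be illustrated by comparing each barcode's share of reads in an immunoprecipitated sample with its share in an input sample. The log2-ratio score and the pseudocount used below are assumptions chosen for illustration. The protocol's actual normalization may differ, and the function name is hypothetical.

```python
import math
from typing import Dict

def barcode_enrichment(
    ip_counts: Dict[str, int],
    input_counts: Dict[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Hypothetical score: log2 of each barcode's share of ChIP reads over its
    share of input reads. Barcodes absent from the input table are skipped."""
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive to avoid log2(0)")
    ip_total = sum(ip_counts.values())
    input_total = sum(input_counts.values())
    if ip_total == 0 or input_total == 0:
        raise ValueError("empty count table")
    scores = {}
    for barcode, n_input in input_counts.items():
        n_ip = ip_counts.get(barcode, 0)
        ip_share = (n_ip + pseudocount) / ip_total
        input_share = (n_input + pseudocount) / input_total
        scores[barcode] = math.log2(ip_share / input_share)
    return scores

# Toy counts: bc1 is over-represented after immunoprecipitation, bc2 depleted.
print(barcode_enrichment({"bc1": 299, "bc2": 99}, {"bc1": 99, "bc2": 299}))
```

Working with read shares rather than raw counts keeps scores comparable between samples sequenced to different depths.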
Affiliation(s)
- Fred van Leeuwen
  - Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, The Netherlands
  - Department of Medical Biology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands

11. The non-coding genome in genetic brain disorders: new targets for therapy? Essays Biochem 2021; 65:671-683. [PMID: 34414418 PMCID: PMC8564736 DOI: 10.1042/ebc20200121] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/24/2021] [Revised: 07/12/2021] [Accepted: 07/26/2021] [Indexed: 11/30/2022]
Abstract
The non-coding genome, which accounts for more than 98% of all genetic information in humans and was once dismissed as ‘Junk DNA’, is increasingly moving into the spotlight in the field of human genetics. Non-coding regulatory elements (NCREs) are crucial to ensure correct spatio-temporal gene expression. Technological advances have made it possible to identify NCREs on a large scale, and mechanistic studies have helped to elucidate the biological mechanisms underlying their function. It is increasingly clear that genetic alterations of NCREs can cause genetic disorders, including brain diseases. In this review, we concisely discuss mechanisms of gene regulation and how to investigate them, and give examples of non-coding alterations of NCREs that give rise to human brain disorders. The cross-talk between basic and clinical studies enhances the understanding of normal and pathological function of NCREs, allowing better interpretation of already existing and novel data. Improved functional annotation of NCREs will not only benefit diagnostics for patients, but might also lead to novel areas of investigation for targeted therapies, applicable to a wide panel of genetic disorders. The intrinsic complexity and precision of the gene regulation process can be turned to the advantage of highly specific treatments. We further discuss this exciting new field of ‘enhancer therapy’ based on recent examples.

12. Findlay GM. Linking genome variants to disease: scalable approaches to test the functional impact of human mutations. Hum Mol Genet 2021; 30:R187-R197. [PMID: 34338757 PMCID: PMC8490018 DOI: 10.1093/hmg/ddab219] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 07/19/2021] [Revised: 07/19/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022]
Abstract
The application of genomics to medicine has accelerated the discovery of mutations underlying disease and has enhanced our knowledge of the molecular underpinnings of diverse pathologies. As the amount of human genetic material queried via sequencing has grown exponentially in recent years, so too has the number of rare variants observed. Despite progress, our ability to distinguish which rare variants have clinical significance remains limited. Over the last decade, however, powerful experimental approaches have emerged to characterize variant effects orders of magnitude faster than before. Fueled by improved DNA synthesis and sequencing and, more recently, by CRISPR/Cas9 genome editing, multiplex functional assays provide a means of generating variant effect data in wide-ranging experimental systems. Here, I review recent applications of multiplex assays that link human variants to disease phenotypes and I describe emerging strategies that will enhance their clinical utility in coming years.
Affiliation(s)
- Gregory M Findlay
  - The Francis Crick Institute, The Genome Function Laboratory, London NW1 1AT, UK

13. Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022]
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
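The following sketch walks through the core steps listed above, without claiming to reproduce MPRAdecoder itself. First, it keeps only barcodes that map to a single ROI. Second, it normalizes each barcode's RNA share to its plasmid DNA share. Third, it averages those values per ROI. The minimum-read threshold, the share-based normalization and the function names are hypothetical choices. The sketch also ignores sequencing errors and mutant-variant handling, which a real pipeline must address.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

def genuine_barcodes(mapping_reads: Iterable[Tuple[str, str]],
                     min_reads: int = 2) -> Dict[str, str]:
    """Keep barcodes seen with exactly one ROI in at least min_reads
    mapping reads; ambiguous (multi-ROI) barcodes are discarded."""
    seen: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for barcode, roi in mapping_reads:
        seen[barcode][roi] += 1
    return {bc: next(iter(rois)) for bc, rois in seen.items()
            if len(rois) == 1 and sum(rois.values()) >= min_reads}

def roi_activity(bc_to_roi: Dict[str, str],
                 rna_counts: Dict[str, int],
                 dna_counts: Dict[str, int]) -> Dict[str, float]:
    """Per-barcode (RNA share / plasmid DNA share), averaged per ROI."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        return {}
    per_roi: Dict[str, List[float]] = defaultdict(list)
    for bc, roi in bc_to_roi.items():
        dna = dna_counts.get(bc, 0)
        if dna == 0:
            continue  # cannot normalize a barcode missing from the plasmid pool
        rna = rna_counts.get(bc, 0)
        per_roi[roi].append((rna / rna_total) / (dna / dna_total))
    return {roi: mean(values) for roi, values in per_roi.items()}

mapping = [("AAA", "roi1"), ("AAA", "roi1"), ("CCC", "roi1"), ("CCC", "roi1"),
           ("GGG", "roi2"), ("GGG", "roi2"), ("TTT", "roi1"), ("TTT", "roi2")]
bcs = genuine_barcodes(mapping)  # TTT is dropped as ambiguous
print(roi_activity(bcs, {"AAA": 20, "CCC": 10, "GGG": 10},
                   {"AAA": 10, "CCC": 10, "GGG": 20}))  # {'roi1': 1.5, 'roi2': 0.5}
```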
Affiliation(s)
- Anna E Letiagina
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Evgeniya S Omelina
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anton V Ivankin
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Alexey V Pindyurin
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia

14. Rao S, Yao Y, Bauer DE. Editing GWAS: experimental approaches to dissect and exploit disease-associated genetic variation. Genome Med 2021; 13:41. [PMID: 33691767 PMCID: PMC7948363 DOI: 10.1186/s13073-021-00857-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 07/20/2020] [Accepted: 02/12/2021] [Indexed: 12/17/2022]
Abstract
Genome-wide association studies (GWAS) have uncovered thousands of genetic variants that influence risk for human diseases and traits. Yet understanding the mechanisms by which these genetic variants, mainly noncoding, have an impact on associated diseases and traits remains a significant hurdle. In this review, we discuss emerging experimental approaches that are being applied for functional studies of causal variants and translational advances from GWAS findings to disease prevention and treatment. We highlight the use of genome editing technologies in GWAS functional studies to modify genomic sequences, with proof-of-principle examples. We discuss the challenges in interrogating causal variants, points for consideration in experimental design and interpretation of GWAS locus mechanisms, and the potential for novel therapeutic opportunities. With the accumulation of knowledge of functional genetics, therapeutic genome editing based on GWAS discoveries will become increasingly feasible.
Affiliation(s)
- Shuquan Rao
  - Division of Hematology/Oncology, Boston Children's Hospital; Department of Pediatric Oncology, Dana-Farber Cancer Institute; Harvard Stem Cell Institute; Broad Institute; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Yao Yao
  - Division of Hematology/Oncology, Boston Children's Hospital; Department of Pediatric Oncology, Dana-Farber Cancer Institute; Harvard Stem Cell Institute; Broad Institute; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - School of Basic Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Daniel E Bauer
  - Division of Hematology/Oncology, Boston Children's Hospital; Department of Pediatric Oncology, Dana-Farber Cancer Institute; Harvard Stem Cell Institute; Broad Institute; Department of Pediatrics, Harvard Medical School, Boston, MA, USA