1. Liu L, Zhao Y, Hassett R, Toneyan S, Koo P, Siepel A. Probabilistic and machine-learning methods for predicting local rates of transcription elongation from nascent RNA sequencing data. Nucleic Acids Res 2025; 53:gkaf092. [PMID: 39964478 PMCID: PMC11833694 DOI: 10.1093/nar/gkaf092] [Received: 06/11/2024] [Revised: 12/12/2024] [Accepted: 02/10/2025] [Indexed: 02/21/2025]
Abstract
Rates of transcription elongation vary within and across eukaryotic gene bodies. Here, we introduce new methods for predicting elongation rates from nascent RNA sequencing data. First, we devise a probabilistic model that predicts nucleotide-specific elongation rates as a generalized linear function of nearby genomic and epigenomic features. We validate this model with simulations and apply it to public PRO-seq (Precision Run-On Sequencing) and epigenomic data for four cell types, finding that reductions in local elongation rate are associated with cytosine nucleotides, DNA methylation, splice sites, RNA stem-loops, CTCF (CCCTC-binding factor) binding sites, and several histone marks, including H3K36me3 and H4K20me1. By contrast, increases in local elongation rate are associated with thymines, A+T-rich and low-complexity sequences, and H3K79me2 marks. We then introduce a convolutional neural network that improves our local rate predictions. Our analysis is the first to permit genome-wide predictions of relative nucleotide-specific elongation rates.
Affiliation(s)
- Lingjie Liu
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States
- Graduate Program in Genetics, Stony Brook University, Stony Brook, NY 11794, United States
- Yixin Zhao
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States
- Rebecca Hassett
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States
- Shushan Toneyan
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States
- Graduate Program in Genetics, Stony Brook University, Stony Brook, NY 11794, United States
2. Liu L, Zhao Y, Siepel A. DNA-sequence and epigenomic determinants of local rates of transcription elongation. bioRxiv: The Preprint Server for Biology 2023:2023.12.21.572932. [PMID: 38187771 PMCID: PMC10769381 DOI: 10.1101/2023.12.21.572932] [Indexed: 01/09/2024]
Abstract
Across all branches of life, transcription elongation is a crucial, regulated phase in gene expression. Many recent studies in eukaryotes have focused on the regulation of promoter-proximal pausing of RNA polymerase II (Pol II), but rates of productive elongation also vary substantially throughout the gene body, both within and across genes. Here, we introduce a probabilistic model for systematically evaluating potential determinants of the local elongation rate based on nascent RNA sequencing (NRS) data. Our model is derived from a unified model for both the kinetics of Pol II movement along the DNA template and the generation of NRS read counts at steady state. It allows for a continuously variable elongation rate along the gene body, with the rate at each nucleotide defined by a generalized linear relationship with nearby genomic and epigenomic features. High-dimensional feature vectors are accommodated through a sparse-regression extension. We show with simulations that the model allows accurate detection of associated features and accurate prediction of local elongation rates. In an analysis of public PRO-seq and epigenomic data, we identify several features that are strongly associated with reductions in the local elongation rate, including DNA methylation, splice sites, RNA stem-loops, CTCF binding sites, and several histone marks, including H3K36me3 and H4K20me1. By contrast, low-complexity sequences and H3K79me2 marks are associated with increases in elongation rate. In an analysis of DNA k-mers, we find that cytosine nucleotides are strongly associated with reductions in local elongation rate, particularly when preceded by guanines and followed by adenines or thymines. Increases in elongation rate are associated with thymines and A+T-rich k-mers. These associations are generally shared across cell types, and by considering them our model is effective at predicting features of held-out PRO-seq data.
Overall, our analysis is the first to permit genome-wide predictions of relative nucleotide-specific elongation rates based on complex sets of genomic and epigenomic covariates. We have made predictions available for the K562, CD14+, MCF-7, and HeLa-S3 cell types in a UCSC Genome Browser track.
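The relationship summarized above (a per-nucleotide rate that is a generalized linear function of local features, linked to read counts at steady state) can be illustrated with a toy sketch. It assumes a log link, zeta_i = exp(beta · x_i); a constant polymerase flux along the gene body, so that expected polymerase density at site i is proportional to flux / zeta_i; and independent Poisson read counts. The function names, the `flux` and `depth` parameters, and these modeling choices are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def local_rates(features: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    """Per-site rate under an assumed log link: zeta_i = exp(beta . x_i)."""
    return [math.exp(sum(b * x for b, x in zip(beta, row))) for row in features]


def expected_counts(rates: Sequence[float], flux: float, depth: float = 1.0) -> List[float]:
    """Assumed steady state with constant flux: density_i = flux / zeta_i, scaled by depth."""
    return [depth * flux / z for z in rates]


def poisson_loglik(counts: Sequence[int], expected: Sequence[float]) -> float:
    """Log-likelihood of observed read counts under independent Poisson models."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1)
               for y, lam in zip(counts, expected))


if __name__ == "__main__":
    # One hypothetical binary feature per site (1.0 = feature present).
    x = [[0.0], [1.0], [0.0]]
    beta = [math.log(0.5)]  # hypothetical coefficient: feature present halves the rate
    zeta = local_rates(x, beta)
    lam = expected_counts(zeta, flux=10.0)
    print("rates:", zeta)
    print("expected counts:", lam)
    print("log-likelihood:", poisson_loglik([9, 21, 11], lam))
```

In this toy picture, a negative coefficient (a local slowdown) appears as a local excess of expected reads, because polymerases spend more time at slower sites; fitting beta would then amount to maximizing `poisson_loglik` over the coefficients.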
Affiliation(s)
- Lingjie Liu
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Graduate Program in Genetics, Stony Brook University, Stony Brook, NY
- Yixin Zhao
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Graduate Program in Genetics, Stony Brook University, Stony Brook, NY
3. Zhao Y, Liu L, Hassett R, Siepel A. Model-based characterization of the equilibrium dynamics of transcription initiation and promoter-proximal pausing in human cells. Nucleic Acids Res 2023; 51:e106. [PMID: 37889042 PMCID: PMC10681744 DOI: 10.1093/nar/gkad843] [Received: 03/06/2023] [Revised: 09/13/2023] [Accepted: 09/21/2023] [Indexed: 10/28/2023]
Abstract
In metazoans, both transcription initiation and the escape of RNA polymerase (RNAP) from promoter-proximal pausing are key rate-limiting steps in gene expression. These processes play out at physically proximal sites on the DNA template and appear to influence one another through steric interactions. Here, we examine the dynamics of these processes using a combination of statistical modeling, simulation, and analysis of real nascent RNA sequencing data. We develop a simple probabilistic model that jointly describes the kinetics of transcription initiation, pause-escape, and elongation, and the generation of nascent RNA sequencing read counts under steady-state conditions. We then extend this initial model to allow for variability across cells in promoter-proximal pause site locations and steric hindrance of transcription initiation from paused RNAPs. In an extensive series of simulations, we show that this model enables accurate estimation of initiation and pause-escape rates. Furthermore, we show by simulation and analysis of real data that pause-escape is often strongly rate-limiting and that steric hindrance can dramatically reduce initiation rates. Our modeling framework is applicable to a variety of inference problems, and our software for estimation and simulation is freely available.
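A much-simplified steady-state calculation illustrates how steric hindrance from a paused polymerase can throttle initiation. This is a hypothetical single-pause-site sketch, not the authors' model (which, per the abstract, also allows pause-site locations to vary across cells): initiation is attempted at rate `alpha` and succeeds only while the pause site is empty, paused polymerases escape at rate `beta`, and gene-body elongation proceeds at rate `zeta`. Balancing inflow and outflow at the pause site, alpha(1 - p) = beta·p, gives occupancy p = alpha / (alpha + beta) and flux J = alpha·beta / (alpha + beta). All names and parameter values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class SteadyState:
    paused: float        # expected number of polymerases at the pause site
    flux: float          # polymerases entering the gene body per unit time
    body_density: float  # expected polymerases per gene-body site (flux / zeta)


def steady_state(alpha: float, beta: float, zeta: float, hindrance: bool = True) -> SteadyState:
    """Toy single-pause-site model (hypothetical; arbitrary time units)."""
    if hindrance:
        # At most one paused polymerase; initiation is blocked while it occupies the site.
        # Balance: alpha * (1 - p) = beta * p  =>  p = alpha / (alpha + beta).
        paused = alpha / (alpha + beta)
        flux = beta * paused  # equals the effective initiation rate alpha * (1 - p)
    else:
        # No blocking: each initiated polymerase pauses independently, mean alpha / beta.
        paused = alpha / beta
        flux = alpha
    return SteadyState(paused=paused, flux=flux, body_density=flux / zeta)


if __name__ == "__main__":
    # Illustrative values: pause escape ten times slower than initiation attempts.
    free = steady_state(alpha=1.0, beta=0.1, zeta=1.0, hindrance=False)
    blocked = steady_state(alpha=1.0, beta=0.1, zeta=1.0, hindrance=True)
    print(free)
    print(blocked)
    print("fold reduction in effective initiation:", free.flux / blocked.flux)
```

With these illustrative values, hindrance cuts the effective initiation rate about 11-fold, because pause escape is the slower step; when `beta` is much larger than `alpha`, the two cases converge.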
Affiliation(s)
- Yixin Zhao
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Lingjie Liu
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Graduate Program in Genetics, Stony Brook University, Stony Brook, NY, USA
- Rebecca Hassett
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Graduate Program in Genetics, Stony Brook University, Stony Brook, NY, USA
4. Barshad G, Lewis JJ, Chivu AG, Abuhashem A, Krietenstein N, Rice EJ, Ma Y, Wang Z, Rando OJ, Hadjantonakis AK, Danko CG. RNA polymerase II dynamics shape enhancer-promoter interactions. Nat Genet 2023; 55:1370-1380. [PMID: 37430091 PMCID: PMC10714922 DOI: 10.1038/s41588-023-01442-7] [Received: 07/12/2022] [Accepted: 06/09/2023] [Indexed: 07/12/2023]
Abstract
How enhancers control target gene expression over long genomic distances remains an important unsolved problem. Here we investigated enhancer-promoter communication by integrating data from nucleosome-resolution genomic contact maps, nascent transcription and perturbations affecting either RNA polymerase II (Pol II) dynamics or the activity of thousands of candidate enhancers. Integration of new Micro-C experiments with published CRISPRi data demonstrated that enhancers spend more time in close proximity to their target promoters in functional enhancer-promoter pairs compared to nonfunctional pairs, which can be attributed in part to factors unrelated to genomic position. Manipulation of the transcription cycle demonstrated a key role for Pol II in enhancer-promoter interactions. Notably, promoter-proximal paused Pol II itself partially stabilized interactions. We propose an updated model in which elements of transcriptional dynamics shape the duration or frequency of interactions to facilitate enhancer-promoter communication.
Affiliation(s)
- Gilad Barshad
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- James J Lewis
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Alexandra G Chivu
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Abderhman Abuhashem
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
- Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program, New York City, NY, USA
- Biochemistry Cell and Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York City, NY, USA
- Nils Krietenstein
- The Novo Nordisk Center for Protein Research (CPR), Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Edward J Rice
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Yitian Ma
- School of Life and Pharmaceutical Sciences, Dalian University of Technology, Dalian, China
- Zhong Wang
- School of Life and Pharmaceutical Sciences, Dalian University of Technology, Dalian, China
- Oliver J Rando
- Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Anna-Katerina Hadjantonakis
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
- Biochemistry Cell and Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York City, NY, USA
- Charles G Danko
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA