1
Vock IW, Mabin JW, Machyna M, Zhang A, Hogg JR, Simon MD. Expanding and improving analyses of nucleotide recoding RNA-seq experiments with the EZbakR suite. bioRxiv 2024:2024.10.14.617411. [PMID: 39463977 PMCID: PMC11507695 DOI: 10.1101/2024.10.14.617411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Nucleotide recoding RNA sequencing methods (NR-seq; TimeLapse-seq, SLAM-seq, TUC-seq, etc.) are powerful approaches for assaying transcript population dynamics. In addition, these methods have been extended to probe a host of regulated steps in the RNA life cycle. Current bioinformatic tools significantly constrain analyses of NR-seq data. To address this limitation, we developed EZbakR, an R package to facilitate a more comprehensive set of NR-seq analyses, and fastq2EZbakR, a Snakemake pipeline for flexible preprocessing of NR-seq datasets, collectively referred to as the EZbakR suite. Together, these tools generalize many aspects of the NR-seq analysis workflow. The fastq2EZbakR pipeline can assign reads to a diverse set of genomic features (e.g., genes, exons, splice junctions, etc.), and EZbakR can perform analyses on any combination of these features. EZbakR extends standard NR-seq mutational modeling to support multi-label analyses (e.g., s4U and s6G dual labeling), and implements an improved hierarchical model to better account for transcript-to-transcript variance in metabolic label incorporation. EZbakR also generalizes dynamical systems modeling of NR-seq data to support analyses of premature mRNA processing and flow between subcellular compartments. Finally, EZbakR implements flexible and well-powered comparative analyses of all estimated parameters via design matrix-specified generalized linear modeling. The EZbakR suite will thus allow researchers to make full, effective use of NR-seq data.
Affiliation(s)
- Isaac W. Vock
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
- Justin W. Mabin
  - Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Martin Machyna
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
  - Present address: Paul-Ehrlich-Institut, Host-Pathogen-Interactions, 63225 Langen, Germany
- Alexandra Zhang
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
- J. Robert Hogg
  - Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Matthew D. Simon
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
2
Vock IW, Simon MD. bakR: uncovering differential RNA synthesis and degradation kinetics transcriptome-wide with Bayesian hierarchical modeling. RNA 2023; 29:958-976. [PMID: 37028916 PMCID: PMC10275263 DOI: 10.1261/rna.079451.122] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/16/2022] [Accepted: 03/14/2023] [Indexed: 06/18/2023]
Abstract
Differential expression analysis of RNA sequencing (RNA-seq) data can identify changes in cellular RNA levels, but provides limited information about the kinetic mechanisms underlying such changes. Nucleotide recoding RNA-seq methods (NR-seq; e.g., TimeLapse-seq, SLAM-seq, etc.) address this shortcoming and are widely used approaches to identify changes in RNA synthesis and degradation kinetics. While advanced statistical models implemented in user-friendly software (e.g., DESeq2) have ensured the statistical rigor of differential expression analyses, no such tools that facilitate differential kinetic analysis with NR-seq exist. Here, we report the development of Bayesian analysis of the kinetics of RNA (bakR; https://github.com/simonlabcode/bakR), an R package to address this need. bakR relies on Bayesian hierarchical modeling of NR-seq data to increase statistical power by sharing information across transcripts. Analyses of simulated data confirmed that bakR implementations of the hierarchical model outperform attempts to analyze differential kinetics with existing models. bakR also uncovers biological signals in real NR-seq data sets and provides improved analyses of existing data sets. This work establishes bakR as an important tool for identifying differential RNA synthesis and degradation kinetics.
Affiliation(s)
- Isaac W Vock
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06536, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06477, USA
- Matthew D Simon
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06536, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06477, USA
3
Rummel T, Sakellaridi L, Erhard F. grandR: a comprehensive package for nucleotide conversion RNA-seq data analysis. Nat Commun 2023; 14:3559. [PMID: 37321987 PMCID: PMC10272207 DOI: 10.1038/s41467-023-39163-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/29/2022] [Accepted: 06/01/2023] [Indexed: 06/17/2023]
Abstract
Metabolic labeling of RNA is a powerful technique for studying the temporal dynamics of gene expression. Nucleotide conversion approaches greatly facilitate the generation of data but introduce challenges for their analysis. Here we present grandR, a comprehensive package for quality control, differential gene expression analysis, kinetic modeling, and visualization of such data. We compare several existing methods for inference of RNA synthesis rates and half-lives using progressive labeling time courses. We demonstrate the need for recalibration of effective labeling times and introduce a Bayesian approach to study the temporal dynamics of RNA using snapshot experiments.
Affiliation(s)
- Teresa Rummel
  - Institute for Virology and Immunobiology, University of Würzburg, Versbacher Str. 7, 97078, Würzburg, Germany
- Lygeri Sakellaridi
  - Institute for Virology and Immunobiology, University of Würzburg, Versbacher Str. 7, 97078, Würzburg, Germany
- Florian Erhard
  - Institute for Virology and Immunobiology, University of Würzburg, Versbacher Str. 7, 97078, Würzburg, Germany
  - Faculty for Informatics and Data Science, University of Regensburg, Bajuwarenstr. 4, 93053, Regensburg, Germany
4
Hersch M, Biasini A, Marques AC, Bergmann S. Estimating RNA dynamics using one time point for one sample in a single-pulse metabolic labeling experiment. BMC Bioinformatics 2022; 23:147. [PMID: 35459101 PMCID: PMC9034570 DOI: 10.1186/s12859-022-04672-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/16/2021] [Accepted: 04/04/2022] [Indexed: 11/05/2022]
Abstract
Background: Over the past decade, experimental procedures such as metabolic labeling for determining RNA turnover rates at the transcriptome-wide scale have been widely adopted and are now turning to single cell measurements. Several computational methods to estimate RNA synthesis, processing and degradation rates from such experiments have been suggested, but they all require several RNA sequencing samples. Here we present a method that can estimate those three rates from a single sample.
Methods: Our method relies on the analytical solution to the Zeisel model of RNA dynamics. It was validated on metabolic labeling experiments performed on mouse embryonic stem cells. Resulting degradation rates were compared both to previously published rates on the same system and to a state-of-the-art method applied to the same data.
Results: Our method is computationally efficient and outputs rates that correlate well with previously published data sets. Using it on a single sample, we were able to reproduce the observation that dynamic biological processes tend to involve genes with higher metabolic rates, while stable processes involve genes with lower rates. This supports the hypothesis that cells control not only the mRNA steady-state abundance, but also its responsiveness, i.e., how fast steady state is reached. Moreover, degradation rates obtained with our method compare favourably with the other tested method.
Conclusions: In addition to saving experimental work and computational time, estimating rates for a single sample has several advantages. It does not require an error-prone normalization across samples and enables the use of replicates to estimate uncertainty and assess sample quality. Finally the method and theoretical results described here are general enough to be useful in other contexts such as nucleotide conversion methods and single cell metabolic labeling experiments.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04672-4.
Affiliation(s)
- Micha Hersch
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015, Lausanne, CH, Switzerland
- Adriano Biasini
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - RNA Therapeutics Institute, University of Massachusetts Medical School, Worcester, MA, USA
- Ana C Marques
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Sven Bergmann
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015, Lausanne, CH, Switzerland
5
Furlan M, de Pretis S, Pelizzola M. Dynamics of transcriptional and post-transcriptional regulation. Brief Bioinform 2021; 22:bbaa389. [PMID: 33348360 PMCID: PMC8294512 DOI: 10.1093/bib/bbaa389] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/12/2020] [Revised: 11/12/2020] [Accepted: 11/27/2020] [Indexed: 02/07/2023]
Abstract
Despite gene expression programs being notoriously complex, RNA abundance is usually assumed as a proxy for transcriptional activity. Recently developed approaches, able to disentangle transcriptional and post-transcriptional regulatory processes, have revealed a more complex scenario. It is now possible to work out how synthesis, processing and degradation kinetic rates collectively determine the abundance of each gene's RNA. It has become clear that the same transcriptional output can correspond to different combinations of the kinetic rates. This underscores the fact that markedly different modes of gene expression regulation exist, each with profound effects on a gene's ability to modulate its own expression. This review describes the development of the experimental and computational approaches, including RNA metabolic labeling and mathematical modeling, that have been disclosing the mechanisms underlying complex transcriptional programs. Current limitations and future perspectives in the field are also discussed.
Affiliation(s)
- Mattia Furlan
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), 20139 Milan, Italy
- Stefano de Pretis
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), 20139 Milan, Italy
- Mattia Pelizzola
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), 20139 Milan, Italy
6
Boileau E, Altmüller J, Naarmann-de Vries IS, Dieterich C. A comparison of metabolic labeling and statistical methods to infer genome-wide dynamics of RNA turnover. Brief Bioinform 2021; 22:6315814. [PMID: 34228787 PMCID: PMC8574959 DOI: 10.1093/bib/bbab219] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/18/2021] [Revised: 05/18/2021] [Accepted: 05/21/2021] [Indexed: 12/27/2022]
Abstract
Metabolic labeling of newly transcribed RNAs coupled with RNA-seq is being increasingly used for genome-wide analysis of RNA dynamics. Methods including standard biochemical enrichment and recent nucleotide conversion protocols each require special experimental and computational treatment. Despite their immediate relevance, these technologies have not yet been assessed and benchmarked, and no data are currently available to advance reproducible research and the development of better inference tools. Here, we present a systematic evaluation and comparison of four RNA labeling protocols: 4sU-tagging biochemical enrichment, including spike-in RNA controls, SLAM-seq, TimeLapse-seq and TUC-seq. All protocols are evaluated based on practical considerations, conversion efficiency and wet lab requirements to handle hazardous substances. We also compute decay rate estimates and confidence intervals for each protocol using two alternative statistical frameworks, pulseR and GRAND-SLAM, for over 11 600 human genes and evaluate the underlying computational workflows for their robustness and ease of use. Overall, we demonstrate a high inter-method reliability across eight use case scenarios. Our results and data will facilitate reproducible research and serve as a resource contributing to a fuller understanding of RNA biology.
Affiliation(s)
- Etienne Boileau
  - Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Im Neuenheimer Feld 669, 69120, Heidelberg, Germany
  - Department of Internal Medicine III (Cardiology, Angiology, and Pneumology), University Hospital Heidelberg, Im Neuenheimer Feld 669, 69120, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research) Partner Site Heidelberg/Mannheim
- Janine Altmüller
  - Cologne Center for Genomics (CCG), University of Cologne, Weyertal 115b, 50931, Köln, Germany
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Genomics, Charitéplatz 1, 10117 Berlin, Germany
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Isabel S Naarmann-de Vries
  - Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Im Neuenheimer Feld 669, 69120, Heidelberg, Germany
  - Department of Internal Medicine III (Cardiology, Angiology, and Pneumology), University Hospital Heidelberg, Im Neuenheimer Feld 669, 69120, Heidelberg, Germany
  - Department of Intensive Care Medicine, University Hospital Aachen, Pauwelsstrasse 30, 52074, Aachen, Germany
- Christoph Dieterich
  - Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Im Neuenheimer Feld 669, 69120, Heidelberg, Germany
  - Department of Internal Medicine III (Cardiology, Angiology, and Pneumology), University Hospital Heidelberg, Im Neuenheimer Feld 669, 69120, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research) Partner Site Heidelberg/Mannheim
7
Furlan M, Galeota E, Gaudio ND, Dassi E, Caselle M, de Pretis S, Pelizzola M. Genome-wide dynamics of RNA synthesis, processing, and degradation without RNA metabolic labeling. Genome Res 2020; 30:1492-1507. [PMID: 32978246 PMCID: PMC7605262 DOI: 10.1101/gr.260984.120] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/10/2020] [Accepted: 08/21/2020] [Indexed: 12/13/2022]
Abstract
The quantification of the kinetic rates of RNA synthesis, processing, and degradation is largely based on the integrative analysis of total and nascent transcription, the latter being quantified through RNA metabolic labeling. We developed INSPEcT−, a computational method based on the mathematical modeling of premature and mature RNA expression that is able to quantify kinetic rates from steady-state or time course total RNA-seq data without requiring any information on nascent transcripts. Our approach outperforms available solutions, closely recapitulates the kinetic rates obtained through RNA metabolic labeling, improves the ability to detect changes in transcript half-lives, reduces the cost and complexity of the experiments, and can be adopted to study experimental conditions in which nascent transcription cannot be readily profiled. Finally, we applied INSPEcT− to the characterization of post-transcriptional regulation landscapes in dozens of physiological and disease conditions. This approach was included in the INSPEcT Bioconductor package, which can now unveil RNA dynamics from steady-state or time course data, with or without the profiling of nascent RNA.
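To make the idea of modeling premature and mature RNA concrete, the sketch below uses a common two-step formulation: premature RNA is synthesized at rate k1 and processed at rate k2, and mature RNA is degraded at rate k3. This formulation, the function names, and the parameter values are assumptions for illustration only; the abstract does not spell out the equations, and none of this is taken from the INSPEcT package.

```python
# Assumed two-step model (illustrative, not INSPEcT's code):
#   dP/dt = k1 - k2 * P        premature RNA: synthesis minus processing
#   dM/dt = k2 * P - k3 * M    mature RNA: processing minus degradation


def simulate(k1, k2, k3, t_end, dt=0.001, p0=0.0, m0=0.0):
    """Integrate the model with forward Euler; return (P, M) at t_end."""
    p, m = p0, m0
    for _ in range(int(round(t_end / dt))):
        dp = k1 - k2 * p
        dm = k2 * p - k3 * m
        p += dp * dt
        m += dm * dt
    return p, m


def steady_state(k1, k2, k3):
    """Closed-form steady state: P* = k1 / k2 and M* = k1 / k3."""
    return k1 / k2, k1 / k3


def rates_from_steady_state(p_ss, m_ss, k1):
    """Invert the steady state for k2 and k3, given a known synthesis rate."""
    return k1 / p_ss, k1 / m_ss


if __name__ == "__main__":
    print(simulate(10.0, 2.0, 0.5, t_end=60.0))
    print(steady_state(10.0, 2.0, 0.5))
```

In this toy model the steady-state levels only fix the ratios k1/k2 and k1/k3, which is why the inversion helper takes k1 as a given input.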
Affiliation(s)
- Mattia Furlan
  - Center for Genomic Science, Fondazione Istituto Italiano di Tecnologia, 20139 Milan, Italy
  - Physics Department and INFN, University of Turin, 10125 Turin, Italy
- Eugenia Galeota
  - Center for Genomic Science, Fondazione Istituto Italiano di Tecnologia, 20139 Milan, Italy
- Nunzio Del Gaudio
  - Center for Genomic Science, Fondazione Istituto Italiano di Tecnologia, 20139 Milan, Italy
- Erik Dassi
  - Centre for Integrative Biology, University of Trento, 38123 Trento, Italy
- Michele Caselle
  - Physics Department and INFN, University of Turin, 10125 Turin, Italy
- Stefano de Pretis
  - Center for Genomic Science, Fondazione Istituto Italiano di Tecnologia, 20139 Milan, Italy
- Mattia Pelizzola
  - Center for Genomic Science, Fondazione Istituto Italiano di Tecnologia, 20139 Milan, Italy
8
de Pretis S, Furlan M, Pelizzola M. INSPEcT-GUI Reveals the Impact of the Kinetic Rates of RNA Synthesis, Processing, and Degradation, on Premature and Mature RNA Species. Front Genet 2020; 11:759. [PMID: 32765590 PMCID: PMC7379887 DOI: 10.3389/fgene.2020.00759] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/07/2020] [Accepted: 06/26/2020] [Indexed: 12/23/2022]
Abstract
The abundance of RNA species and their response to perturbations are set by the kinetic rates of RNA synthesis, processing, and degradation. However, the visualization, interpretation, and manipulation of these data require familiarity with mathematical modeling and command line tools. INSPEcT-GUI is an R-Shiny interface that allows researchers without specific training to effortlessly explore how the fine kinetic regulation of the RNA life cycle can shape gene expression programs. In particular, it allows users to: (i) interactively visualize gene-level RNA dynamics; (ii) refine the model fit of experimental data; (iii) test alternative regulatory models; (iv) explore, independently of the availability of data, how the combined action of the RNA kinetic rates affects premature and mature RNA. INSPEcT-GUI is freely available within the R/Bioconductor package INSPEcT at http://bioconductor.org/packages/INSPEcT/. An HTML vignette including documentation on the tool startup and usage, executable examples, and a video demonstration is available at: http://bioconductor.org/packages/release/bioc/vignettes/INSPEcT/inst/doc/INSPEcT_GUI.html.
Affiliation(s)
- Stefano de Pretis
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, Milan, Italy
- Mattia Furlan
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, Milan, Italy
- Mattia Pelizzola
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, Milan, Italy
9
Wang X, He S, Li J, Wang J, Wang C, Wang M, He D, Lv X, Zhong Q, Wang H, Wang Z. pulseTD: RNA life cycle dynamics analysis based on pulse model of 4sU-seq time course sequencing data. PeerJ 2020; 8:e9371. [PMID: 32714656 PMCID: PMC7353919 DOI: 10.7717/peerj.9371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2019] [Accepted: 05/27/2020] [Indexed: 11/20/2022]
Abstract
The life cycle of intracellular RNA mainly involves transcriptional production, splicing maturation and degradation processes. Their dynamic changes are termed RNA life cycle dynamics (RLCD). Accurate and robust identification of RLCD remains challenging when the functional form of RLCD is unknown. Using the pulse model, we developed an R package named pulseTD to identify RLCD by integrating 4sU-seq and RNA-seq data; it provides flexible functions to capture continuous changes in RLCD rates. More importantly, it can also predict the trend of RNA transcription and expression changes at future time points. pulseTD shows better accuracy and robustness than some other methods, and it is available on GitHub (https://github.com/bioWzz/pulseTD_0.2.0).
Affiliation(s)
- Xin Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Heilongjiang, China
- Siyu He
  - College of Bioinformatics Science and Technology, Harbin Medical University, Heilongjiang, China
- Jian Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Heilongjiang, China
- Jun Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Heilongjiang, China
- Chengyi Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Heilongjiang, China
- Mingwei Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Heilongjiang, China
- Danni He
  - College of Bioinformatics Science and Technology, Harbin Medical University, Heilongjiang, China
- Xingfeng Lv
  - College of Computer Science and Technology, Heilongjiang University, Harbin, China
- Hongjiu Wang
  - Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, China
  - School of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
  - College of Science, Heilongjiang University of Science and Technology, Harbin, China
- Zhenzhen Wang
  - Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, China
  - School of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
10
Abstract
Summary: Global quantification of total RNA is used to investigate steady state levels of gene expression. However, being able to differentiate pre-existing RNA (that has been synthesized prior to a defined point in time) and newly transcribed RNA can provide invaluable information, e.g., to estimate RNA half-lives or identify fast and complex regulatory processes. Recently, new techniques based on metabolic labeling and RNA-seq have emerged that allow new and old RNA to be quantified: nucleoside analogs are incorporated into newly transcribed RNA and are made detectable as point mutations in mapped reads. However, relatively infrequent incorporation events and significant sequencing error rates make the differentiation between old and new RNA a highly challenging task. We developed a statistical approach termed GRAND-SLAM that, for the first time, allows the proportion of old and new RNA in such an experiment to be estimated. Uncertainty in the estimates is quantified in a Bayesian framework. Simulation experiments show our approach to be unbiased and highly accurate. Furthermore, we analyze how uncertainty in the proportion translates into uncertainty in estimating RNA half-lives and give guidelines for planning experiments. Finally, we demonstrate that our estimates of RNA half-lives compare favorably to other experimental approaches and that biological processes affecting RNA half-lives can be investigated with greater power than offered by any other method. GRAND-SLAM is freely available for non-commercial use at http://software.erhard-lab.de; R scripts to generate all figures are available at zenodo (doi: 10.5281/zenodo.1162340).
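The estimation problem described above can be illustrated with a two-component binomial mixture: each read has some number of positions where a conversion could be seen, old RNA shows conversions only at a sequencing-error rate, and new RNA shows them at a higher conversion rate. The sketch below estimates the mixing weight (the fraction of new RNA) by expectation-maximization with both rates treated as known, then converts it to a half-life assuming steady state and exponential decay. It is a simplified stand-in, not the GRAND-SLAM implementation (which is Bayesian); all function names and parameter values are hypothetical.

```python
import math
import random


def binom_pmf(k, n, p):
    """Probability of k conversions among n convertible positions."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_reads(n_reads, n_pos, frac_new, p_err, p_conv, seed=0):
    """Draw (n_positions, n_conversions) pairs from the assumed mixture."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        p = p_conv if rng.random() < frac_new else p_err
        k = sum(rng.random() < p for _ in range(n_pos))
        reads.append((n_pos, k))
    return reads


def estimate_fraction_new(reads, p_err, p_conv, n_iter=100):
    """EM for the mixing weight only; p_err and p_conv are held fixed."""
    pi = 0.5
    for _ in range(n_iter):
        resp_sum = 0.0
        for n, k in reads:
            new = pi * binom_pmf(k, n, p_conv)
            old = (1 - pi) * binom_pmf(k, n, p_err)
            resp_sum += new / (new + old)  # posterior that the read is new
        pi = resp_sum / len(reads)
    return pi


def half_life(fraction_new, label_time):
    """Half-life under steady state; requires 0 < fraction_new < 1."""
    k_deg = -math.log(1 - fraction_new) / label_time
    return math.log(2) / k_deg


if __name__ == "__main__":
    reads = simulate_reads(2000, 50, frac_new=0.3, p_err=0.001, p_conv=0.05)
    est = estimate_fraction_new(reads, p_err=0.001, p_conv=0.05)
    print(est, half_life(est, label_time=2.0))
```

Because the half-life depends on the logarithm of one minus the estimated fraction, small errors in the fraction are amplified when it is close to 0 or 1, which is one way to see why uncertainty in the proportion matters for half-life estimates.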
Affiliation(s)
- Christopher Jürges
  - Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Lars Dölken
  - Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Florian Erhard
  - Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
11
On the optimal design of metabolic RNA labeling experiments. PLoS Comput Biol 2019; 15:e1007252. [PMID: 31390362 PMCID: PMC6699717 DOI: 10.1371/journal.pcbi.1007252] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/15/2018] [Revised: 08/19/2019] [Accepted: 07/08/2019] [Indexed: 01/16/2023]
Abstract
Massively parallel RNA sequencing (RNA-seq) in combination with metabolic labeling has become the de facto standard approach to study alterations in RNA transcription, processing or decay. Regardless of advances in the experimental protocols and techniques, every experimentalist needs to specify the key aspects of experimental design: for example, which protocol should be used (biochemical separation vs. nucleotide conversion) and what is the optimal labeling time? In this work, we provide approximate answers to these questions using the asymptotic theory of optimal design. Specifically, we investigate how the variance of degradation rate estimates depends on the time and derive the optimal time for any given degradation rate. Subsequently, we show that an increase in sample numbers should be preferred over an increase in sequencing depth. Lastly, we provide some guidance on use cases when laborious biochemical separation outcompetes recent nucleotide conversion based methods (such as SLAMseq) and show how inefficient conversion influences the precision of estimates. Code and documentation can be found at https://github.com/dieterich-lab/DesignMetabolicRNAlabeling.
In our manuscript, we address several key aspects of experimental design: 1) the optimal labeling time, 2) the number of replicate samples over sequencing depth and 3) the choice of experimental protocol. We provide approximate answers to these questions using asymptotic theory of optimal design.
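As a toy illustration of how the variance of a degradation rate estimate can depend on labeling time, the sketch below assumes a single gene at steady state whose labeled fraction after time t is 1 - exp(-k t), observed through n reads that are each classified perfectly as new or old. The delta method then gives Var(k̂) ≈ (exp(k t) - 1) / (n t²), and a golden-section search finds the time that minimizes it. This simplified binomial model, the function names, and the search bounds are assumptions made here; they are not the model or the results of the paper.

```python
import math


def var_k(k, t, n_reads):
    """Delta-method variance of k_hat = -ln(1 - theta_hat) / t.

    theta = 1 - exp(-k t) is the expected labeled fraction, estimated
    from n_reads binomial observations (assumed perfect classification).
    """
    theta = 1 - math.exp(-k * t)
    return theta / ((1 - theta) * n_reads * t**2)


def optimal_time(k, n_reads=1000, tol=1e-8):
    """Golden-section search for the labeling time minimizing var_k."""
    lo, hi = 1e-6 / k, 10.0 / k  # assumed bracket around the minimum
    g = (math.sqrt(5) - 1) / 2
    while hi - lo > tol * hi:
        a = hi - g * (hi - lo)
        b = lo + g * (hi - lo)
        if var_k(k, a, n_reads) < var_k(k, b, n_reads):
            hi = b  # minimum lies in [lo, b]
        else:
            lo = a  # minimum lies in [a, hi]
    return (lo + hi) / 2


if __name__ == "__main__":
    k = 0.1  # hypothetical degradation rate, per hour
    t_opt = optimal_time(k)
    print(t_opt, k * t_opt)
```

In this toy model the optimum satisfies k t = 2 (1 - exp(-k t)), so the best labeling time is about 1.59 / k, or roughly 2.3 half-lives; shorter pulses label too few reads and longer pulses leave too few unlabeled ones.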
12
RNA Modification Level Estimation with pulseR. Genes (Basel) 2018; 9:genes9120619. [PMID: 30544755 PMCID: PMC6316556 DOI: 10.3390/genes9120619] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/09/2018] [Revised: 12/03/2018] [Accepted: 12/05/2018] [Indexed: 12/03/2022]
Abstract
RNA modifications regulate the complex life of transcripts. An experimental approach called LAIC-seq was developed to characterize modification levels on a transcriptome-wide scale. In this method, the modified and unmodified molecules are separated using antibodies specific for a given RNA modification (e.g., m6A). In essence, the procedure of biochemical separation yields three fractions: input, eluate, and supernatant, which are subjected to RNA-seq. In this work, we present a bioinformatics workflow, which starts from RNA-seq data to infer gene-specific modification levels by a statistical model on a transcriptome-wide scale. Our workflow centers around the pulseR package, which was originally developed for the analysis of metabolic labeling experiments. We demonstrate how to analyze data without external normalization (i.e., in the absence of spike-ins), given high efficiency of separation, and how, alternatively, scaling factors can be derived from unmodified spike-ins. Importantly, our workflow provides an estimate of uncertainty of modification levels in terms of confidence intervals for model parameters, such as gene expression and RNA modification levels. We also compare alternative model parametrizations, log-odds, or the proportion of the modified molecules and discuss the pros and cons of each representation. In summary, our workflow is a versatile approach to RNA modification level estimation, which is open to any read-count-based experimental approach.