1. Nuclear export is a limiting factor in eukaryotic mRNA metabolism. PLoS Comput Biol 2024; 20:e1012059. [PMID: 38753883 DOI: 10.1371/journal.pcbi.1012059] [Received: 10/20/2023] [Accepted: 04/09/2024] [Indexed: 05/18/2024]
Abstract
The eukaryotic mRNA life cycle includes transcription, nuclear mRNA export and degradation. To quantify all these processes simultaneously, we perform thiol-linked alkylation after metabolic labeling of RNA with 4-thiouridine (4sU), followed by sequencing of RNA (SLAM-seq) in the nuclear and cytosolic compartments of human cancer cells. We develop a model that reliably quantifies mRNA-specific synthesis, nuclear export, and nuclear and cytosolic degradation rates on a genome-wide scale. We find that nuclear degradation of polyadenylated mRNA is negligible and nuclear mRNA export is slow, while cytosolic mRNA degradation is comparatively fast. Consequently, an mRNA molecule generally spends most of its life in the nucleus. We also observe large differences in the nuclear export rates of different 3'UTR transcript isoforms. Furthermore, we identify genes whose expression is abruptly induced upon metabolic labeling. These transcripts are exported substantially faster than average mRNAs, suggesting the existence of alternative export pathways. Our results highlight nuclear mRNA export as a limiting factor in mRNA metabolism and gene regulation.
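The compartment structure described above can be illustrated with a minimal two-compartment rate model. This is a sketch under stated assumptions, not the authors' SLAM-seq model: it assumes constant first-order rates and ignores 4sU labeling kinetics. The parameter names (`s`, `k_exp`, `k_nd`, `k_cd`) and values are hypothetical. It shows why slow export combined with fast cytosolic decay implies that a molecule spends most of its life in the nucleus.

```python
from dataclasses import dataclass


@dataclass
class TwoCompartmentModel:
    """Hypothetical nucleus -> cytosol mRNA model with first-order rates (per hour)."""
    s: float      # synthesis rate (molecules/h)
    k_exp: float  # nuclear export rate (1/h)
    k_nd: float   # nuclear degradation rate (1/h)
    k_cd: float   # cytosolic degradation rate (1/h)

    def steady_state(self) -> tuple[float, float]:
        # dN/dt = s - (k_exp + k_nd) * N = 0
        n = self.s / (self.k_exp + self.k_nd)
        # dC/dt = k_exp * N - k_cd * C = 0
        c = self.k_exp * n / self.k_cd
        return n, c

    def nuclear_fraction_of_lifetime(self) -> float:
        # Mean nuclear residence time is 1/(k_exp + k_nd). A molecule reaches the
        # cytosol with probability k_exp/(k_exp + k_nd) and then lives 1/k_cd there.
        t_nuc = 1.0 / (self.k_exp + self.k_nd)
        t_cyt = (self.k_exp / (self.k_exp + self.k_nd)) / self.k_cd
        return t_nuc / (t_nuc + t_cyt)

    def simulate(self, hours: float, dt: float = 0.001) -> tuple[float, float]:
        # Forward-Euler integration starting from zero molecules.
        n = c = 0.0
        for _ in range(int(hours / dt)):
            dn = self.s - (self.k_exp + self.k_nd) * n
            dc = self.k_exp * n - self.k_cd * c
            n += dn * dt
            c += dc * dt
        return n, c


if __name__ == "__main__":
    model = TwoCompartmentModel(s=10.0, k_exp=0.1, k_nd=0.0, k_cd=0.5)
    print(model.steady_state())
    print(round(model.nuclear_fraction_of_lifetime(), 3))  # 0.833
```

With `k_nd = 0`, the nuclear share of the lifetime reduces to 1 / (1 + k_exp / k_cd). This value also equals the steady-state nuclear fraction N / (N + C), so an export rate five times slower than cytosolic decay puts roughly 83% of both the lifetime and the standing pool in the nucleus.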

2. Species-aware DNA language models capture regulatory elements and their evolution. Genome Biol 2024; 25:83. [PMID: 38566111 PMCID: PMC10985990 DOI: 10.1186/s13059-024-03221-x] [Received: 08/14/2023] [Accepted: 03/20/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND The rise of large-scale multi-species genome sequencing projects promises to shed new light on how genomes encode gene regulatory instructions. To this end, new algorithms are needed that can leverage conservation to capture regulatory elements while accounting for their evolution. RESULTS Here, we introduce species-aware DNA language models, which we trained on more than 800 species spanning over 500 million years of evolution. Investigating their ability to predict masked nucleotides from context, we show that DNA language models distinguish transcription factor and RNA-binding protein motifs from background non-coding sequence. Owing to their flexibility, DNA language models capture conserved regulatory elements over much further evolutionary distances than sequence alignment would allow. Remarkably, DNA language models reconstruct motif instances bound in vivo better than unbound ones and account for the evolution of motif sequences and their positional constraints, showing that these models capture functional high-order sequence and evolutionary context. We further show that species-aware training yields improved sequence representations for endogenous and MPRA-based gene expression prediction, as well as motif discovery. CONCLUSIONS Collectively, these results demonstrate that species-aware DNA language models are a powerful, flexible, and scalable tool to integrate information from large compendia of highly diverged genomes.

3. Non-Coding RNAs: Regulators of Stress, Ageing, and Developmental Decisions in Yeast? Cells 2024; 13:599. [PMID: 38607038 PMCID: PMC11012152 DOI: 10.3390/cells13070599] [Received: 02/15/2024] [Revised: 03/19/2024] [Accepted: 03/27/2024] [Indexed: 04/13/2024]
Abstract
Cells must change their properties in order to adapt to a constantly changing environment. Most of the cellular sensing and regulatory mechanisms described so far are based on proteins that serve as sensors, signal transducers, and effectors of signalling pathways, resulting in altered cell physiology. In recent years, however, remarkable examples of the critical role of non-coding RNAs in some of these regulatory pathways have been described in various organisms. In this review, we focus on all classes of non-coding RNAs that play regulatory roles during stress response, starvation, and ageing in different yeast species as well as in structured yeast populations. Such regulation can occur, for example, by modulating the amount and functional state of tRNAs, rRNAs, or snRNAs that are directly involved in the processes of translation and splicing. In addition, long non-coding RNAs and microRNA-like molecules are bona fide regulators of the expression of their target genes. Non-coding RNAs thus represent an additional level of cellular regulation that is gradually being uncovered.

4. FUN-PROSE: A deep learning approach to predict condition-specific gene expression in fungi. PLoS Comput Biol 2023; 19:e1011563. [PMID: 37971967 PMCID: PMC10653424 DOI: 10.1371/journal.pcbi.1011563] [Received: 02/19/2023] [Accepted: 09/30/2023] [Indexed: 11/19/2023]
Abstract
The mRNA levels of all genes in a genome are a critical piece of information defining the overall state of the cell in a given environmental condition. Being able to reconstruct such condition-specific expression in fungal genomes is particularly important for metabolically engineering these organisms to produce desired chemicals under industrially scalable conditions. Most previous deep learning approaches focused on predicting the average expression level of a gene from its promoter sequence, ignoring its variation across conditions. Here we present FUN-PROSE, a deep learning model trained to predict differential expression of individual genes across various conditions using their promoter sequences and the expression levels of all transcription factors. We train and test our model on three fungal species and obtain correlations between predicted and observed condition-specific gene expression as high as 0.85. We then interpret our model to extract promoter sequence motifs responsible for the variable expression of individual genes. We also carry out input feature importance analysis to connect individual transcription factors to their gene targets. A sizeable fraction of both the sequence motifs and the TF-gene interactions learned by our model agree with previously known biological information, while the rest correspond either to novel biological facts or to indirect correlations.

5. Polysome propensity and tunable thresholds in coding sequence length enable differential mRNA stability. Sci Adv 2023; 9:eadh9545. [PMID: 37756413 PMCID: PMC10530222 DOI: 10.1126/sciadv.adh9545] [Received: 03/27/2023] [Accepted: 08/25/2023] [Indexed: 09/29/2023]
Abstract
The half-life of mRNAs, as well as their translation, increases in proportion to their optimal codon content, indicating a tight coupling of codon-dependent differential translation and degradation. Little is known about the regulation of this coupling. We found that the mRNA stability gain in yeast depends on the mRNA coding sequence length. Below a critical length, codon optimality fails to affect the stability of mRNAs, although they can be efficiently translated into short peptides and proteins. Above this threshold length, codon optimality-dependent differential mRNA stability emerges in a switch-like fashion, which coincides with a similar increase in the polysome propensity of the mRNAs. This threshold length can be tuned by the untranslated regions (UTRs). Some of these UTRs can destabilize mRNAs without reducing translation, which plays a role in controlling the amplitude of the oscillatory expression of cell cycle genes. Our findings help explain the translation of short peptides from noncoding RNAs and translation by localized monosomes in neurons.

6. The genetic and biochemical determinants of mRNA degradation rates in mammals. Genome Biol 2022; 23:245. [PMID: 36419176 PMCID: PMC9684954 DOI: 10.1186/s13059-022-02811-x] [Received: 03/18/2022] [Accepted: 11/02/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND Degradation rate is a fundamental aspect of mRNA metabolism, and the factors governing it remain poorly characterized. Understanding the genetic and biochemical determinants of mRNA half-life would enable more precise identification of variants that perturb gene expression through post-transcriptional gene regulatory mechanisms. RESULTS We establish a compendium of 39 human and 27 mouse transcriptome-wide mRNA decay rate datasets. A meta-analysis of these data identified a prevalence of technical noise and measurement bias, induced partially by the underlying experimental strategy. Correcting for these biases allowed us to derive more precise, consensus measurements of half-life which exhibit enhanced consistency between species. We trained substantially improved statistical models based upon genetic and biochemical features to better predict half-life and characterize the factors molding it. Our state-of-the-art model, Saluki, is a hybrid convolutional and recurrent deep neural network which relies only upon an mRNA sequence annotated with coding frame and splice sites to predict half-life (r=0.77). The key novel principle learned by Saluki is that the spatial positioning of splice sites, codons, and RNA-binding motifs within an mRNA is strongly associated with mRNA half-life. Saluki predicts the impact of RNA sequences and genetic mutations therein on mRNA stability, in agreement with functional measurements derived from massively parallel reporter assays. CONCLUSIONS Our work produces a more robust ground truth for transcriptome-wide mRNA half-lives in mammalian cells. Using these revised measurements, we trained Saluki, a model that is over 50% more accurate in predicting half-life from sequence than existing models. Saluki succinctly captures many of the known determinants of mRNA half-life and can be rapidly deployed to predict the functional consequences of arbitrary mutations in the transcriptome.

7. Controlling gene expression with deep generative design of regulatory DNA. Nat Commun 2022; 13:5099. [PMID: 36042233 PMCID: PMC9427793 DOI: 10.1038/s41467-022-32818-8] [Received: 01/27/2022] [Accepted: 08/18/2022] [Indexed: 11/25/2022]
Abstract
Design of de novo synthetic regulatory DNA is a promising avenue to control gene expression in biotechnology and medicine. Mutagenesis-based design typically requires screening sizable random DNA libraries, which limits the designs to a short section of the promoter and restricts their control of gene expression. Here, we prototype a deep learning strategy based on generative adversarial networks (GAN) by learning directly from genomic and transcriptomic data. Our ExpressionGAN can traverse the entire regulatory sequence-expression landscape in a gene-specific manner, generating regulatory DNA with prespecified target mRNA levels spanning the whole gene regulatory structure, including coding and adjacent non-coding regions. Despite high sequence divergence from natural DNA, in vivo measurements show that 57% of the highly expressed synthetic sequences surpass the expression levels of highly expressed natural controls. This demonstrates the applicability and relevance of deep generative design to expand our knowledge and control of gene expression regulation in any desired organism, condition or tissue. Here the authors present ExpressionGAN, a generative adversarial network that uses genomic and transcriptomic data to generate regulatory sequences.

8. iCodon customizes gene expression based on the codon composition. Sci Rep 2022; 12:12126. [PMID: 35840631 PMCID: PMC9287306 DOI: 10.1038/s41598-022-15526-7] [Received: 04/18/2022] [Accepted: 06/24/2022] [Indexed: 11/09/2022]
Abstract
Messenger RNA (mRNA) stability substantially impacts steady-state gene expression levels in a cell. mRNA stability is strongly affected by codon composition in a translation-dependent manner across species, through a mechanism termed codon optimality. We have developed iCodon (www.iCodon.org), an algorithm for customizing mRNA expression by introducing synonymous codon substitutions into the coding sequence. iCodon is optimized for four vertebrate transcriptomes: mouse, human, frog, and fish. Users can predict the mRNA stability of any coding sequence based on its codon composition and then generate more stable (optimized) or less stable (deoptimized) variants encoding the same protein. Further, we show that codon optimality predictions correlate both with mRNA stability, measured using a massive reporter library, and with expression levels, measured using fluorescent reporters and analysis of endogenous gene expression in zebrafish embryos and/or human cells. Therefore, iCodon will benefit basic biological research, as well as a wide range of applications in biotechnology and biomedicine.
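Because synonymous codons encode the same amino acid, recoding a coding sequence can change its codon composition without changing the protein. The toy sketch below illustrates only that constraint. It is not iCodon's algorithm, which relies on a model trained on vertebrate data. The set of "optimal" codons passed in is a hypothetical input, and each codon is swapped for a synonymous optimal codon when one exists.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons by the amino acid (or stop) they encode.
SYNONYMS: dict[str, list[str]] = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def codons(cds: str) -> list[str]:
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(cds))


def optimal_fraction(cds: str, optimal: set[str]) -> float:
    cs = codons(cds)
    return sum(c in optimal for c in cs) / len(cs) if cs else 0.0


def recode(cds: str, optimal: set[str]) -> str:
    """Swap each codon for a synonymous codon in `optimal`, if one exists."""
    out = []
    for c in codons(cds):
        if c in optimal:
            out.append(c)
            continue
        # Deterministic choice: the alphabetically first optimal synonym.
        candidates = sorted(s for s in SYNONYMS[CODON_TABLE[c]] if s in optimal)
        out.append(candidates[0] if candidates else c)
    return "".join(out)


if __name__ == "__main__":
    hypothetical_optimal = {"GCT", "AAG", "GGT"}  # illustrative, not measured data
    original = "ATGGCAAAAGGGTAA"
    recoded = recode(original, hypothetical_optimal)
    print(recoded, translate(original) == translate(recoded))
```

A deoptimizing variant would use the same loop with a set of non-optimal codons. In both directions, the encoded protein is preserved by construction.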

9. Mitotic checkpoint gene expression is tuned by codon usage bias. EMBO J 2022; 41:e107896. [PMID: 35811551 PMCID: PMC9340482 DOI: 10.15252/embj.2021107896] [Received: 01/31/2021] [Revised: 05/30/2022] [Accepted: 06/06/2022] [Indexed: 11/09/2022]
Abstract
The mitotic checkpoint (also called spindle assembly checkpoint, SAC) is a signaling pathway that safeguards proper chromosome segregation. Correct functioning of the SAC depends on adequate protein concentrations and appropriate stoichiometries between SAC proteins. Yet very little is known about the regulation of SAC gene expression. Here, we show in the fission yeast Schizosaccharomyces pombe that a combination of short mRNA half-lives and long protein half-lives supports stable SAC protein levels. For the SAC genes mad2+ and mad3+, their short mRNA half-lives are caused, in part, by a high frequency of nonoptimal codons. In contrast, mad1+ mRNA has a short half-life despite a higher frequency of optimal codons, and despite the lack of known RNA-destabilizing motifs. Hence, different SAC genes employ different strategies of expression. We further show that Mad1 homodimers form co-translationally, which may necessitate a certain codon usage pattern. Taken together, we propose that the codon usage of SAC genes is fine-tuned to ensure proper SAC function. Our work sheds light on gene expression features that promote spindle assembly checkpoint function and suggests that synonymous mutations may weaken the checkpoint.

10. Genetic variants associated mRNA stability in lung. BMC Genomics 2022; 23:196. [PMID: 35272635 PMCID: PMC8915503 DOI: 10.1186/s12864-022-08405-y] [Received: 07/31/2021] [Accepted: 02/21/2022] [Indexed: 12/04/2022]
Abstract
Background Expression quantitative trait loci (eQTL) analyses have been widely used to identify genetic variants associated with gene expression levels and thereby to understand the molecular mechanisms underlying genetic traits. The resulting eQTLs might affect the expression of associated genes through transcriptional or post-transcriptional regulation. In this study, we attempt to distinguish these two types of regulation by identifying genetic variants associated with the mRNA stability of genes (stQTLs). Results Here, we present a computational framework that takes advantage of recently developed methods to infer the mRNA stability of genes from RNA-seq data and performs association analysis to identify stQTLs. Using the Genotype-Tissue Expression (GTEx) lung RNA-seq data, we tested 15,122,700 genetic variants for 13,476 autosomal genes and identified a total of 142,801 stQTLs for 3942 genes and 186,132 eQTLs for 4751 genes. Interestingly, our results indicated that stQTLs were enriched in the CDS and 3'UTR regions, while eQTLs were enriched in the CDS, 3'UTR, 5'UTR, and upstream regions. We also found that stQTLs are more likely than eQTLs to overlap with RNA-binding protein (RBP) and microRNA (miRNA) binding sites. Our analyses demonstrate that simultaneous identification of stQTLs and eQTLs can provide more mechanistic insight into the association between genetic variants and gene expression levels. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08405-y.
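At its simplest, each stQTL or eQTL test regresses a per-sample phenotype (an inferred mRNA stability score or an expression level) on genotype dosage, the number of alternative alleles (0, 1 or 2) carried at a variant. The sketch below is a single-variant ordinary least-squares test on made-up numbers. It is not the authors' framework, and it omits the covariates, phenotype normalization and multiple-testing correction that a GTEx-scale scan needs. It reports the t statistic rather than a p-value so that it stays within the Python standard library.

```python
import math


def ols_association(dosage: list[float], phenotype: list[float]) -> dict[str, float]:
    """Regress phenotype on genotype dosage: phenotype = alpha + beta * dosage + error."""
    n = len(dosage)
    if n != len(phenotype) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(dosage) / n
    my = sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    syy = sum((y - my) ** 2 for y in phenotype)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, phenotype))
    beta = sxy / sxx                      # effect size per alternative allele
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosage, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    t = beta / se if se > 0 else math.inf
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return {"beta": beta, "se": se, "t": t, "r": r}


if __name__ == "__main__":
    # Hypothetical dosages and stability scores for six individuals.
    result = ols_association([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
    print({k: round(v, 3) for k, v in result.items()})
```

Running the same test once with a stability phenotype and once with an expression phenotype is what separates candidate stQTLs from eQTLs in the comparison described above.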

11. Gene age shapes the transcriptional landscape of sexual morphogenesis in mushroom forming fungi (Agaricomycetes). eLife 2022; 11:71348. [PMID: 35156613 PMCID: PMC8893723 DOI: 10.7554/elife.71348] [Received: 06/17/2021] [Accepted: 02/11/2022] [Indexed: 11/13/2022]
Abstract
Multicellularity has been one of the most important innovations in the history of life. The role of gene regulatory changes in driving transitions to multicellularity is being increasingly recognized; however, factors influencing gene expression patterns are poorly known in many clades. Here, we compared the developmental transcriptomes of complex multicellular fruiting bodies of eight Agaricomycetes and Cryptococcus neoformans, a closely related human pathogen with a simple morphology. In-depth analysis in Pleurotus ostreatus revealed that allele-specific expression, natural antisense transcripts, and developmental gene expression, but not RNA editing or a 'developmental hourglass,' act in concert to shape its transcriptome during fruiting body development. We found that transcriptional patterns of genes strongly depend on their evolutionary ages. Young genes showed more developmental and allele-specific expression variation, possibly because of weaker evolutionary constraint, suggestive of nonadaptive expression variance in fruiting bodies. These results prompted us to define a set of conserved genes specifically regulated only during complex morphogenesis by excluding young genes and accounting for deeply conserved ones shared with species showing simple sexual development. Analysis of the resulting gene set revealed evolutionary and functional associations with complex multicellularity, which allowed us to speculate that they are involved in complex multicellular morphogenesis of mushroom fruiting bodies.

12. Modulation of miRISC-Mediated Gene Silencing in Eukaryotes. Front Mol Biosci 2022; 9:832916. [PMID: 35237661 PMCID: PMC8882679 DOI: 10.3389/fmolb.2022.832916] [Received: 12/10/2021] [Accepted: 01/18/2022] [Indexed: 11/13/2022]
Abstract
Gene expression is regulated at multiple levels in eukaryotic cells. Regulation at the post-transcriptional level is modulated by various trans-acting factors that bind to specific sequences in the messenger RNA (mRNA). The binding of different trans factors influences various aspects of the mRNA, such as degradation rate, translation efficiency, splicing, and localization. MicroRNAs (miRNAs) are short endogenous ncRNAs that combine with an Argonaute protein to form the microRNA-induced silencing complex (miRISC), which uses base-pair complementarity to silence the target transcript. RNA-binding proteins (RBPs) contribute to post-transcriptional control by influencing mRNA stability and translation upon binding to cis-elements within the mRNA transcript. RBPs have been shown to impact gene expression by influencing miRISC biogenesis, composition, or miRISC-mRNA target interaction. While there is clear evidence that these interactions between RBPs, miRNAs, miRISC and target mRNAs influence the efficiency of miRISC-mediated gene silencing, the exact mechanism for most of them remains unclear. This review summarizes our current knowledge of gene expression regulation through interactions of miRNAs and RBPs.

13. Roles of mRNA poly(A) tails in regulation of eukaryotic gene expression. Nat Rev Mol Cell Biol 2022; 23:93-106. [PMID: 34594027 PMCID: PMC7614307 DOI: 10.1038/s41580-021-00417-y] [Accepted: 08/16/2021] [Indexed: 02/06/2023]
Abstract
In eukaryotes, poly(A) tails are present on almost every mRNA. Early experiments led to the hypothesis that poly(A) tails and the cytoplasmic polyadenylate-binding protein (PABPC) promote translation and prevent mRNA degradation, but the details remained unclear. More recent data suggest that the role of poly(A) tails is much more complex: poly(A)-binding protein can stimulate poly(A) tail removal (deadenylation) and the poly(A) tails of stable, highly translated mRNAs at steady state are much shorter than expected. Furthermore, the rate of translation elongation affects deadenylation. Consequently, the interplay between poly(A) tails, PABPC, translation and mRNA decay has a major role in gene regulation. In this Review, we discuss recent work that is revolutionizing our understanding of the roles of poly(A) tails in the cytoplasm. Specifically, we discuss the roles of poly(A) tails in translation and control of mRNA stability and how poly(A) tails are removed by exonucleases (deadenylases), including CCR4-NOT and PAN2-PAN3. We also discuss how deadenylation rate is determined, the integration of deadenylation with other cellular processes and the function of PABPC. We conclude with an outlook for the future of research in this field.

14. Effects of sequence motifs in the yeast 3' untranslated region determined from massively parallel assays of random sequences. Genome Biol 2021; 22:293. [PMID: 34663436 PMCID: PMC8522215 DOI: 10.1186/s13059-021-02509-6] [Received: 03/26/2021] [Accepted: 09/30/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND The 3' untranslated region (UTR) plays critical roles in determining the level of gene expression through effects on activities such as mRNA stability and translation. Functional elements within this region have largely been identified through analyses of native genes, which contain multiple co-evolved sequence features. RESULTS To explore the effects of 3' UTR sequence elements outside of native sequence contexts, we analyze hundreds of thousands of random 50-mers inserted into the 3' UTR of a reporter gene in the yeast Saccharomyces cerevisiae. We determine relative protein expression levels from the fitness of transformants in a growth selection. We find that the consensus 3' UTR efficiency element significantly boosts expression, independent of sequence context; on the other hand, the consensus positioning element has only a small effect on expression. Some sequence motifs that are binding sites for Puf proteins substantially increase expression in the library, despite these proteins generally being associated with post-transcriptional downregulation of native mRNAs. Our measurements also allow a systematic examination of the effects of point mutations within efficiency element motifs across diverse sequence backgrounds. These mutational scans reveal the relative in vivo importance of individual bases in the efficiency element, which likely reflects their roles in binding the Hrp1 protein involved in cleavage and polyadenylation. CONCLUSIONS The regulatory effects of some 3' UTR sequence features, like the efficiency element, are consistent regardless of sequence context. In contrast, the consequences of other 3' UTR features appear to be strongly dependent on their evolved context within native genes.
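The point-mutation scans described above amount to comparing a consensus motif with every sequence that differs from it at exactly one position. The sketch below enumerates those variants and counts their occurrences in a simulated random 50-mer library. It assumes `TATATA` (the DNA form of the commonly cited UAUAUA consensus) for the efficiency element, and the library is synthetic. In the actual assay, each insert would be paired with a growth-based expression estimate, which is not modeled here.

```python
import random

BASES = "ACGT"


def point_mutants(motif: str) -> list[str]:
    """All sequences that differ from `motif` at exactly one position."""
    variants = []
    for i, base in enumerate(motif):
        for b in BASES:
            if b != base:
                variants.append(motif[:i] + b + motif[i + 1:])
    return variants


def count_occurrences(seq: str, motif: str) -> int:
    """Count possibly overlapping occurrences of `motif` in `seq`."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def random_library(n: int, length: int = 50, seed: int = 0) -> list[str]:
    """Simulated library of uniformly random inserts (hypothetical stand-in)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n)]


if __name__ == "__main__":
    consensus = "TATATA"  # assumed efficiency-element consensus (DNA form)
    library = random_library(10_000)
    for variant in [consensus] + point_mutants(consensus):
        carriers = sum(1 for seq in library if count_occurrences(seq, variant) > 0)
        print(variant, carriers)
```

Grouping library members by the variant they carry and comparing their expression estimates gives a per-position importance profile for the motif, which is the kind of readout the abstract describes.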

15.

16.
Abstract
Evolution of cis-regulatory sequences depends on how they affect gene expression and motivates both the identification and prediction of cis-regulatory variants responsible for expression differences within and between species. While much progress has been made in relating cis-regulatory variants to expression levels, the timing of gene activation and repression may also be important to the evolution of cis-regulatory sequences. We investigated allele-specific expression (ASE) dynamics within and between Saccharomyces species during the diauxic shift and found appreciable cis-acting variation in gene expression dynamics. Within-species ASE is associated with intergenic variants, and ASE dynamics are more strongly associated with insertions and deletions than ASE levels. To refine these associations, we used a high-throughput reporter assay to test promoter regions and individual variants. Within the subset of regions that recapitulated endogenous expression, we identified and characterized cis-regulatory variants that affect expression dynamics. Between species, chimeric promoter regions generate novel patterns and indicate constraints on the evolution of gene expression dynamics. We conclude that changes in cis-regulatory sequences can tune gene expression dynamics and that the interplay between expression dynamics and other aspects of expression is relevant to the evolution of cis-regulatory sequences.

17. Feature selection for RNA cleavage efficiency at specific sites using the LASSO regression model in Arabidopsis thaliana. BMC Bioinformatics 2021; 22:380. [PMID: 34294042 PMCID: PMC8299621 DOI: 10.1186/s12859-021-04291-5] [Received: 03/12/2021] [Accepted: 07/07/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND RNA degradation is important for the regulation of gene expression. Despite the identification of proteins and sequences related to deadenylation-dependent RNA degradation in plants, endonucleolytic cleavage-dependent RNA degradation has not been studied in detail. We previously developed truncated RNA end sequencing in Arabidopsis thaliana to identify cleavage sites and evaluate the efficiency of cleavage at each site. Although several features are related to RNA cleavage efficiency, the effect of each feature on cleavage efficiency has not been evaluated by considering multiple putative determinants together in A. thaliana. RESULTS Cleavage site information was acquired from a previous study, and cleavage efficiency at the site level (CSsite value), which indicates the number of reads at each cleavage site normalized to RNA abundance, was calculated. To identify features related to cleavage efficiency at the site level, multiple putative determinants (features) were used to perform feature selection with the Least Absolute Shrinkage and Selection Operator (LASSO) regression model. The results indicated that whole-RNA features were important for the CSsite value, in addition to features around cleavage sites. Whole-RNA features related to the translation process and nucleotide frequency around cleavage sites were major determinants of cleavage efficiency. The results were verified in a model constructed using only sequence features, which showed prediction accuracy similar to that of the model using all features, including the translation process, suggesting that cleavage efficiency can be predicted using only sequence information. The LASSO regression model was validated in exogenous genes, which showed that the model constructed using only sequence information can predict cleavage efficiency in both endogenous and exogenous genes. CONCLUSIONS Feature selection using the LASSO regression model in A. thaliana identified 155 features. Correlation coefficients revealed that whole-RNA features are important for determining cleavage efficiency in addition to features around the cleavage sites. The LASSO regression model can predict cleavage efficiency in endogenous and exogenous genes using only sequence information. The model revealed the significance of the effect of multiple determinants on cleavage efficiency, suggesting that sequence features are important for RNA degradation mechanisms in A. thaliana.
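LASSO performs feature selection because its L1 penalty drives some coefficients exactly to zero. The following is a minimal coordinate-descent sketch in pure Python on standardized features with synthetic data. It illustrates the technique named in the abstract, not the authors' feature set, preprocessing or software. The data generator and the penalty value `lam=0.1` are hypothetical; in practice one would use a library implementation such as scikit-learn's `Lasso` and choose the penalty by cross-validation.

```python
import random


def standardize(X: list[list[float]]) -> list[list[float]]:
    """Center each column and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mean) / sd for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def soft_threshold(z: float, gamma: float) -> float:
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: list[list[float]], y: list[float], lam: float, n_iter: int = 200) -> list[float]:
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are centered; y is centered here, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual for beta = 0
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + z[j] * beta[j]
            new = soft_threshold(rho, lam) / z[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def make_synthetic(n: int = 200, p: int = 5, seed: int = 0):
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Only features 0 and 2 influence the response (hypothetical ground truth).
    y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    beta = lasso_cd(standardize(X), y, lam=0.1)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print(selected, [round(b, 2) for b in beta])
```

The selected set is simply the features with non-zero coefficients. Larger penalties select fewer features, and the surviving coefficients are shrunk toward zero by roughly `lam` on the standardized scale.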

18. Performance of Regression Models as a Function of Experiment Noise. Bioinform Biol Insights 2021; 15:11779322211020315. [PMID: 34262264 PMCID: PMC8243133 DOI: 10.1177/11779322211020315] [Received: 02/15/2021] [Accepted: 04/29/2021] [Indexed: 11/21/2022]
Abstract
Background: A challenge in developing machine learning regression models is that it is difficult to know whether maximal performance has been reached on the test dataset, or whether further model improvement is possible. In biology, this problem is particularly pronounced because sample labels (response variables) are typically obtained through experiments and therefore carry experimental noise. Such label noise places a fundamental limit on the performance metrics attainable by regression models on the test dataset. Results: We address this challenge by deriving an expected upper bound for the coefficient of determination (R²) of regression models evaluated on a holdout dataset. This upper bound depends only on the noise associated with the response variable in a dataset and on the variance of that variable. The upper bound estimate was validated via Monte Carlo simulations and then used as a tool to bootstrap the performance of regression models trained on biological datasets, including protein sequence data, transcriptomic data, and genomic data. Conclusions: The new method for estimating upper bounds for model performance on test data should help researchers develop ML regression models that reach their maximum potential. Although we study biological datasets in this work, the new upper bound estimates hold for regression models from any research field or application area where response variables have associated noise.
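The intuition behind a noise-dependent ceiling on R² can be checked with a small Monte Carlo simulation. The sketch assumes additive, independent, zero-mean label noise with variance σ²_noise. Under that assumption, even a model that predicts the noise-free signal perfectly has an expected test R² of about 1 - σ²_noise / Var(y), where Var(y) is the variance of the noisy labels. This is an illustrative derivation under stated assumptions and not necessarily the exact estimator derived in the paper; the signal and noise levels are hypothetical.

```python
import random
import statistics

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def r2_upper_bound(noise_var, label_var):
    """Expected R^2 of a perfect model when labels carry independent noise of variance noise_var."""
    return 1.0 - noise_var / label_var

random.seed(1)
n = 20_000
signal_sd, noise_sd = 1.0, 0.5            # hypothetical signal and noise levels
signal = [random.gauss(0.0, signal_sd) for _ in range(n)]
labels = [s + random.gauss(0.0, noise_sd) for s in signal]   # "measured" responses

simulated = r_squared(labels, signal)     # a model that recovers the true signal exactly
bound = r2_upper_bound(noise_sd ** 2, statistics.pvariance(labels))
print(f"simulated R^2 = {simulated:.3f}, analytical bound = {bound:.3f}")
```

Both numbers should land close to 1 - 0.25/1.25 = 0.8: no model, however good, is expected to exceed that R² on these noisy labels.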
19
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
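Sequence-to-phenotype models of the kind surveyed above need the nucleotide sequence in numerical form; one-hot encoding, in which each position becomes a four-element indicator vector, is a common choice. The sketch below is a generic illustration rather than the encoding of any particular model, and mapping ambiguous bases such as N to all zeros is an assumption.

```python
NUCLEOTIDES = "ACGT"

def one_hot(seq: str) -> list:
    """Encode a DNA sequence as an L x 4 list of 0/1 rows (columns A, C, G, T)."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:          # ambiguous symbols (e.g. N) stay all-zero
            row[index[base]] = 1
        matrix.append(row)
    return matrix

for row in one_hot("ACgTN"):
    print(row)
```

A convolutional or recurrent network then learns motif-like filters directly on this L x 4 matrix.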
20
Genome-wide analysis of lncRNA stability in human. PLoS Comput Biol 2021; 17:e1008918. [PMID: 33861746 PMCID: PMC8081339 DOI: 10.1371/journal.pcbi.1008918] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/13/2020] [Revised: 04/28/2021] [Accepted: 03/26/2021] [Indexed: 12/18/2022] Open
Abstract
Transcript stability is associated with many biological processes, and the factors affecting mRNA stability have been extensively studied. However, little is known about the features related to human long noncoding RNA (lncRNA) stability. By inhibiting transcription and collecting samples at 10 time points, genome-wide RNA-seq was performed in human lung adenocarcinoma cells (A549), and RNA half-life datasets were constructed. The following observations were obtained. First, the half-life distributions of both lncRNAs and messenger RNAs (mRNAs) with one exon (lnc-human1 and m-human1) were significantly different from those of lncRNAs and mRNAs with more than one exon (lnc-human2 and m-human2). Furthermore, some factors, such as full-length transcript secondary structure, played opposite roles in lnc-human1 and m-human2. Second, comparing the half-lives of nucleus-specific, cytoplasm-specific, and common lncRNAs and mRNAs showed that lncRNAs (mRNAs) in the nucleus were less stable than those in the cytoplasm, a difference that derived from the transcripts themselves rather than from their cellular location. Third, k-mer-based protein-RNA or RNA-RNA interactions promoted the stability of lnc-human1 lncRNAs and decreased the stability of m-human2 mRNAs with high probability. Finally, deep learning-based regression revealed a non-linear relationship between the half-lives of lncRNAs (mRNAs) and the related factors. The present study established lncRNA and mRNA half-life regulation networks in the A549 cell line and sheds new light on the degradation behaviors of both lncRNAs and mRNAs.
Transcript stability is important for many biological processes. However, little is known about the features related to human lncRNA stability. Through quantitative analysis of the half-lives of lncRNAs (mRNAs) and various factors, we found a nonlinear relationship between the half-lives of lncRNAs (mRNAs) and the related factors and their combinations. Our research provides a comprehensive understanding of lncRNA stability. Further efforts are needed to develop an accurate quantitative prediction model for the half-lives of lncRNAs (mRNAs).
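Half-lives from a transcription-inhibition time course are commonly estimated by assuming first-order decay, N(t) = N0·e^(-kt), fitting k by least squares on log-transformed abundances, and converting with t½ = ln 2 / k. The sketch below illustrates that standard approach on hypothetical, noise-free values; the study's actual normalization and fitting procedure may differ.

```python
import math

def fit_half_life(times, abundances):
    """Fit ln(abundance) = ln(N0) - k*t by ordinary least squares.

    Returns (decay constant k, half-life ln(2)/k) in the time unit of `times`.
    """
    logs = [math.log(a) for a in abundances]
    n = len(times)
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    k = -sxy / sxx                     # the slope of the log-linear fit is -k
    if k <= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return k, math.log(2) / k

# Hypothetical transcript sampled at 10 time points after transcription shut-off
times_h = [0, 0.5, 1, 2, 3, 4, 6, 8, 10, 12]
abundance = [100 * 0.5 ** (t / 3.0) for t in times_h]   # true half-life: 3 h
k, t_half = fit_half_life(times_h, abundance)
print(f"k = {k:.3f} per h, half-life = {t_half:.2f} h")
```

Fitting in log space keeps the model linear; with real, noisy data a nonlinear fit on the untransformed values is a common alternative.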
21
Crosstalk between codon optimality and cis-regulatory elements dictates mRNA stability. Genome Biol 2021; 22:14. [PMID: 33402205 PMCID: PMC7783504 DOI: 10.1186/s13059-020-02251-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 05/28/2020] [Accepted: 12/17/2020] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND The regulation of messenger RNA (mRNA) stability has a profound impact on gene expression dynamics during embryogenesis. For example, in animals, maternally deposited mRNAs are degraded after fertilization to enable new developmental trajectories. Regulatory sequences in 3' untranslated regions (3'UTRs) have long been considered the central determinants of mRNA stability. However, recent work indicates that the coding sequence also possesses regulatory information. Specifically, translation in cis impacts mRNA stability in a codon-dependent manner. However, the strength of this mechanism during embryogenesis, as well as its relationship with other known regulatory elements, such as microRNA, remains unclear. RESULTS Here, we show that codon composition is a major predictor of mRNA stability in the early embryo. We show that this mechanism works in combination with other cis-regulatory elements to dictate mRNA stability in zebrafish and Xenopus embryos as well as in mouse and human cells. Furthermore, we show that microRNA targeting efficacy can be affected by substantial enrichment of optimal (stabilizing) or non-optimal (destabilizing) codons. Lastly, we find that one microRNA, miR-430, antagonizes the stabilizing effect of optimal codons during early embryogenesis in zebrafish. CONCLUSIONS By integrating the contributions of different regulatory mechanisms, our work provides a framework for understanding how combinatorial control of mRNA stability shapes the gene expression landscape.
22
Genome-scale molecular principles of mRNA half-life regulation in yeast. FEBS J 2020; 288:3428-3447. [PMID: 33319437 DOI: 10.1111/febs.15670] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/04/2020] [Revised: 11/07/2020] [Accepted: 12/11/2020] [Indexed: 12/22/2022]
Abstract
Precise control of protein and messenger RNA (mRNA) degradation is essential for cellular metabolism and homeostasis. Controlled and specific degradation of both molecular species requires their engagement with the respective degradation machineries; this engagement involves a disordered/unstructured segment of the substrate traversing the degradation tunnel of the machinery and accessing the catalytic sites. However, while molecular factors influencing protein degradation have been extensively explored on a genome scale and in multiple organisms, such a comprehensive understanding is still missing for mRNAs. Here, we analyzed multiple genome-scale experimental yeast mRNA half-life datasets in light of experimentally derived mRNA secondary structures and protein binding data, along with high-resolution X-ray crystallographic structures of the RNase machines. The results revealed a consistent genome-scale trend: mRNAs with longer terminal and/or internal unstructured segments have significantly shorter half-lives, and the lengths of the 5'-terminal, 3'-terminal, and internal unstructured segments that affect mRNA half-life are compatible with the molecular structures of the 5' exo-, 3' exo-, and endoribonuclease machineries. Sequestration into ribonucleoprotein complexes extends mRNA half-life, presumably by burying ribonuclease engagement sites under oligomeric interfaces. After gene duplication, differences in terminal unstructured lengths, proportions of internal unstructured segments, and oligomerization modes result in significantly altered half-lives of paralogous mRNAs. A side-by-side comparison of the molecular principles underlying controlled protein and mRNA degradation in yeast reveals their remarkable mechanistic similarities and suggests how the intrinsic structural features of the two molecular species, at two different levels of the central dogma, regulate their half-lives on a genome scale.
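Terminal and internal unstructured lengths of the kind analyzed above can be read off a secondary structure written in dot-bracket notation, where '.' marks an unpaired nucleotide and matching parentheses mark base pairs. The sketch below assumes that representation and counts every run of unpaired positions, including hairpin loops, as an unstructured segment. Both the notation and the `min_internal` cut-off are illustrative assumptions, not the study's definitions.

```python
import re

def unstructured_segments(dot_bracket: str, min_internal: int = 1):
    """Return (5'-terminal length, 3'-terminal length, internal unpaired run lengths).

    '.' is treated as unpaired; '(' and ')' as paired positions.
    """
    n = len(dot_bracket)
    five_prime = n - len(dot_bracket.lstrip("."))
    three_prime = n - len(dot_bracket.rstrip("."))
    if five_prime == n:
        # Fully unpaired molecule: the whole length is reported at both ends
        return n, n, []
    core = dot_bracket[five_prime:n - three_prime]
    internal = [len(run) for run in re.findall(r"\.+", core) if len(run) >= min_internal]
    return five_prime, three_prime, internal

structure = "....((((...))))..((((.....))))......"   # hypothetical structure
print(unstructured_segments(structure))
```

For this toy structure the result is a 4-nt 5'-terminal segment, a 6-nt 3'-terminal segment and internal runs of 3, 2 and 5 nt; per the trend described above, longer values would be expected to accompany shorter half-lives.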
23
Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat Commun 2020; 11:6141. [PMID: 33262328 PMCID: PMC7708451 DOI: 10.1038/s41467-020-19921-4] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 12/18/2019] [Accepted: 11/02/2020] [Indexed: 12/31/2022] Open
Abstract
Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning on over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combination of regulatory elements that define gene expression levels.
24
Computational discovery and modeling of novel gene expression rules encoded in the mRNA. Biochem Soc Trans 2020; 48:1519-1528. [PMID: 32662820 DOI: 10.1042/bst20191048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/18/2020] [Revised: 06/15/2020] [Accepted: 06/17/2020] [Indexed: 11/17/2022]
Abstract
The transcript is populated with numerous overlapping codes that regulate all steps of gene expression. Deciphering these codes is very challenging due to the large number of variables involved, the non-modular nature of the codes, biases and limitations in current experimental approaches, our limited knowledge in gene expression regulation across the tree of life, and other factors. In recent years, it has been shown that computational modeling and algorithms can significantly accelerate the discovery of novel gene expression codes. Here, we briefly summarize the latest developments and different approaches in the field.
25
Fated for decay: RNA elements targeted by viral endonucleases. Semin Cell Dev Biol 2020; 111:119-125. [PMID: 32522410 PMCID: PMC7276228 DOI: 10.1016/j.semcdb.2020.05.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/12/2020] [Revised: 04/28/2020] [Accepted: 05/13/2020] [Indexed: 11/22/2022]
Abstract
For over a decade, studies of messenger RNA regulation have revealed an unprecedented level of connectivity between the RNA pool and global gene expression. These connections are underpinned by a vast array of RNA elements that coordinate RNA-protein and RNA-RNA interactions, each directing mRNA fate from transcription to translation. Consequently, viruses have evolved an arsenal of strategies to target these RNA features and ultimately take control of the pathways they influence, and these strategies contribute to the global shutdown of the host gene expression machinery known as "Host Shutoff". This takeover of the host cell is mechanistically orchestrated by a number of non-homologous virally encoded endoribonucleases. Recent large-scale screens estimate that over 70% of the host transcriptome is decimated by the expression of these viral nucleases. While this takeover strategy seems extraordinarily well conserved, each viral endonuclease has evolved to target distinct mRNA elements. Herein, we will explore each of these RNA structures/sequence features that render messenger RNA susceptible or resistant to viral endonuclease cleavage. By further understanding these targeting and escape mechanisms, we will continue to unravel untold depths of cellular RNA regulation that further underscore the integral relationship between RNA fate and the fate of the cell.
26
Genome-wide analysis reveals a switch in the translational program upon oocyte meiotic resumption. Nucleic Acids Res 2020; 48:3257-3276. [PMID: 31970406 PMCID: PMC7102970 DOI: 10.1093/nar/gkaa010] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 08/07/2019] [Revised: 12/27/2019] [Accepted: 01/03/2020] [Indexed: 12/20/2022] Open
Abstract
During oocyte maturation, changes in gene expression depend exclusively on translation and degradation of maternal mRNAs rather than transcription. Execution of this translation program is essential for assembling the molecular machinery required for meiotic progression, fertilization, and embryo development. With the present study, we used a RiboTag/RNA-Seq approach to explore the timing of maternal mRNA translation in quiescent oocytes as well as in oocytes progressing through the first meiotic division. This genome-wide analysis reveals a global switch in maternal mRNA translation coinciding with oocyte re-entry into the meiotic cell cycle. Messenger RNAs whose translation is highly active in quiescent oocytes invariably become repressed during meiotic re-entry, whereas transcripts repressed in quiescent oocytes become activated. Experimentally, we have defined the exact timing of the switch and the repressive function of CPE elements, and identified a novel role for CPEB1 in maintaining constitutive translation of a large group of maternal mRNAs during maturation.
27
Role of ANKHD1/LINC00346/ZNF655 Feedback Loop in Regulating the Glioma Angiogenesis via Staufen1-Mediated mRNA Decay. MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 20:866-878. [PMID: 32464549 PMCID: PMC7256448 DOI: 10.1016/j.omtn.2020.05.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 04/25/2020] [Revised: 05/05/2020] [Accepted: 05/07/2020] [Indexed: 12/11/2022]
Abstract
Accumulating evidence shows that long noncoding RNA (lncRNA) dysregulation plays a critical role in tumor angiogenesis. Glioma is characterized by abundant angiogenesis. Herein, we investigated the expression and function of LINC00346 in the regulation of glioma angiogenesis. The present study first demonstrated that ANKHD1 (ankyrin repeat and KH domain-containing protein 1) and LINC00346 were significantly increased in glioma-associated endothelial cells (GECs), whereas ZNF655 (zinc finger protein 655) was decreased in GECs. Meanwhile, ANKHD1 inhibition, LINC00346 inhibition, or ZNF655 overexpression impeded angiogenesis of GECs. Moreover, ANKHD1 targeted LINC00346 and enhanced the stability of LINC00346. In addition, LINC00346 bound to ZNF655 mRNA through their Alu elements, so that LINC00346 facilitated the degradation of ZNF655 mRNA via a STAU1 (Staufen1)-mediated mRNA decay (SMD) mechanism. Furthermore, ZNF655 targeted the promoter region of ANKHD1 and formed an ANKHD1/LINC00346/ZNF655 feedback loop that regulated glioma angiogenesis. Finally, knockdown of ANKHD1 and LINC00346, combined with overexpression of ZNF655, resulted in a significant decrease in new vessels and hemoglobin content in vivo. The results identified an ANKHD1/LINC00346/ZNF655 feedback loop in the regulation of glioma angiogenesis that may provide new targets and strategies for targeted therapy against glioma.
28
A selective Aurora-A 5'-UTR siRNA inhibits tumor growth and metastasis. Cancer Lett 2019; 472:97-107. [PMID: 31875524 DOI: 10.1016/j.canlet.2019.12.031] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/30/2019] [Revised: 12/13/2019] [Accepted: 12/18/2019] [Indexed: 01/08/2023]
Abstract
Many Aurora-A inhibitors have been developed for cancer therapy; however, their specificity and safety remain uncertain. The Aurora-A mRNA yields nine different 5'-UTR isoforms, which result from alternative splicing. Interestingly, we found that the exon 2-containing Aurora-A mRNA isoforms are predominantly expressed in cancer cell lines as well as in human colorectal cancer tissues, making Aurora-A mRNA exon 2 a promising treatment target in Aurora-A-overexpressing cancers. In this study, a selective siRNA, siRNA-2, which targets Aurora-A mRNA exon 2, was designed to translationally inhibit the expression of Aurora-A in cancer cells but not in normal cells; locked nucleic acid (LNA)-modified siRNA-2 showed improved efficacy in inhibiting Aurora-A mRNA translation and tumor growth. Xenograft animal models combined with noninvasive in vivo imaging system (IVIS) analysis further confirmed the anticancer effect of LNA-siRNA-2, with improved efficiency and safety and reduced side effects. In mice orthotopically injected with colorectal cancer cells, LNA-siRNA-2 treatment not only inhibited tumor growth but also blocked liver and lung metastasis. The results of our study suggest that LNA-siRNA-2 has the potential to be a novel therapeutic agent for cancer treatment.
29
Specific growth rate governs AOX1 gene expression, affecting the production kinetics of Pichia pastoris (Komagataella phaffii) PAOX1-driven recombinant producer strains with different target gene dosage. Microb Cell Fact 2019; 18:187. [PMID: 31675969 PMCID: PMC6824138 DOI: 10.1186/s12934-019-1240-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/23/2019] [Accepted: 10/23/2019] [Indexed: 12/02/2022] Open
Abstract
Background The PAOX1-based expression system is the most widely used system for producing recombinant proteins in the methylotrophic yeast Pichia pastoris (Komagataella phaffii). Although relevant advances have recently been made in understanding the regulation of the methanol utilization (MUT) pathway, the role of the specific growth rate (µ) in AOX1 regulation remains unknown, and its impact on protein production kinetics is therefore still unclear. Results The influence of heterologous gene dosage, operational mode, and operational strategy on the physiological state of the culture was studied by cultivating two PAOX1-driven Candida rugosa lipase 1 (Crl1) producer clones. Specifically, a clone integrating a single CRL1 expression cassette was compared with one containing three cassettes over broad ranges of dilution rate and µ in both chemostat and fed-batch cultivations. Chemostat cultivations established the impact of µ on the MUT-related MIT1 pool, which leads to a bell-shaped relationship between µ and PAOX1-driven gene expression and directly influences Crl1 production kinetics. In addition, chemostat and fed-batch cultivations revealed the favorable effect of increasing the CRL1 gene dosage (up to 2.4-fold higher qp) on Crl1 production, with no significant detrimental effects on physiological capabilities. Conclusions PAOX1-driven gene expression and Crl1 production kinetics in P. pastoris were successfully correlated with µ. In fact, µ governs the amount of MUT-related MIT1, which triggers PAOX1-driven gene expression (including that of heterologous genes), thus directly influencing the production kinetics of the recombinant protein.
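The link between dilution rate, µ and qp in the chemostat experiments follows from the textbook steady-state mass balances of an ideal, well-mixed chemostat with sterile feed, no product in the feed and no product degradation: the biomass balance gives µ = D = F/V, and the product balance gives qp = D·P/X. The sketch below evaluates these relations with hypothetical numbers and units; it applies only to the steady state of a continuous culture, not to the fed-batch runs.

```python
def chemostat_steady_state(feed_rate, volume, product_titer, biomass_conc):
    """Return (mu, qp) for an ideal chemostat at steady state.

    feed_rate in L/h, volume in L, product_titer in units/L, biomass_conc in g/L.
    Biomass balance: dX/dt = (mu - D) * X = 0      ->  mu = D = F / V
    Product balance: dP/dt = qp * X - D * P = 0    ->  qp = D * P / X
    """
    dilution_rate = feed_rate / volume
    mu = dilution_rate
    qp = dilution_rate * product_titer / biomass_conc
    return mu, qp

# Hypothetical steady state: 0.1 L/h into a 1 L vessel, 500 activity units/L, 25 g/L biomass
mu, qp = chemostat_steady_state(0.1, 1.0, 500.0, 25.0)
print(f"mu = {mu:.2f} 1/h, qp = {qp:.2f} units/(g h)")
```

Comparing qp computed this way for the one-cassette and three-cassette clones at the same dilution rate is one way a gene-dosage fold change in qp can be expressed.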
30
Codon stabilization coefficient as a metric to gain insights into mRNA stability and codon bias and their relationships with translation. Nucleic Acids Res 2019; 47:2216-2228. [PMID: 30698781 PMCID: PMC6412131 DOI: 10.1093/nar/gkz033] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 10/18/2018] [Revised: 12/05/2018] [Accepted: 01/14/2019] [Indexed: 12/11/2022] Open
Abstract
The codon stabilization coefficient (CSC) is derived from the correlation between each codon frequency in transcripts and mRNA half-life experimental data. In this work, we used this metric as a reference to compare previously published Saccharomyces cerevisiae mRNA half-life datasets and to investigate how codon composition relates to protein levels. We generated CSCs derived from nine studies. Four datasets produced similar CSCs, which also correlated with other independent parameters that reflect codon optimality, such as tRNA abundance and ribosome residence time. By calculating the average CSC for each gene, we found that most mRNAs tended to have more non-optimal codons. Conversely, a high proportion of optimal codons was found in genes encoding highly abundant proteins, including proteins that were only transiently overexpressed in response to stress conditions. We also used CSCs to identify and locate mRNA regions enriched in non-optimal codons. We found that these stretches were usually located close to the initiation codon and were sufficient to slow ribosome movement. However, in contrast to observations from reporter systems, we found no position-dependent effect on mRNA half-life. These analyses underscore the value of CSCs in studies of mRNA stability and codon bias and their relationships with protein expression.
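Following the definition above (the correlation between each codon's frequency in transcripts and mRNA half-life), a CSC table and a per-gene average can be sketched as below. Using Pearson correlation, per-gene relative codon frequencies and an average over every codon occurrence are assumptions about details the abstract does not specify; the three genes and their half-lives are invented toy data.

```python
import math
from collections import Counter

def split_codons(cds):
    """Split a coding sequence into complete codons (a trailing partial codon is ignored)."""
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)

def codon_stabilization_coefficients(cds_by_gene, half_life_by_gene):
    """CSC per codon: Pearson r between the codon's frequency in each gene and that gene's half-life."""
    genes = sorted(set(cds_by_gene) & set(half_life_by_gene))
    freqs = {}
    for g in genes:
        codons = split_codons(cds_by_gene[g])
        freqs[g] = {c: k / len(codons) for c, k in Counter(codons).items()}
    half_lives = [half_life_by_gene[g] for g in genes]
    all_codons = sorted({c for f in freqs.values() for c in f})
    return {c: pearson([freqs[g].get(c, 0.0) for g in genes], half_lives) for c in all_codons}

def mean_csc(cds, csc):
    """Average CSC over all codons of one gene (codons without a defined CSC are skipped)."""
    values = [csc[c] for c in split_codons(cds) if c in csc and not math.isnan(csc[c])]
    return sum(values) / len(values)

# Hypothetical toy data (half-lives in minutes)
cds_by_gene = {"geneA": "GCTGCTGCTGCA", "geneB": "GCTGCAGCAGCA", "geneC": "GCAGCAGCAGCA"}
half_life_by_gene = {"geneA": 30.0, "geneB": 15.0, "geneC": 5.0}
csc = codon_stabilization_coefficients(cds_by_gene, half_life_by_gene)
print({c: round(r, 3) for c, r in csc.items()})
print(round(mean_csc(cds_by_gene["geneA"], csc), 3))
```

In this toy set GCT comes out with a positive CSC (optimal, stabilizing) and GCA with a negative one, so the GCT-rich geneA gets a positive average CSC.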
31
Codon bias confers stability to human mRNAs. EMBO Rep 2019; 20:e48220. [PMID: 31482640 DOI: 10.15252/embr.201948220] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 04/03/2019] [Revised: 08/08/2019] [Accepted: 08/19/2019] [Indexed: 11/09/2022] Open
Abstract
Codon bias has been implicated as one of the major factors contributing to mRNA stability in several model organisms. However, the molecular mechanisms of codon bias on mRNA stability remain unclear in humans. Here, we show that human cells possess a mechanism to modulate RNA stability through a unique codon bias. Bioinformatics analysis showed that codons could be clustered into two distinct groups: codons with G or C at the third base position (GC3), which stabilize mRNA, and codons with either A or T at the third base position (AT3), which destabilize it. Quantification of codon bias showed that increased GC3-content entails proportionately higher GC-content. Through bioinformatics, ribosome profiling, and in vitro analysis, we show that decoupling the effects of codon bias reveals two modes of mRNA regulation, one GC3- and one GC-content dependent. Employing an immunoprecipitation-based strategy, we identify ILF2 and ILF3 as RNA-binding proteins that differentially regulate global mRNA abundances based on codon bias. Our results demonstrate that codon bias is a two-pronged system that governs mRNA abundance.
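The GC3 and overall GC-content measures discussed above are simple to compute from a coding sequence. The sketch below counts every complete codon, including the start and stop codons, which is a simplifying assumption; published analyses may exclude them or normalize differently.

```python
def gc_content(seq):
    """Fraction of G or C over all nucleotides."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def gc3_content(cds):
    """Fraction of complete codons whose third (wobble) position is G or C."""
    cds = cds.upper()
    third_positions = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not third_positions:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)

def codon_group(codon):
    """Label a codon as GC3 (G/C at position 3) or AT3 (A/T at position 3)."""
    return "GC3" if codon.upper()[2] in "GC" else "AT3"

cds = "ATGGCCAAATTGTAA"   # hypothetical ORF: ATG GCC AAA TTG TAA
print(gc_content(cds), gc3_content(cds), codon_group("GCC"), codon_group("AAA"))
```

Here 3 of the 5 codons end in G or C (GC3 = 0.6) while only 5 of 15 nucleotides are G or C overall, which illustrates that the two measures, although correlated, are not interchangeable.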
32
Single-cell kinetics of siRNA-mediated mRNA degradation. NANOMEDICINE-NANOTECHNOLOGY BIOLOGY AND MEDICINE 2019; 21:102077. [PMID: 31400572 DOI: 10.1016/j.nano.2019.102077] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/22/2019] [Revised: 07/26/2019] [Accepted: 07/29/2019] [Indexed: 12/26/2022]
Abstract
RNA interference (RNAi) enables the therapeutic use of small interfering RNAs (siRNAs) to silence disease-related genes. The efficiency of silencing is commonly assessed by measuring expression levels of the target protein at a given time point post-transfection. Here, we determine the siRNA-induced fold change in mRNA degradation kinetics from single-cell fluorescence time-courses obtained using live-cell imaging on single-cell arrays (LISCA). After simultaneous transfection of mRNAs encoding eGFP (target) and CayRFP (reference), the eGFP expression is silenced by siRNA. The single-cell time-courses are fitted using a mathematical model of gene expression. Analysis yields best estimates of related kinetic rate constants, including mRNA degradation constants. We determine the siRNA-induced changes in kinetic rates and their correlations between target and reference protein expression. Assessment of mRNA degradation constants using single-cell time-lapse imaging is fast (<30 h) and returns an accurate, time-independent measure of siRNA-induced silencing, thus allowing the exact evaluation of siRNA therapeutics.
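The abstract does not spell out its gene-expression model. A two-stage model often used for mRNA transfection experiments assumes that mRNA decays at rate δ and is translated into protein G, which is degraded at rate β: dm/dt = -δ·m and dG/dt = k·m - β·G with G(0) = 0, giving G(t) = k·m0·(e^(-δt) - e^(-βt)) / (β - δ). The sketch below assumes this model, fixes k·m0 and β at hypothetical values and recovers δ by grid search for a control and a silenced trace, so that the siRNA-induced fold change is the ratio of the two fitted δ values. The study itself fitted per-cell time courses and may use a model with additional parameters.

```python
import math

def protein_trace(t, km0, delta, beta):
    """Protein level at time t for dm/dt = -delta*m, dG/dt = k*m - beta*G, G(0) = 0.

    km0 is the product k*m0 (translation rate times initial mRNA amount).
    """
    if math.isclose(delta, beta):
        return km0 * t * math.exp(-beta * t)   # limit of the general solution as delta -> beta
    return km0 * (math.exp(-delta * t) - math.exp(-beta * t)) / (beta - delta)

def fit_delta(times, observed, km0, beta, grid):
    """Return the mRNA degradation rate in `grid` that minimises the squared error."""
    def sse(delta):
        return sum((protein_trace(t, km0, delta, beta) - y) ** 2 for t, y in zip(times, observed))
    return min(grid, key=sse)

# Hypothetical parameters (rates in 1/h); the traces here are noise-free
times = [0.5 * i for i in range(61)]          # 0 to 30 h, every 30 min
km0, beta = 100.0, 0.05
control = [protein_trace(t, km0, 0.10, beta) for t in times]
silenced = [protein_trace(t, km0, 0.40, beta) for t in times]

grid = [0.01 * i for i in range(1, 101)]      # candidate delta values 0.01 to 1.00
delta_control = fit_delta(times, control, km0, beta, grid)
delta_sirna = fit_delta(times, silenced, km0, beta, grid)
print(f"fold change in mRNA degradation rate: {delta_sirna / delta_control:.1f}")
```

A grid search keeps the example dependency-free; with noisy single-cell traces and several free parameters, a proper nonlinear least-squares fit would be the usual choice.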
33
mRNA Deadenylation Is Coupled to Translation Rates by the Differential Activities of Ccr4-Not Nucleases. Mol Cell 2019; 70:1089-1100.e8. [PMID: 29932902 PMCID: PMC6024076 DOI: 10.1016/j.molcel.2018.05.033] [Citation(s) in RCA: 137] [Impact Index Per Article: 27.4] [Received: 02/27/2018] [Revised: 05/17/2018] [Accepted: 05/24/2018] [Indexed: 01/01/2023]
Abstract
Translation and decay of eukaryotic mRNAs are controlled by shortening of the poly(A) tail and release of the poly(A)-binding protein Pab1/PABP. The Ccr4-Not complex contains two exonucleases, Ccr4 and Caf1/Pop2, that mediate mRNA deadenylation. Here, using a fully reconstituted biochemical system with proteins from the fission yeast Schizosaccharomyces pombe, we show that Pab1 interacts with Ccr4-Not, stimulates deadenylation, and differentiates the roles of the nuclease enzymes. Surprisingly, Pab1 release relies on Ccr4 activity. In agreement with this, in vivo experiments in budding yeast show that Ccr4 is a general deadenylase that acts on all mRNAs. In contrast, Caf1 only trims poly(A) not bound by Pab1. As a consequence, Caf1 is a specialized deadenylase required for the selective deadenylation of transcripts with lower rates of translation elongation and reduced Pab1 occupancy. These findings reveal a coupling between the rates of translation and deadenylation that is dependent on Pab1 and Ccr4-Not.
Highlights: Poly(A)-binding protein is efficiently released by Ccr4-Not nuclease activity. Ccr4, but not Caf1, removes poly(A) tails bound to Pab1. Ccr4 acts on all transcripts, and Caf1 acts on transcripts with low codon optimality. Deadenylation by Ccr4-Not connects translation with mRNA stability.
34
The Correlation Between DsRed mRNA Levels and Transient DsRed Protein Expression in Plants Depends on Leaf Age and the 5' Untranslated Region. Biotechnol J 2019; 14:e1800075. [PMID: 29701331 DOI: 10.1002/biot.201800075] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/03/2018] [Revised: 04/18/2018] [Indexed: 11/08/2022]
Abstract
The yield of recombinant proteins in plants determines their economic competitiveness as a production platform compared to microbes and mammalian cells. The promoter, untranslated regions (UTRs) and codon usage can all contribute to the yield, but potential interactions among these components have not been examined in detail. Here the effect of two promoters (35SS and nos) and four 5'UTRs on the spatiotemporal expression of DsRed mRNA and the accumulation of DsRed protein during transient expression in tobacco (Nicotiana tabacum) mediated by Agrobacterium tumefaciens is investigated. The authors found that the mRNA levels peaked 2-3 days post-infiltration (dpi), and rapidly declined thereafter, whereas DsRed protein was first detected after ≈3 days and concentrations continued to increase until at least 5 dpi. This temporal decoupling of mRNA and protein expression was strongest in the older leaves, which also produced the lowest DsRed yields. The accumulation of DsRed linearly correlated with mRNA levels in all but the youngest leaves, where more DsRed was synthesized per mRNA molecule. This was the case for both promoters, although the nos promoter had a higher protein/mRNA ratio than the 35SS promoter. Furthermore, the type of 5'UTR affected DsRed protein accumulation by 50% starting from similar levels of mRNA. The authors concluded that DsRed mRNA levels are not the limiting factor for DsRed protein expression in plants, but that translation-associated processes such as initiation, elongation, and release are bottlenecks that should be addressed in future studies.
35
MMSplice: modular modeling improves the predictions of genetic variant effects on splicing. Genome Biol 2019; 20:48. [PMID: 30823901 PMCID: PMC6396468 DOI: 10.1186/s13059-019-1653-z] [Citation(s) in RCA: 107] [Impact Index Per Article: 21.4] [Received: 10/02/2018] [Accepted: 02/12/2019] [Indexed: 12/15/2022] Open
Abstract
Predicting the effects of genetic variants on splicing is highly relevant for human genetics. We describe the framework MMSplice (modular modeling of splicing) with which we built the winning model of the CAGI5 exon skipping prediction challenge. The MMSplice modules are neural networks scoring exon, intron, and splice sites, trained on distinct large-scale genomics datasets. These modules are combined to predict effects of variants on exon skipping, splice site choice, splicing efficiency, and pathogenicity, with matched or higher performance than state-of-the-art. Our models, available in the repository Kipoi, apply to variants including indels directly from VCF files.
36
The codon sequences predict protein lifetimes and other parameters of the protein life cycle in the mouse brain. Sci Rep 2018; 8:16913. [PMID: 30443017 PMCID: PMC6237891 DOI: 10.1038/s41598-018-35277-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/06/2018] [Accepted: 11/02/2018] [Indexed: 12/14/2022] Open
Abstract
The homeostasis of the proteome depends on the tight regulation of mRNA and protein abundances, of translation rates, and of protein lifetimes. Results from several studies on prokaryotes or eukaryotic cell cultures have suggested that protein homeostasis is connected to, and perhaps regulated by, the protein and codon sequences. However, this has been little investigated for mammals in vivo. Moreover, the link between the coding sequences and one critical parameter, the protein lifetime, has remained largely unexplored, both in vivo and in vitro. We tested this in the mouse brain and found that the percentages of amino acids and codons in the sequences could predict all of the homeostasis parameters with a precision approaching that of experimental measurements. A key predictive element was the wobble nucleotide. G-/C-ending codons correlated with longer protein lifetimes and higher protein abundances, mRNA abundances, and translation rates than A-/U-ending codons. In a proof-of-principle experiment, modifying the proportions of G-/C-ending codons could tune these parameters in cell cultures. We suggest that the coding sequences are strongly linked to protein homeostasis in vivo, although it remains to be determined whether this relationship is causal.
37
Translation elongation and mRNA stability are coupled through the ribosomal A-site. RNA (New York, N.Y.) 2018; 24:1377-1389. [PMID: 29997263 PMCID: PMC6140462 DOI: 10.1261/rna.066787.118] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 04/12/2018] [Accepted: 07/06/2018] [Indexed: 05/21/2023]
Abstract
Messenger RNA (mRNA) degradation plays a critical role in regulating transcript levels in eukaryotic cells. Previous work by us and others has shown that codon identity exerts a powerful influence on mRNA stability. In Saccharomyces cerevisiae, studies using a handful of reporter mRNAs show that optimal codons increase the translation elongation rate, which in turn increases mRNA stability. However, a direct relationship between elongation rate and mRNA stability has not been established across the entire yeast transcriptome. In addition, there is evidence from work in higher eukaryotes that amino acid identity influences mRNA stability, raising the question of whether the impact of translation elongation on mRNA decay is at the level of tRNA decoding, amino acid incorporation, or some combination of both. To address these questions, we performed ribosome profiling of wild-type yeast. In good agreement with other studies, our data showed faster codon-specific elongation over optimal codons and faster transcript-level elongation correlating with transcript optimality. At both the codon level and the transcript level, faster elongation correlated with increased mRNA stability. These findings were reinforced by the increased translation efficiency and kinetics observed for a panel of 11 HIS3 reporter mRNAs of increasing codon optimality. Although we observed that elongation as measured by ribosome profiling reflects both amino acid identity and synonymous codon effects, further analyses of these data establish that A-site tRNA decoding, rather than other steps of translation elongation, drives mRNA decay in yeast.
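Transcript-level claims such as "faster elongation correlated with increased mRNA stability" are typically backed by a correlation across many genes. The abstract does not say which coefficient was used; the sketch below assumes a Spearman rank correlation (Pearson correlation of average ranks) between a per-transcript codon-optimality score and mRNA half-life. The function names and data are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant input has no defined rank correlation")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-transcript values: fraction of optimal codons vs half-life (min).
optimality = [0.2, 0.4, 0.5, 0.7, 0.9]
half_life_min = [3.0, 5.5, 4.0, 12.0, 20.0]
print(spearman_rho(optimality, half_life_min))  # 0.9
```

A rank-based coefficient is a reasonable default here because half-lives are typically skewed and the relationship need not be linear.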
38
Plant mRNA decay: extended roles and potential determinants. Current Opinion in Plant Biology 2018; 45:178-184. [PMID: 30223189 DOI: 10.1016/j.pbi.2018.08.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/12/2018] [Revised: 08/17/2018] [Accepted: 08/24/2018] [Indexed: 05/19/2023]
Abstract
The decay of mRNA in plants is tightly controlled and shapes the transcriptome. This process serves not only to degrade RNA but also to suppress exogenous and endogenous gene silencing by preventing siRNA generation. Recent evidence suggests that mRNA decay also regulates the accumulation of putative 3' fragment-derived long non-coding RNAs (3'lncRNAs). The generation of siRNA or 3'lncRNA from a selective subset of mRNAs raises a fundamental question: how does the mRNA decay machinery select its substrate transcripts and determine their distinct decay fates? Evidence for potential mRNA decay determinants, such as codon bias, GC content, and N6-methyladenosine (m6A) modification, is rapidly emerging. This paper reviews recent discoveries in plant mRNA decay.
39
A cis-Acting Element Downstream of the Mouse Mammary Tumor Virus Major Splice Donor Critical for RNA Elongation and Stability. J Mol Biol 2018; 430:4307-4324. [PMID: 30179605 DOI: 10.1016/j.jmb.2018.08.025] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/07/2018] [Revised: 08/28/2018] [Accepted: 08/28/2018] [Indexed: 12/11/2022]
Abstract
BACKGROUND The mouse mammary tumor virus (MMTV) encodes a functional signal peptide, a cleavage product of the envelope and Rem proteins. The signal peptide interacts with a 3' cis-acting RNA element, the Rem-responsive element (RmRE), to facilitate expression of both unspliced genomic RNA (gRNA) and spliced mRNAs. An additional RmRE has been proposed at the 5' end of the genome to facilitate nuclear export of the unspliced gRNA, whereas the 3' RmRE could facilitate translation of all other mRNAs, including gRNA. RESULTS To address this hypothesis, a series of mutations was introduced into a 24-nt region found exclusively in the unspliced gRNA. Mutant clones driven by MMTV or human cytomegalovirus promoters were tested in both transient and stable transfections to determine their effect on gRNA nuclear export, stability, and translation. Nuclear export of the gRNA was affected in only a small subset of mutants in stably transfected Jurkat T cells. Quantitative real-time RT-PCR of actinomycin D-treated cells expressing MMTV revealed that multiple mutants were severely compromised for RNA expression and stability. Both genomic and spliced nuclear RNAs were reduced, abrogating expression of the Gag and Env proteins from unspliced and spliced mRNAs, respectively. RT-PCRs with multiple primer pairs indicated that, in a cell line-dependent manner, mutant genomic MMTV transcripts failed to elongate beyond ~500 nt, unlike wild-type transcripts. CONCLUSIONS MMTV contains a novel cis-acting element downstream of the major splice donor that is critical for MMTV gRNA elongation and stability. The presence of a mirror repeat within the element may represent important viral or host factor binding site(s) within MMTV gRNA.
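RNA stability in an actinomycin D experiment is usually summarized by following transcript levels after transcription is blocked. A common approach, assumed here rather than taken from the paper, is to fit first-order decay, ln N(t) = ln N0 - k t, by least squares and report the half-life t1/2 = ln 2 / k. The function name and the time course below are hypothetical.

```python
import math


def decay_half_life(times, levels):
    """Fit ln(level) = ln(N0) - k * t by ordinary least squares.

    Returns (k, t_half) with t_half = ln(2) / k, in the units of `times`.
    """
    if len(times) != len(levels) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    if any(v <= 0 for v in levels):
        raise ValueError("levels must be positive to take logarithms")
    ys = [math.log(v) for v in levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    k = -sxy / sxx  # decay constant is the negative of the fitted slope
    if k <= 0:
        raise ValueError("no decay detected; half-life undefined")
    return k, math.log(2) / k


# Hypothetical chase: transcript level relative to t = 0 after actinomycin D (hours).
times_h = [0, 1, 2, 4, 8]
relative = [1.00, 0.82, 0.66, 0.45, 0.20]
k, t_half = decay_half_life(times_h, relative)
print(f"k = {k:.3f} per h, half-life = {t_half:.2f} h")
```

Fitting on the log scale weights each time point by relative rather than absolute error, which suits qRT-PCR-style relative measurements.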
40
Improved secretory expression of lignocellulolytic enzymes in Kluyveromyces marxianus by promoter and signal sequence engineering. Biotechnology for Biofuels 2018; 11:235. [PMID: 30279722 PMCID: PMC6116501 DOI: 10.1186/s13068-018-1232-7] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 05/04/2018] [Accepted: 08/20/2018] [Indexed: 06/05/2023]
Abstract
BACKGROUND Owing to its thermotolerance, high growth rate, and broad substrate spectrum, Kluyveromyces marxianus can be considered an ideal host for consolidated bioprocessing (CBP). A major obstacle to ethanol production using K. marxianus is its low production of lignocellulolytic enzymes, which limits cellulose hydrolysis and ethanol production. Further improvement of enzyme expression and secretion is therefore essential. RESULTS To improve the expression of lignocellulolytic enzymes, the inulinase promoter and signal sequence from K. marxianus were optimized through mutagenesis. A T(-361)A mutation inside the promoter, a deletion of an AT-rich region inside the 5'UTR (UTR∆A), and a P10L substitution in the signal sequence increased the secretory expression of the feruloyl esterase Est1E by up to sixfold. T(-361)A and UTR∆A increased mRNA expression, while the P10L substitution extended the hydrophobic core of the signal sequence and promoted secretion of the mature protein. The P10L and T(-361)A mutations increased the secretory expression of other types of lignocellulolytic enzymes by up to threefold, including endo-1,4-β-glucanase RuCelA, endo-1,4-β-endoxylanase Xyn-CDBFV, and endo-1,4-β-mannanase MAN330. During fed-batch fermentation of strains carrying the optimized modules, the peak activities of RuCelA, Xyn-CDBFV, MAN330, and Est1E reached 24 U/mL, 25,600 U/mL, 10,200 U/mL, and 1220 U/mL, respectively. Importantly, the optimized promoter and signal sequence increased enzyme yields on all tested carbon sources, including the major end products of lignocellulose saccharification and fermentation, with growth on xylose giving the highest production. CONCLUSIONS The engineered promoter and signal sequence increased the secretory expression of different lignocellulolytic enzymes in K. marxianus on various carbon sources. The lignocellulolytic enzyme activities obtained in fed-batch fermentation are the highest reported for K. marxianus so far. Our engineered modules are valuable for producing lignocellulolytic enzymes in K. marxianus and for constructing efficient CBP strains for cellulosic ethanol production.
41
Decoupling the impact of microRNAs on translational repression versus RNA degradation in embryonic stem cells. eLife 2018; 7:38014. [PMID: 30044225 PMCID: PMC6086665 DOI: 10.7554/elife.38014] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 05/02/2018] [Accepted: 07/24/2018] [Indexed: 01/29/2023]
Abstract
Translation and mRNA degradation are intimately connected, yet the mechanisms that link them are not fully understood. Here, we studied these mechanisms in embryonic stem cells (ESCs). Transcripts showed a wide range of stabilities, which correlated with their relative translation levels and did not change during early ESC differentiation. The protein DHH1 links translation to mRNA stability in yeast; however, loss of its mammalian homolog, DDX6, in ESCs did not disrupt this correlation across transcripts. Instead, loss of DDX6 led to upregulated translation of microRNA targets without concurrent changes in mRNA stability. The Ddx6 knockout cells were phenotypically and molecularly similar to cells lacking all microRNAs (Dgcr8 knockout ESCs). These data show that loss of DDX6 can separate the two canonical functions of microRNAs: translational repression and transcript destabilization. Furthermore, these data uncover a central role for translational repression, independent of transcript destabilization, in defining the downstream consequences of microRNA loss.
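Separating translational repression from transcript destabilization amounts to splitting one observed change into two parts. A minimal sketch, assuming (this is not stated in the abstract) that translation is measured as ribosome footprints and that translation efficiency (TE) is defined as footprints per mRNA: the log2 change in footprints then equals the log2 change in mRNA plus the log2 change in TE. The function name and values are hypothetical.

```python
import math


def split_log2_changes(wt_rna, ko_rna, wt_ribo, ko_ribo):
    """Split the log2 change in footprints into mRNA and TE components.

    With TE = footprints / mRNA, log2FC(ribo) = log2FC(mRNA) + log2FC(TE)
    holds exactly, so the TE change is the footprint change minus the
    mRNA change.
    """
    for v in (wt_rna, ko_rna, wt_ribo, ko_ribo):
        if v <= 0:
            raise ValueError("abundances must be positive")
    lfc_rna = math.log2(ko_rna / wt_rna)
    lfc_ribo = math.log2(ko_ribo / wt_ribo)
    lfc_te = lfc_ribo - lfc_rna
    return lfc_rna, lfc_te


# Hypothetical microRNA target in a knockout: mRNA unchanged, footprints doubled,
# i.e. derepressed translation without a change in transcript level.
rna_change, te_change = split_log2_changes(100.0, 100.0, 50.0, 100.0)
print(rna_change, te_change)  # 0.0 1.0
```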