1
Wang J, Quan L, Jin Z, Wu H, Ma X, Wang X, Xie J, Pan D, Chen T, Wu T, Lyu Q. MultiModRLBP: A Deep Learning Approach for Multi-Modal RNA-Small Molecule Ligand Binding Sites Prediction. IEEE J Biomed Health Inform 2024; 28:4995-5006. [PMID: 38739505 DOI: 10.1109/jbhi.2024.3400521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
This study aims to tackle the intricate challenge of predicting RNA-small molecule binding sites to explore the potential value in the field of RNA drug targets. To address this challenge, we propose the MultiModRLBP method, which integrates multi-modal features using deep learning algorithms. These features include 3D structural properties at the nucleotide base level of the RNA molecule, relational graphs based on overall RNA structure, and rich RNA semantic information. In our investigation, we gathered 851 interactions between RNA and small-molecule ligands from the RNAglib dataset and the RLBind training set. Unlike conventional training sets, this collection broadened its scope by including RNA complexes that have the same RNA sequence but change their respective binding sites due to structural differences or the presence of different ligands. This enhancement enables the MultiModRLBP model to more accurately capture subtle changes at the structural level, ultimately improving its ability to discern nuances among similar RNA conformations. Furthermore, we evaluated MultiModRLBP on two classic test sets, Test18 and Test3, highlighting its performance disparities on small molecules based on metal and non-metal ions. Additionally, we conducted a structural sensitivity analysis on specific complex categories, considering RNA instances with varying degrees of structural changes and whether they share the same ligands. The research results indicate that MultiModRLBP outperforms the current state-of-the-art methods on multiple classic test sets, particularly excelling in predicting binding sites for non-metal ions and instances where the binding sites are widely distributed along the sequence. MultiModRLBP can also be used as a potential tool when the RNA structure is perturbed or the RNA experimental tertiary structure is not available. Most importantly, MultiModRLBP exhibits the capability to distinguish binding characteristics of RNA that are structurally diverse yet exhibit sequence similarity. These advancements hold promise in reducing the costs associated with the development of RNA-targeted drugs.
2
Zhang Y, Lang M, Jiang J, Gao Z, Xu F, Litfin T, Chen K, Singh J, Huang X, Song G, Tian Y, Zhan J, Chen J, Zhou Y. Multiple sequence alignment-based RNA language model and its application to structural inference. Nucleic Acids Res 2024; 52:e3. [PMID: 37941140 PMCID: PMC10783488 DOI: 10.1093/nar/gkad1031] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 04/10/2023] [Accepted: 10/21/2023] [Indexed: 11/10/2023] Open
Abstract
Compared with proteins, DNA and RNA are more difficult languages to interpret because four-letter coded DNA/RNA sequences have less information content than 20-letter coded protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)-like language models have been developed for RNA, they are ineffective at capturing the evolutionary information from homologous sequences because unlike proteins, RNA sequences are less conserved. Here, we have developed an unsupervised multiple sequence alignment-based RNA language model (RNA-MSM) by utilizing homologous sequences from an automatic pipeline, RNAcmap, as it can provide significantly more homologous sequences than manually annotated Rfam. We demonstrate that the resulting unsupervised, two-dimensional attention maps and one-dimensional embeddings from RNA-MSM contain structural information. In fact, they can be directly mapped with high accuracy to 2D base pairing probabilities and 1D solvent accessibilities, respectively. Further fine-tuning led to significantly improved performance on these two downstream tasks compared with existing state-of-the-art techniques including SPOT-RNA2 and RNAsnap2. By comparison, RNA-FM, a BERT-based RNA language model, performs worse than one-hot encoding with its embedding in base pair and solvent-accessible surface area prediction. We anticipate that the pre-trained RNA-MSM model can be fine-tuned on many other tasks related to RNA structure and function.
Affiliation(s)
- Yikun Zhang
  - School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, China
  - AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Mei Lang
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Jiuhong Jiang
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Zhiqiang Gao
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
  - Peng Cheng Laboratory, Shenzhen 518066, China
- Fan Xu
  - Peng Cheng Laboratory, Shenzhen 518066, China
- Thomas Litfin
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD 4215, Australia
- Ke Chen
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Jaswinder Singh
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Guoli Song
  - Peng Cheng Laboratory, Shenzhen 518066, China
- Jian Zhan
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Jie Chen
  - School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, China
  - Peng Cheng Laboratory, Shenzhen 518066, China
- Yaoqi Zhou
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD 4215, Australia
3
Wang K, Zhou R, Wu Y, Li M. RLBind: a deep learning method to predict RNA-ligand binding sites. Brief Bioinform 2023; 24:6832814. [PMID: 36398911 DOI: 10.1093/bib/bbac486] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/15/2022] [Revised: 09/28/2022] [Accepted: 10/14/2022] [Indexed: 11/19/2022] Open
Abstract
Identification of RNA-small molecule binding sites plays an essential role in RNA-targeted drug discovery and development. These small molecules are expected to be lead compounds that guide the development of new types of RNA-targeted therapeutics, in contrast to regular therapeutics targeting proteins. RNAs can provide many potential drug targets with diverse structures and functions. However, up to now, only a few methods have been proposed, and predicting RNA-small molecule binding sites remains a big challenge. A new computational model is required to better extract the features and predict RNA-small molecule binding sites more accurately. In this paper, a deep learning model, RLBind, was proposed to predict RNA-small molecule binding sites from sequence-dependent and structure-dependent properties by combining a global RNA sequence channel and a local neighbor nucleotides channel. To the best of our knowledge, this research was the first to develop a convolutional neural network for RNA-small molecule binding site prediction. Furthermore, RLBind can also be used as a potential tool when the RNA experimental tertiary structure is not available. The experimental results show that RLBind outperforms other state-of-the-art methods in predicting binding sites. Therefore, our study demonstrates that the combination of global information for full-length sequences and local information for limited local neighbor nucleotides in RNAs can improve the model's predictive performance for binding site prediction. All datasets and source code are available at https://github.com/KailiWang1/RLBind.
Affiliation(s)
- Kaili Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Renyi Zhou
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yifan Wu
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
4
Abstract
RNA molecules carry out various cellular functions, and understanding the mechanisms behind their functions requires the knowledge of their 3D structures. Different types of computational methods have been developed to model RNA 3D structures over the past decade. These methods were widely used by researchers although their performance needs to be further improved. Recently, along with these traditional methods, machine-learning techniques have been increasingly applied to RNA 3D structure prediction and show significant improvement in performance. Here we shall give a brief review of the traditional methods and recent related advances in machine-learning approaches for RNA 3D structure prediction.
Affiliation(s)
- Xiujuan Ou
  - Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Yi Zhang
  - Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Yiduo Xiong
  - Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Yi Xiao
  - Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
5
Huang Y, Luo J, Jing R, Li M. Multi-model predictive analysis of RNA solvent accessibility based on modified residual attention mechanism. Brief Bioinform 2022; 23:6775603. [PMID: 36305428 DOI: 10.1093/bib/bbac470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 09/09/2022] [Accepted: 09/30/2022] [Indexed: 12/14/2022] Open
Abstract
Predicting RNA solvent accessibility using only primary sequence data can be regarded as sequence-based prediction work. Currently, the established studies for sequence-based RNA solvent accessibility prediction are limited due to the available number of datasets and black box prediction. To improve these issues, we first expanded the available RNA structures and then developed a sequence-based model using modified attention layers with different receptive fields to conform to the stem-loop structure of RNA chains. We measured the improvement with an extended dataset and further explored the model's interpretability by analysing the model structures, attention values and hyperparameters. Finally, we found that the developed model regarded the pieces of a sequence as templates during the training process. This work will be helpful for researchers who would like to build RNA attribute prediction models using deep learning in the future.
Affiliation(s)
- Yuyao Huang
  - College of Chemistry, Sichuan University, Chengdu, Sichuan, 610065, China
- Jiesi Luo
  - Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, Sichuan, 646000, China
- Runyu Jing
  - School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan, 610065, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu, Sichuan, 610065, China
6
Predicting RNA solvent accessibility from multi-scale context feature via multi-shot neural network. Anal Biochem 2022; 654:114802. [PMID: 35809650 DOI: 10.1016/j.ab.2022.114802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 06/11/2022] [Accepted: 06/28/2022] [Indexed: 11/24/2022]
Abstract
Knowledge of RNA solvent accessibility has recently become attractive due to the increasing awareness of its importance in key biological processes. Accurately predicting the solvent accessibility of RNA is crucial for understanding its 3D structure and biological function. In this study, we develop a novel computational method, termed M2pred, for accurately predicting the solvent accessibility of RNA from sequence-based multi-scale context features. In M2pred, three single-view features, i.e., base-pairing probabilities, a position-specific frequency matrix, and a binary one-hot encoding, are first generated as three feature sources and then concatenated to form a super feature. Secondly, for the super feature, the matrix-format features of each nucleotide are extracted using an initialized sliding window technique and stacked into a cube-format feature. Then, using a multi-scale context feature extraction strategy, a pyramid feature composed of contextual features at four scales related to target nucleotides is extracted from the cube-format feature. Finally, a customized multi-shot neural network framework, which is equipped with four different scales of receptive fields and mainly integrates several residual attention blocks, is designed to extract discriminative information from the contextual pyramid feature. Experimental results demonstrate that the proposed M2pred achieves high prediction performance and outperforms existing state-of-the-art prediction methods of RNA solvent accessibility.
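The sliding-window step described in this abstract can be illustrated with a minimal sketch. It is not the M2pred implementation: only the one-hot feature is built (base-pairing probabilities and the position-specific frequency matrix are omitted), and the window half-width, the zero padding at the sequence ends, and the names `one_hot` and `sliding_windows` are assumptions made for illustration.

```python
from typing import List

ALPHABET = "ACGU"  # assumed nucleotide order for the one-hot columns


def one_hot(seq: str) -> List[List[float]]:
    """Binary one-hot encoding, one row per nucleotide (unknown bases give all zeros)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def sliding_windows(features: List[List[float]], half_width: int) -> List[List[List[float]]]:
    """For each position, return a (2*half_width + 1) x n_features matrix.

    Positions beyond the sequence ends are zero-padded (an assumption;
    the abstract does not say how the ends are handled).
    """
    n_feat = len(features[0]) if features else 0
    pad = [[0.0] * n_feat for _ in range(half_width)]
    padded = pad + features + pad
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(features))]


# Stacking the per-nucleotide matrices gives a cube: length x window x features.
cube = sliding_windows(one_hot("GGAUC"), half_width=1)
print(len(cube), len(cube[0]), len(cube[0][0]))
```

Each entry of `cube` is the local context matrix of one target nucleotide; a multi-scale scheme would repeat the extraction with several window sizes.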
7
Solayman M, Litfin T, Singh J, Paliwal K, Zhou Y, Zhan J. Probing RNA structures and functions by solvent accessibility: an overview from experimental and computational perspectives. Brief Bioinform 2022; 23:bbac112. [PMID: 35348613 PMCID: PMC9116373 DOI: 10.1093/bib/bbac112] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/13/2021] [Revised: 03/03/2022] [Accepted: 03/04/2022] [Indexed: 12/30/2022] Open
Abstract
Characterization of RNA structures and functions has mostly focused on 2D (secondary) and 3D (tertiary) structures. Recent advances in experimental and computational techniques for probing or predicting RNA solvent accessibility make this 1D representation of tertiary structures an increasingly attractive feature to explore. Here, we provide a survey of these recent developments, which indicate the emergence of solvent accessibility as a simple 1D property, adding to secondary and tertiary structures for investigating complex structure-function relations of RNAs.
Affiliation(s)
- Md Solayman
  - Institute for Glycomics, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Thomas Litfin
  - Institute for Glycomics, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Jaswinder Singh
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Kuldip Paliwal
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Yaoqi Zhou
  - Institute for Glycomics, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
  - Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Jian Zhan
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
8
Johnson PZ, Kasprzak WK, Shapiro BA, Simon AE. Structural characterization of a new subclass of panicum mosaic virus-like 3' cap-independent translation enhancer. Nucleic Acids Res 2022; 50:1601-1619. [PMID: 35104872 PMCID: PMC8860577 DOI: 10.1093/nar/gkac007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/18/2021] [Revised: 12/29/2021] [Accepted: 01/29/2022] [Indexed: 11/29/2022] Open
Abstract
Canonical eukaryotic mRNA translation requires 5'cap recognition by initiation factor 4E (eIF4E). In contrast, many positive-strand RNA virus genomes lack a 5'cap and promote translation by non-canonical mechanisms. Among plant viruses, PTEs are a major class of cap-independent translation enhancers located in/near the 3'UTR that recruit eIF4E to greatly enhance viral translation. Previous work proposed a single form of PTE characterized by a Y-shaped secondary structure with two terminal stem-loops (SL1 and SL2) atop a supporting stem containing a large, G-rich asymmetric loop that forms an essential pseudoknot (PK) involving C/U residues located between SL1 and SL2. We found that PTEs with less than three consecutive cytidylates available for PK formation have an upstream stem-loop that forms a kissing loop interaction with the apical loop of SL2, important for formation/stabilization of PK. PKs found in both subclasses of PTE assume a specific conformation with a hyperreactive guanylate (G*) in SHAPE structure probing, previously found critical for binding eIF4E. While PTE PKs were proposed to be formed by Watson-Crick base-pairing, alternative chemical probing and 3D modeling indicate that the Watson-Crick faces of G* and an adjacent guanylate have high solvent accessibilities. Thus, PTE PKs are likely composed primarily of non-canonical interactions.
Affiliation(s)
- Philip Z Johnson
  - Department of Cell Biology and Molecular Genetics, University of Maryland - College Park, College Park, MD 20742, USA
- Wojciech K Kasprzak
  - Basic Science Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Bruce A Shapiro
  - RNA Biology Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA
- Anne E Simon
  - Department of Cell Biology and Molecular Genetics, University of Maryland - College Park, College Park, MD 20742, USA
9
Solayman M, Litfin T, Zhou Y, Zhan J. High-throughput mapping of RNA solvent accessibility at the single-nucleotide resolution by RtcB ligation between a fixed 5'-OH-end linker and unique 3'-P-end fragments from hydroxyl radical cleavage. RNA Biol 2022; 19:1179-1189. [PMID: 36369947 PMCID: PMC9662193 DOI: 10.1080/15476286.2022.2145098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
Given the challenges for the experimental determination of RNA tertiary structures, probing solvent accessibility has become increasingly important to gain functional insights. Among various chemical probes developed, backbone-cleaving hydroxyl radical is the only one that can provide unbiased detection of all accessible nucleotides. However, the readouts have been based on reverse transcription (RT) stop at the cleaving sites, which are prone to false positives due to PCR amplification bias, early drop-off of reverse transcriptase, and the use of random primers in RT reaction. Here, we introduced a fixed-primer method called RL-Seq by performing RtcB Ligation (RL) between a fixed 5'-OH-end linker and unique 3'-P-end fragments from hydroxyl radical cleavage prior to high-throughput sequencing. The application of this method to E. coli ribosomes confirmed its ability to accurately probe solvent accessibility with high sensitivity (low required sequencing depth) and accuracy (strong correlation to structure-derived values) at the single-nucleotide resolution. Moreover, a near-perfect correlation was found between the experiments with and without using unique molecular identifiers, indicating negligible PCR biases in RL-Seq. Further improvement of RL-Seq and its potential transcriptome-wide applications are discussed.
Affiliation(s)
- Md Solayman
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD, Australia
- Thomas Litfin
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD, Australia
- Yaoqi Zhou
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD, Australia
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, China
  - Contact: Yaoqi Zhou, Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, 518055, China
- Jian Zhan
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD, Australia
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, China
  - Jian Zhan, Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
10
Wei H, Wang B, Yang J, Gao J. RNA Flexibility Prediction With Sequence Profile and Predicted Solvent Accessibility. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2017-2022. [PMID: 31794403 DOI: 10.1109/tcbb.2019.2956496] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Structural flexibility plays an essential role in many biological processes. B-factor is an important indicator to measure the flexibility of protein or RNA structures. Many methods were developed to predict protein B-factors, but few studies have been done for RNA B-factor prediction. In this paper, we proposed a new method RNAbval to predict RNA B-factors using random forest. The method was developed using a comprehensive set of features, including the sequence profile and predicted solvent accessibility. RNAbval achieved an improvement of 9.2-20.5 percent over the state-of-the-art method on two benchmark test datasets. The proposed method is available at http://yanglab.nankai.edu.cn/RNAbval/.
11
Hanumanthappa AK, Singh J, Paliwal K, Singh J, Zhou Y. Single-sequence and profile-based prediction of RNA solvent accessibility using dilated convolutional neural network. Bioinformatics 2021; 36:5169-5176. [PMID: 33106872 DOI: 10.1093/bioinformatics/btaa652] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/14/2020] [Revised: 06/30/2020] [Accepted: 07/14/2020] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION RNA solvent accessibility, similar to protein solvent accessibility, reflects the structural regions that are accessible to solvents or other functional biomolecules, and plays an important role for structural and functional characterization. Unlike protein solvent accessibility, only a few tools are available for predicting RNA solvent accessibility despite the fact that millions of RNA transcripts have unknown structures and functions. Also, these tools have limited accuracy. Here, we have developed RNAsnap2 that uses a dilated convolutional neural network with a new feature, based on predicted base-pairing probabilities from LinearPartition. RESULTS Using the same training set from the recent predictor RNAsol, RNAsnap2 provides an 11% improvement in median Pearson Correlation Coefficient (PCC) and 9% improvement in mean absolute errors for the same test set of 45 RNA chains. A larger improvement (22% in median PCC) is observed for 31 newly deposited RNA chains that are non-redundant and independent from the training and the test sets. A single-sequence version of RNAsnap2 (i.e. without using sequence profiles generated from homology search by Infernal) has achieved comparable performance to the profile-based RNAsol. In addition, RNAsnap2 has achieved comparable performance for protein-bound and protein-free RNAs. Both RNAsnap2 and RNAsnap2 (SingleSeq) are expected to be useful for searching structural signatures and locating functional regions of non-coding RNAs. AVAILABILITY AND IMPLEMENTATION Standalone-versions of RNAsnap2 and RNAsnap2 (SingleSeq) are available at https://github.com/jaswindersingh2/RNAsnap2. Direct prediction can also be made at https://sparks-lab.org/server/rnasnap2. The datasets used in this research can also be downloaded from the GITHUB and the webserver mentioned above. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
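The two evaluation metrics named in this abstract, the Pearson Correlation Coefficient (PCC) and the mean absolute error, can be sketched as follows. This is a generic illustration in plain Python rather than the RNAsnap2 evaluation code; the function names and the toy accessibility values are assumptions, and taking the median of per-chain PCC values follows the "median PCC" wording of the abstract.

```python
import math
import statistics
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and reference values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_absolute_error(x: Sequence[float], y: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Toy per-chain data (hypothetical values): (predicted, reference) accessibilities.
chains = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
    ([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]),
]
median_pcc = statistics.median(pearson(p, r) for p, r in chains)
print(round(median_pcc, 3))
```

Reporting the median across chains makes the summary less sensitive to a few chains with very poor correlation than the mean would be.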
Affiliation(s)
- Anil Kumar Hanumanthappa
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Jaswinder Singh
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Kuldip Paliwal
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Jaspreet Singh
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Yaoqi Zhou
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia
12
Singh J, Paliwal K, Singh J, Zhou Y. RNA Backbone Torsion and Pseudotorsion Angle Prediction Using Dilated Convolutional Neural Networks. J Chem Inf Model 2021; 61:2610-2622. [PMID: 34037398 DOI: 10.1021/acs.jcim.1c00153] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
RNA three-dimensional structure prediction has relied on using a predicted or experimentally determined secondary structure as a restraint to reduce the conformational sampling space. However, the secondary-structure restraints are limited to paired bases, and the conformational space of the ribose-phosphate backbone is still too large to be sampled efficiently. Here, we employed the dilated convolutional neural network to predict backbone torsion and pseudotorsion angles using a single RNA sequence as input. The method called SPOT-RNA-1D was trained on a high-resolution training data set and tested on three independent, nonredundant, and high-resolution test sets. The proposed method yields substantially smaller mean absolute errors than the baseline predictors based on random predictions and based on helix conformations according to actual angle distributions. The mean absolute errors for three test sets range from 14° to 44° for different angles, compared to 17°-62° by random prediction and 14°-58° by helix prediction. The method also accurately recovers the overall patterns of single or pairwise angle distributions. In general, torsion angles further away from the bases and associated with unpaired bases and paired bases involved in tertiary interactions are more difficult to predict. Compared to the best models in RNA-puzzles experiments, SPOT-RNA-1D yielded more accurate dihedral angles and, thus, is potentially useful as a model quality indicator and as a source of restraints for RNA structure prediction, as in protein structure prediction.
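Mean absolute errors on torsion angles need to account for periodicity, since 350° and 10° are only 20° apart. The sketch below shows one common way to compute such an error; the abstract does not state how SPOT-RNA-1D handles the wrap-around, so the shortest-arc convention and the function names here are assumptions.

```python
from typing import Sequence


def angular_abs_error(pred_deg: float, true_deg: float) -> float:
    """Absolute difference between two angles in degrees, taken along the shorter arc."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def angular_mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error over paired angle lists, respecting 360-degree periodicity."""
    return sum(angular_abs_error(p, t) for p, t in zip(pred, true)) / len(pred)


# Hypothetical predicted and reference angles in degrees.
print(angular_mae([350.0, 90.0], [10.0, 60.0]))
```

With this convention the largest possible error for a single angle is 180°, which is the scale against which the ranges quoted in the abstract should be read if the same convention applies.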
Affiliation(s)
- Jaswinder Singh
  - Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4122, Australia
- Kuldip Paliwal
  - Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4122, Australia
- Jaspreet Singh
  - Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4122, Australia
- Yaoqi Zhou
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland 4222, Australia
  - Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
  - Peking University Shenzhen Graduate School, Shenzhen 518055, P.R. China
13
Zhang T, Singh J, Litfin T, Zhan J, Paliwal K, Zhou Y. RNAcmap: A Fully Automatic Pipeline for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis. Bioinformatics 2021; 37:3494-3500. [PMID: 34021744 DOI: 10.1093/bioinformatics/btab391] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/17/2020] [Revised: 03/27/2021] [Accepted: 05/18/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The accuracy of RNA secondary and tertiary structure prediction can be significantly improved by using structural restraints derived from evolutionary coupling or direct coupling analysis. Currently, these coupling analyses relied on manually curated multiple sequence alignments collected in the Rfam database, which contains 3016 families. By comparison, millions of non-coding RNA sequences are known. Here, we established RNAcmap, a fully automatic pipeline that enables evolutionary coupling analysis for any RNA sequences. The homology search was based on the covariance model built by INFERNAL according to two secondary structure predictors: a folding-based algorithm RNAfold and the latest deep-learning method SPOT-RNA. RESULTS We showed that the performance of RNAcmap is less dependent on the specific evolutionary coupling tool but is more dependent on the accuracy of secondary structure predictor with the best performance given by RNAcmap (SPOT-RNA). The performance of RNAcmap (SPOT-RNA) is comparable to that based on Rfam-supplied alignment and consistent for those sequences that are not in Rfam collections. Further improvement can be made with a simple meta predictor RNAcmap (SPOT-RNA/RNAfold) depending on which secondary structure predictor can find more homologous sequences. Reliable base-pairing information generated from RNAcmap, for RNAs with high effective homologous sequences, in particular, will be useful for aiding RNA structure prediction. AVAILABILITY RNAcmap is available as a web server at https://sparks-lab.org/server/rnacmap/ and as a standalone application along with the datasets at https://github.com/sparks-lab-org/RNAcmap_standalone. A platform independent and fully configured docker image of RNAcmap is also provided at https://hub.docker.com/r/jaswindersingh2/rnacmap.
Collapse
Affiliation(s)
- Tongchuan Zhang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
| | - Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
| | - Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
| | - Jian Zhan
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia.,Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
|
14
|
Gaither JBS, Lammi GE, Li JL, Gordon DM, Kuck HC, Kelly BJ, Fitch JR, White P. Synonymous variants that disrupt messenger RNA structure are significantly constrained in the human population. Gigascience 2021; 10:giab023. [PMID: 33822938 PMCID: PMC8023685 DOI: 10.1093/gigascience/giab023] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/20/2020] [Revised: 02/10/2021] [Accepted: 03/10/2021] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND The role of synonymous single-nucleotide variants in human health and disease is poorly understood, yet evidence suggests that this class of "silent" genetic variation plays multiple regulatory roles in both transcription and translation. One mechanism by which synonymous codons direct and modulate the translational process is through alteration of the elaborate structure formed by single-stranded mRNA molecules. While tools to computationally predict the effect of non-synonymous variants on protein structure are plentiful, analogous tools to systematically assess how synonymous variants might disrupt mRNA structure are lacking. RESULTS We developed novel software using a parallel processing framework for large-scale generation of secondary RNA structures and folding statistics for the transcriptome of any species. Focusing our analysis on the human transcriptome, we calculated 5 billion RNA-folding statistics for 469 million single-nucleotide variants in 45,800 transcripts. By considering the impact of all possible synonymous variants globally, we discover that synonymous variants predicted to disrupt mRNA structure have significantly lower rates of incidence in the human population. CONCLUSIONS These findings support the hypothesis that synonymous variants may play a role in genetic disorders due to their effects on mRNA structure. To evaluate the potential pathogenic impact of synonymous variants, we provide RNA stability, edge distance, and diversity metrics for every nucleotide in the human transcriptome and introduce a "Structural Predictivity Index" (SPI) to quantify structural constraint operating on any synonymous variant. Because no single RNA-folding metric can capture the diversity of mechanisms by which a variant could alter secondary mRNA structure, we generated a SUmmarized RNA Folding (SURF) metric to provide a single measurement to predict the impact of secondary structure altering variants in human genetic studies.
Affiliation(s)
- Jeffrey B S Gaither
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Grant E Lammi
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- James L Li
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- David M Gordon
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Harkness C Kuck
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Benjamin J Kelly
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- James R Fitch
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Peter White
- Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA
- Department of Pediatrics, College of Medicine, The Ohio State University, 370 W. 9th Avenue, Columbus, OH 43210, USA
|
15
|
Singh J, Paliwal K, Zhang T, Singh J, Litfin T, Zhou Y. Improved RNA Secondary Structure and Tertiary Base-pairing Prediction Using Evolutionary Profile, Mutational Coupling and Two-dimensional Transfer Learning. Bioinformatics 2021; 37:2589-2600. [PMID: 33704363 DOI: 10.1093/bioinformatics/btab165] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 12/01/2020] [Revised: 02/05/2021] [Accepted: 03/08/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The recent discovery of numerous non-coding RNAs (long non-coding RNAs, in particular) has transformed our perception of the roles of RNAs in living organisms. Our ability to understand them, however, is hampered by our inability to solve their secondary and tertiary structures efficiently at high resolution by existing experimental techniques. Computational prediction of RNA secondary structure, on the other hand, has recently received much-needed improvement through deep learning on a large set of approximate data, followed by transfer learning with gold-standard base-pairing structures from high-resolution 3-D structures. Here, we expand this single-sequence-based learning to the use of evolutionary profiles and mutational coupling. RESULTS The new method yields large improvements not only in canonical base-pairs (RNA secondary structures) but more so in base-pairing associated with tertiary interactions such as pseudoknots, noncanonical and lone base-pairs. In particular, it is highly accurate for those RNAs with more than 1000 homologous sequences, achieving >0.8 F1-score (harmonic mean of sensitivity and precision) for 14/16 RNAs tested. The method can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning without any modification. The fully automatic method (publicly available as server and standalone software) should provide the scientific community with a powerful new tool to capture not only the secondary structure but also tertiary base-pairing information for building three-dimensional models. It also highlights the future of accurately solving the base-pairing structure by using a large number of natural and/or artificial homologous sequences. AVAILABILITY A standalone version of SPOT-RNA2 is available at https://github.com/jaswindersingh2/SPOT-RNA2. Direct prediction can also be made at https://sparks-lab.org/server/spot-rna2/. The datasets used in this research can also be downloaded from the GitHub repository and the web server mentioned above.
Affiliation(s)
- Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Tongchuan Zhang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Jaspreet Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
|
16
|
Ke Y, Rao J, Zhao H, Lu Y, Xiao N, Yang Y. Accurate prediction of genome-wide RNA secondary structure profile based on extreme gradient boosting. Bioinformatics 2021; 36:4576-4582. [PMID: 32467966 DOI: 10.1093/bioinformatics/btaa534] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/18/2019] [Revised: 05/01/2020] [Accepted: 05/23/2020] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION RNA secondary structure plays a vital role in fundamental cellular processes, and identification of RNA secondary structure is a key step toward understanding RNA functions. Recently, a few experimental methods were developed to profile genome-wide RNA secondary structure, i.e. the pairing probability of each nucleotide, through high-throughput sequencing techniques. However, these high-throughput methods have low precision and cannot cover all nucleotides due to limited sequencing coverage. RESULTS Here, we have developed a new method for the prediction of genome-wide RNA secondary structure profile from RNA sequence based on the extreme gradient boosting technique. The method achieves predictions with areas under the receiver operating characteristic curve (AUC) >0.9 on three different datasets, and an AUC of 0.888 in another independent test on the recently released Zika virus data. These AUCs are consistently >5% greater than those of the CROSS method recently developed based on a shallow neural network. Further analysis on the 1000 Genomes Project data showed that our predicted unpaired probabilities are highly correlated (>0.8) with the minor allele frequencies at synonymous, non-synonymous mutations, and mutations in untranslated regions, which were higher than those generated by RNAplfold. Moreover, the prediction over all human mRNA indicated a consistent result with previous observation that there is a periodic distribution of unpaired probability on codons. The accurate predictions by our method indicate that such a model trained on genome-wide experimental data might be an alternative for analytical methods. AVAILABILITY AND IMPLEMENTATION GRASP is available for academic use at https://github.com/sysu-yanglab/GRASP. SUPPLEMENTARY INFORMATION Supplementary data are available online.
Affiliation(s)
- Yaobin Ke
- School of Data and Computer Science, Guangzhou 510000, China
- Jiahua Rao
- School of Data and Computer Science, Guangzhou 510000, China
- Huiying Zhao
- Sun Yat-sen Memorial Hospital, Guangzhou 510000, China
- Yutong Lu
- School of Data and Computer Science, Guangzhou 510000, China
- Nong Xiao
- School of Data and Computer Science, Guangzhou 510000, China
- Yuedong Yang
- School of Data and Computer Science, Guangzhou 510000, China
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University) of Ministry of Education, Guangzhou 510000, China
|
17
|
Singh J, Hanson J, Paliwal K, Zhou Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat Commun 2019; 10:5407. [PMID: 31776342 PMCID: PMC6881452 DOI: 10.1038/s41467-019-13395-9] [Citation(s) in RCA: 175] [Impact Index Per Article: 29.2] [Received: 06/12/2019] [Accepted: 11/01/2019] [Indexed: 01/03/2023] Open
Abstract
The majority of our human genome transcribes into noncoding RNAs with unknown structures and functions. Obtaining functional clues for noncoding RNAs requires accurate base-pairing or secondary-structure prediction. However, the performance of such predictions by current folding-based algorithms has stagnated for more than a decade. Here, we propose the use of deep contextual learning for base-pair prediction including those noncanonical and non-nested (pseudoknot) base pairs stabilized by tertiary interactions. Since only <250 nonredundant, high-resolution RNA structures are available for model training, we utilize transfer learning from a model initially trained with a recent high-quality bpRNA dataset of >10,000 nonredundant RNAs made available through comparative analysis. The resulting method achieves large, statistically significant improvement in predicting all base pairs, noncanonical and non-nested base pairs in particular. The proposed method (SPOT-RNA), with a freely available server and standalone software, should be useful for improving RNA structure modeling, sequence alignment, and functional annotations.
Affiliation(s)
- Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Jack Hanson
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD, 4222, Australia
|
18
|
Zhou B, Yang Y, Zhan J, Dou X, Wang J, Zhou Y. Predicting functional long non-coding RNAs validated by low throughput experiments. RNA Biol 2019; 16:1555-1564. [PMID: 31345106 PMCID: PMC6779387 DOI: 10.1080/15476286.2019.1644590] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/08/2018] [Revised: 06/17/2019] [Accepted: 07/10/2019] [Indexed: 01/05/2023] Open
Abstract
High-throughput techniques have uncovered hundreds and thousands of long non-coding RNAs (lncRNAs). Among them, only a tiny fraction has experimentally validated functions (EVlncRNAs) by low-throughput methods. What fraction of lncRNAs from high-throughput experiments (HTlncRNAs) is truly functional is an active subject of debate. Here, we developed the first method to distinguish EVlncRNAs from HTlncRNAs and mRNAs by using Support Vector Machines and found that EVlncRNAs can be well separated from HTlncRNAs and mRNAs with 0.6 for Matthews correlation coefficient, 64% for sensitivity, and 81% for precision for the independent human test set. The most useful features for classification are related to sequence conservations at RNA (for separating from HTlncRNAs) and protein (for separating from mRNA) levels. The method is found to be robust as the human-RNA-trained model is applicable to independent mouse RNAs with similar accuracy and to a lesser extent to plant RNAs. The method can recover newly discovered EVlncRNAs with high sensitivity. Its application to randomly selected 2000 human HTlncRNAs indicates that the majority of HTlncRNAs is probably non-functional but a large portion (nearly 30%) are likely functional. In other words, there is an ample number of lncRNAs whose specific biological roles are yet to be discovered. The method developed here is expected to speed up and reduce the cost of the discovery by prioritizing potentially functional lncRNAs prior to experimental validation. EVlncRNA-pred is available as a web server at http://biophy.dzu.edu.cn/lncrnapred/index.html . All datasets used in this study can be obtained from the same website.
Affiliation(s)
- Bailing Zhou
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- College of Physics and Electronic Information, Dezhou University, Dezhou, China
- Yuedong Yang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Jian Zhan
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Xianghua Dou
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- College of Physics and Electronic Information, Dezhou University, Dezhou, China
- Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- College of Physics and Electronic Information, Dezhou University, Dezhou, China
- Yaoqi Zhou
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
|
19
|
Sun S, Wu Q, Peng Z, Yang J. Enhanced prediction of RNA solvent accessibility with long short-term memory neural networks and improved sequence profiles. Bioinformatics 2019; 35:1686-1691. [PMID: 30321300 DOI: 10.1093/bioinformatics/bty876] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 11/29/2017] [Revised: 09/11/2018] [Accepted: 10/13/2018] [Indexed: 02/21/2025] Open
Abstract
MOTIVATION The de novo prediction of RNA tertiary structure remains a grand challenge. Predicted RNA solvent accessibility provides an opportunity to address this challenge. To the best of our knowledge, there is only one method (RNAsnap) available for RNA solvent accessibility prediction. However, its performance is unsatisfactory for protein-free RNAs. RESULTS We developed RNAsol, a new algorithm to predict RNA solvent accessibility. RNAsol was built based on improved sequence profiles from the covariance models and trained with the long short-term memory (LSTM) neural networks. Independent tests on the same datasets from RNAsnap show that RNAsol achieves the mean Pearson's correlation coefficient (PCC) of 0.43/0.26 for the protein-bound/protein-free RNA molecules, which is 26.5%/136.4% higher than that of RNAsnap. When the training set is enlarged to include both types of RNAs, the PCCs increase to 0.49 and 0.46 for protein-bound and protein-free RNAs, respectively. The success of RNAsol is attributed to two aspects, including the improved sequence profiles constructed by the sequence-profile alignment and the enhanced training by the LSTM neural networks. AVAILABILITY AND IMPLEMENTATION http://yanglab.nankai.edu.cn/RNAsol/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Saisai Sun
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Qi Wu
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, China
|
20
|
Wang YZ, Li J, Zhang S, Huang B, Yao G, Zhang J. An RNA Scoring Function for Tertiary Structure Prediction Based on Multi-Layer Neural Networks. Mol Biol 2019. [DOI: 10.1134/s0026893319010175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
|
21
|
Guruge I, Taherzadeh G, Zhan J, Zhou Y, Yang Y. B-factor profile prediction for RNA flexibility using support vector machines. J Comput Chem 2017; 39:407-411. [DOI: 10.1002/jcc.25124] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/11/2017] [Accepted: 11/07/2017] [Indexed: 12/12/2022]
Affiliation(s)
- Ivantha Guruge
- School of Information and Communication Technology and Institute for Glycomics; Griffith University, Parklands Drive; Southport Queensland 4215 Australia
- Ghazaleh Taherzadeh
- School of Information and Communication Technology and Institute for Glycomics; Griffith University, Parklands Drive; Southport Queensland 4215 Australia
- Jian Zhan
- School of Information and Communication Technology and Institute for Glycomics; Griffith University, Parklands Drive; Southport Queensland 4215 Australia
- Yaoqi Zhou
- School of Information and Communication Technology and Institute for Glycomics; Griffith University, Parklands Drive; Southport Queensland 4215 Australia
- Yuedong Yang
- School of Information and Communication Technology and Institute for Glycomics; Griffith University, Parklands Drive; Southport Queensland 4215 Australia
- School of Data and Computer Science; Sun Yat-sen University; Guangzhou 510275 China
|
22
|
Jegousse C, Yang Y, Zhan J, Wang J, Zhou Y. Structural signatures of thermal adaptation of bacterial ribosomal RNA, transfer RNA, and messenger RNA. PLoS One 2017; 12:e0184722. [PMID: 28910383 PMCID: PMC5598986 DOI: 10.1371/journal.pone.0184722] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 06/23/2017] [Accepted: 08/29/2017] [Indexed: 12/02/2022] Open
Abstract
Temperature adaptation of bacterial RNAs is a subject of both fundamental and practical interest because it will allow a better understanding of the molecular mechanism of RNA folding, with potential industrial application of functional thermophilic or psychrophilic RNAs. Here, we performed a comprehensive study of rRNA, tRNA, and mRNA of more than 200 bacterial species with optimal growth temperatures (OGT) ranging from 4°C to 95°C. We investigated temperature adaptation at primary, secondary and tertiary structure levels. We showed that unlike mRNA, tRNA and rRNA were optimized for their structures at compositional levels with significant tertiary structural features even for their corresponding randomly permuted sequences. tRNA and rRNA are more exposed to solvent but remain structured for hyperthermophiles with nearly OGT-independent fluctuation of solvent accessible surface area within a single RNA chain. mRNA in hyperthermophiles is essentially the same as random sequences without tertiary structures, although many mRNAs in mesophiles and psychrophiles have well-defined tertiary structures based on their low overall solvent exposure with clear separation of deeply buried from partly exposed bases as in tRNA and rRNA. These results provide new insight into temperature adaptation of different RNAs.
MESH Headings
- Bacteria/genetics
- Databases, Genetic
- Models, Molecular
- Nucleic Acid Conformation
- RNA Folding/drug effects
- RNA, Bacterial/chemistry
- RNA, Bacterial/drug effects
- RNA, Messenger/chemistry
- RNA, Messenger/drug effects
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/drug effects
- RNA, Transfer/chemistry
- RNA, Transfer/drug effects
- Solvents/pharmacology
- Temperature
Affiliation(s)
- Clara Jegousse
- UFR Sciences et Techniques, Université de Nantes, 2 rue de la Houssinière, Nantes, France
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Jian Zhan
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
|
23
|
Livingstone M, Folkman L, Yang Y, Zhang P, Mort M, Cooper DN, Liu Y, Stantic B, Zhou Y. Investigating DNA-, RNA-, and protein-based features as a means to discriminate pathogenic synonymous variants. Hum Mutat 2017. [DOI: 10.1002/humu.23283] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 12/22/2022]
Affiliation(s)
- Mark Livingstone
- School of Information and Communication Technology; Griffith University; Southport Queensland 4222 Australia
- Lukas Folkman
- School of Information and Communication Technology; Griffith University; Southport Queensland 4222 Australia
- Yuedong Yang
- School of Information and Communication Technology; Griffith University; Southport Queensland 4222 Australia
- Institute for Glycomics; Griffith University; Southport Queensland 4222 Australia
- Ping Zhang
- Menzies Health Institute; Griffith University; Southport Queensland 4222 Australia
- Matthew Mort
- Institute of Medical Genetics; Cardiff University; Cardiff CF144XN United Kingdom
- David N. Cooper
- Institute of Medical Genetics; Cardiff University; Cardiff CF144XN United Kingdom
- Yunlong Liu
- Department of Medical and Molecular Genetics; Indiana University; Indianapolis Indiana 46202
- Bela Stantic
- School of Information and Communication Technology; Griffith University; Southport Queensland 4222 Australia
- Yaoqi Zhou
- School of Information and Communication Technology; Griffith University; Southport Queensland 4222 Australia
- Institute for Glycomics; Griffith University; Southport Queensland 4222 Australia
|