1
Qiao Y, Yang R, Liu Y, Chen J, Zhao L, Huo P, Wang Z, Bu D, Wu Y, Zhao Y. DeepFusion: A deep bimodal information fusion network for unraveling protein-RNA interactions using in vivo RNA structures. Comput Struct Biotechnol J 2024; 23:617-625. [PMID: 38274994 PMCID: PMC10808905 DOI: 10.1016/j.csbj.2023.12.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 12/04/2023] [Accepted: 12/26/2023] [Indexed: 01/27/2024] Open
Abstract
RNA-binding proteins (RBPs) are key post-transcriptional regulators, and the malfunctions of RBP-RNA binding lead to diverse human diseases. However, prediction of RBP binding sites is largely based on RNA sequence features, whereas in vivo RNA structural features based on high-throughput sequencing are rarely incorporated. Here, we designed a deep bimodal information fusion network called DeepFusion for unraveling protein-RNA interactions by incorporating structural features derived from DMS-seq data. DeepFusion integrates two sub-models to extract local motif-like information and long-term context information. We show that DeepFusion performs best compared with other cutting-edge methods with only sequence inputs on two datasets. DeepFusion's performance is further improved with bimodal input after adding in vivo DMS-seq structural features. Furthermore, DeepFusion can be used for analyzing RNA degradation, demonstrating significantly different RBP-binding scores in genes with slow degradation rates versus those with rapid degradation rates. DeepFusion thus provides enhanced abilities for further analysis of functional RNAs. DeepFusion's code and data are available at http://bioinfo.org/deepfusion/.
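The bimodal input described in the abstract can be pictured as a per-nucleotide feature matrix that joins sequence and structure. The following is only a minimal sketch under stated assumptions, not DeepFusion's actual encoding: the function name `encode_bimodal`, the one-hot scheme, and the use of a single structure score per position are all hypothetical.

```python
# Hypothetical sketch: one-hot sequence channels plus one structure channel.
# This is NOT taken from the DeepFusion code base; names are illustrative.
NUCLEOTIDES = "ACGU"


def encode_bimodal(sequence, reactivity):
    """Return one row per position: 4 one-hot values plus 1 structure score."""
    if len(sequence) != len(reactivity):
        raise ValueError("sequence and reactivity must have the same length")
    rows = []
    # Treat DNA-style input as RNA so that T and U share a channel.
    for base, score in zip(sequence.upper().replace("T", "U"), reactivity):
        # Unknown bases (for example N) become an all-zero one-hot vector.
        one_hot = [1.0 if base == n else 0.0 for n in NUCLEOTIDES]
        rows.append(one_hot + [float(score)])
    return rows


# Toy example: four nucleotides with made-up structure scores.
features = encode_bimodal("ACGU", [0.1, 0.9, 0.0, 0.5])
```

Each row then carries both modalities for one position, so a downstream network can learn from sequence and structure jointly.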
Affiliation(s)
- Yixuan Qiao
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Rui Yang
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yang Liu
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jiaxin Chen
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Lianhe Zhao
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Peipei Huo
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Zhihao Wang
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Dechao Bu
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Yang Wu
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Yi Zhao
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
2
Sasse A, Ray D, Laverty KU, Tam CL, Albu M, Zheng H, Lyudovyk O, Dalal T, Nie K, Magis C, Notredame C, Weirauch MT, Hughes TR, Morris Q. Reconstructing the sequence specificities of RNA-binding proteins across eukaryotes. bioRxiv 2024:2024.10.15.618476. [PMID: 39464061 PMCID: PMC11507768 DOI: 10.1101/2024.10.15.618476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
RNA-binding proteins (RBPs) are key regulators of gene expression. Here, we introduce EuPRI (Eukaryotic Protein-RNA Interactions) - a freely available resource of RNA motifs for 34,736 RBPs from 690 eukaryotes. EuPRI includes in vitro binding data for 504 RBPs, including newly collected RNAcompete data for 174 RBPs, along with thousands of reconstructed motifs. We reconstruct these motifs with a new computational platform - Joint Protein-Ligand Embedding (JPLE) - which can detect distant homology relationships and map specificity-determining peptides. EuPRI quadruples the number of known RBP motifs, expanding the motif repertoire across all major eukaryotic clades, and assigning motifs to the majority of human RBPs. EuPRI drastically improves knowledge of RBP motifs in flowering plants. For example, it increases the number of Arabidopsis thaliana RBP motifs 7-fold, from 14 to 105. EuPRI also has broad utility for inferring post-transcriptional function and evolutionary relationships. We demonstrate this by predicting a role for 12 Arabidopsis thaliana RBPs in RNA stability and identifying rapid and recent evolution of post-transcriptional regulatory networks in worms and plants. In contrast, the vertebrate RNA motif set has remained relatively stable after its drastic expansion between the metazoan and vertebrate ancestors. EuPRI represents a powerful resource for the study of gene regulation across eukaryotes.
Affiliation(s)
- Alexander Sasse
  - Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
  - Donnelly Centre, University of Toronto, Toronto, ON Canada
  - Department of Computer Science, University of Washington, Seattle, WA, USA
  - Vector Institute, Toronto, ON Canada
- Debashish Ray
  - Donnelly Centre, University of Toronto, Toronto, ON Canada
- Kaitlin U Laverty
  - Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
  - Donnelly Centre, University of Toronto, Toronto, ON Canada
  - Vector Institute, Toronto, ON Canada
  - Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Cyrus L Tam
  - Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
- Mihai Albu
  - Donnelly Centre, University of Toronto, Toronto, ON Canada
- Hong Zheng
  - Donnelly Centre, University of Toronto, Toronto, ON Canada
- Olga Lyudovyk
  - Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
- Taykhoom Dalal
  - Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
- Kate Nie
  - Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
  - Donnelly Centre, University of Toronto, Toronto, ON Canada
  - Vector Institute, Toronto, ON Canada
- Cedrik Magis
  - Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Cedric Notredame
  - Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Matthew T Weirauch
  - Center for Autoimmune Genomics and Etiology, Divisions of Allergy & Immunology, Human Genetics, Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital, Cincinnati, OH, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Timothy R Hughes
  - Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
  - Donnelly Centre, University of Toronto, Toronto, ON Canada
- Quaid Morris
  - Department of Molecular Genetics, University of Toronto, Toronto, ON Canada
  - Donnelly Centre, University of Toronto, Toronto, ON Canada
  - Vector Institute, Toronto, ON Canada
  - Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
  - Ontario Institute for Cancer Research, Toronto, ON, Canada
3
Yuan L, Zhao L, Lai J, Jiang Y, Zhang Q, Shen Z, Zheng CH, Huang DS. iCRBP-LKHA: Large convolutional kernel and hybrid channel-spatial attention for identifying circRNA-RBP interaction sites. PLoS Comput Biol 2024; 20:e1012399. [PMID: 39173070 PMCID: PMC11373821 DOI: 10.1371/journal.pcbi.1012399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 09/04/2024] [Accepted: 08/08/2024] [Indexed: 08/24/2024] Open
Abstract
Circular RNAs (circRNAs) play vital roles in transcription and translation. Identification of circRNA-RBP (RNA-binding protein) interaction sites has become a fundamental step in molecular and cell biology. Deep learning (DL)-based methods have been proposed to predict circRNA-RBP interaction sites and achieved impressive identification performance. However, those methods cannot effectively capture long-distance dependencies, and cannot effectively utilize the interaction information of multiple features. To overcome those limitations, we propose a DL-based model iCRBP-LKHA using deep hybrid networks for identifying circRNA-RBP interaction sites. iCRBP-LKHA adopts five encoding schemes. Meanwhile, the neural network architecture, which consists of large kernel convolutional neural network (LKCNN), convolutional block attention module with one-dimensional convolution (CBAM-1D) and bidirectional gating recurrent unit (BiGRU), can explore local information, global context information and multiple features interaction information automatically. To verify the effectiveness of iCRBP-LKHA, we compared its performance with shallow learning algorithms on 37 circRNAs datasets and 37 circRNAs stringent datasets. And we compared its performance with state-of-the-art DL-based methods on 37 circRNAs datasets, 37 circRNAs stringent datasets and 31 linear RNAs datasets. The experimental results not only show that iCRBP-LKHA outperforms other competing methods, but also demonstrate the potential of this model in identifying other RNA-RBP interaction sites.
Affiliation(s)
- Lin Yuan
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Ling Zhao
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Jinling Lai
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Yufeng Jiang
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Qinhu Zhang
  - Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
- Zhen Shen
  - School of Computer and Software, Nanyang Institute of Technology, Nanyang, China
- Chun-Hou Zheng
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
- De-Shuang Huang
  - Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
4
Lasantha D, Vidanagamachchi S, Nallaperuma S. CRIECNN: Ensemble convolutional neural network and advanced feature extraction methods for the precise forecasting of circRNA-RBP binding sites. Comput Biol Med 2024; 174:108466. [PMID: 38615462 DOI: 10.1016/j.compbiomed.2024.108466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 03/29/2024] [Accepted: 04/08/2024] [Indexed: 04/16/2024]
Abstract
Circular RNAs (circRNAs) have surfaced as important non-coding RNA molecules in biology. Understanding interactions between circRNAs and RNA-binding proteins (RBPs) is crucial in circRNA research. Existing prediction models suffer from limited availability and accuracy, necessitating advanced approaches. In this study, we propose CRIECNN (Circular RNA-RBP Interaction predictor using an Ensemble Convolutional Neural Network), a novel ensemble deep learning model that enhances circRNA-RBP binding site prediction accuracy. CRIECNN employs advanced feature extraction methods and evaluates four distinct sequence datasets and encoding techniques (BERT, Doc2Vec, KNF, EIIP). The model consists of an ensemble convolutional neural network, a BiLSTM, and a self-attention mechanism for feature refinement. Our results demonstrate that CRIECNN outperforms state-of-the-art methods in accuracy and performance, effectively predicting circRNA-RBP interactions from both full-length sequences and fragments. This novel strategy makes an enormous advancement in the prediction of circRNA-RBP interactions, improving our understanding of circRNAs and their regulatory roles.
Affiliation(s)
- Dilan Lasantha
  - Department of Computer Science, University of Ruhuna, Sri Lanka.
- Sam Nallaperuma
  - Department of Engineering, University of Cambridge, United Kingdom.
5
Cao C, Wang C, Yang S, Zou Q. CircSI-SSL: circRNA-binding site identification based on self-supervised learning. Bioinformatics 2024; 40:btae004. [PMID: 38180876 PMCID: PMC10789309 DOI: 10.1093/bioinformatics/btae004] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/26/2023] [Revised: 11/13/2023] [Accepted: 01/03/2024] [Indexed: 01/07/2024] Open
Abstract
MOTIVATION In recent years, circular RNAs (circRNAs), the particular form of RNA with a closed-loop structure, have attracted widespread attention due to their physiological significance (they can directly bind proteins), leading to the development of numerous protein site identification algorithms. Unfortunately, these studies are supervised and require the vast majority of labeled samples in training to produce superior performance. But the acquisition of sample labels requires a large number of biological experiments and is difficult to obtain. RESULTS To resolve this matter that a great deal of tags need to be trained in the circRNA-binding site prediction task, a self-supervised learning binding site identification algorithm named CircSI-SSL is proposed in this article. According to the survey, this is unprecedented in the research field. Specifically, CircSI-SSL initially combines multiple feature coding schemes and employs RNA_Transformer for cross-view sequence prediction (self-supervised task) to learn mutual information from the multi-view data, and then fine-tuning with only a few sample labels. Comprehensive experiments on six widely used circRNA datasets indicate that our CircSI-SSL algorithm achieves excellent performance in comparison to previous algorithms, even in the extreme case where the ratio of training data to test data is 1:9. In addition, the transplantation experiment of six linRNA datasets without network modification and hyperparameter adjustment shows that CircSI-SSL has good scalability. In summary, the prediction algorithm based on self-supervised learning proposed in this article is expected to replace previous supervised algorithms and has more extensive application value. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/cc646201081/CircSI-SSL.
Affiliation(s)
- Chao Cao
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
- Chunyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Shuhong Yang
  - Faculty of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang, Guangdong 524088, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
6
Akbari Rokn Abadi S, Tabatabaei S, Koohi S. KDeep: a new memory-efficient data extraction method for accurately predicting DNA/RNA transcription factor binding sites. J Transl Med 2023; 21:727. [PMID: 37845681 PMCID: PMC10580661 DOI: 10.1186/s12967-023-04593-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Accepted: 10/04/2023] [Indexed: 10/18/2023] Open
Abstract
This paper addresses the crucial task of identifying DNA/RNA binding sites, which has implications in drug/vaccine design, protein engineering, and cancer research. Existing methods utilize complex neural network structures, diverse input types, and machine learning techniques for feature extraction. However, the growing volume of sequences poses processing challenges. This study introduces KDeep, employing a CNN-LSTM architecture with a novel encoding method called 2Lk. 2Lk enhances prediction accuracy, reduces memory consumption by up to 84%, reduces trainable parameters, and improves interpretability by approximately 79% compared to state-of-the-art approaches. KDeep offers a promising solution for accurate and efficient binding site prediction.
Affiliation(s)
- Somayyeh Koohi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
7
Pan Z, Zhou S, Liu T, Liu C, Zang M, Wang Q. WVDL: Weighted Voting Deep Learning Model for Predicting RNA-Protein Binding Sites. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3322-3328. [PMID: 37028092 DOI: 10.1109/tcbb.2023.3252276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
RNA-binding proteins are important for the process of cell life activities. High-throughput technique experimental method to discover RNA-protein binding sites is time-consuming and expensive. Deep learning is an effective theory for predicting RNA-protein binding sites. Using weighted voting method to integrate multiple basic classifier models can improve model performance. Thus, in our study, we propose a weighted voting deep learning model (WVDL), which uses weighted voting method to combine convolutional neural network (CNN), long short term memory network (LSTM) and residual network (ResNet). First, the final forecast result of WVDL outperforms the basic classifier models and other ensemble strategies. Second, WVDL can extract more effective features by using weighted voting to find the best weighted combination. And, the CNN model also can draw the predicted motif pictures. Third, WVDL gets a competitive experiment result on public RBP-24 datasets comparing with other state-of-the-art methods. The source code of our proposed WVDL can be found in https://github.com/biomg/WVDL.
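The weighted-voting idea in the abstract can be illustrated with a small grid search over classifier weights. This is a sketch under stated assumptions rather than the WVDL implementation: the helper names (`weighted_vote`, `accuracy`, `best_weights`), the integer weight grid, and the toy probabilities are hypothetical, and accuracy on a validation set is assumed as the selection criterion.

```python
# Hypothetical sketch of weighted soft voting; not the WVDL source code.
from itertools import product


def weighted_vote(prob_lists, weights):
    """Combine per-model probabilities into one weighted average per sample."""
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def best_weights(prob_lists, labels, max_weight=3):
    """Exhaustively try integer weights 0..max_weight for every model."""
    best_score, best_combo = -1.0, None
    for combo in product(range(max_weight + 1), repeat=len(prob_lists)):
        if sum(combo) == 0:
            continue  # at least one model must contribute
        score = accuracy(weighted_vote(prob_lists, combo), labels)
        if score > best_score:
            best_score, best_combo = score, combo
    return best_score, best_combo


# Made-up validation probabilities standing in for CNN, LSTM and ResNet outputs.
cnn = [0.9, 0.4, 0.6, 0.2]
lstm = [0.6, 0.7, 0.3, 0.4]
resnet = [0.8, 0.2, 0.7, 0.6]
labels = [1, 0, 1, 0]
score, weights = best_weights([cnn, lstm, resnet], labels)
```

An exhaustive grid is only practical for a handful of base models; with more models, a coarser grid or a learned combiner would be a more reasonable choice.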
8
Cao C, Yang S, Li M, Li C. CircSSNN: circRNA-binding site prediction via sequence self-attention neural networks with pre-normalization. BMC Bioinformatics 2023; 24:220. [PMID: 37254080 DOI: 10.1186/s12859-023-05352-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/24/2023] [Accepted: 05/25/2023] [Indexed: 06/01/2023] Open
Abstract
BACKGROUND Circular RNAs (circRNAs) play a significant role in some diseases by acting as transcription templates. Therefore, analyzing the interaction mechanism between circRNA and RNA-binding proteins (RBPs) has far-reaching implications for the prevention and treatment of diseases. Existing models for circRNA-RBP identification usually adopt convolution neural network (CNN), recurrent neural network (RNN), or their variants as feature extractors. Most of them have drawbacks such as poor parallelism, insufficient stability, and inability to capture long-term dependencies. METHODS In this paper, we propose a new method completely using the self-attention mechanism to capture deep semantic features of RNA sequences. On this basis, we construct a CircSSNN model for the circRNA-RBP identification. The proposed model constructs a feature scheme by fusing circRNA sequence representations with statistical distributions, static local contexts, and dynamic global contexts. With a stable and efficient network architecture, the distance between any two positions in a sequence is reduced to a constant, so CircSSNN can quickly capture the long-term dependencies and extract the deep semantic features. RESULTS Experiments on 37 circRNA datasets show that the proposed model has overall advantages in stability, parallelism, and prediction performance. Keeping the network structure and hyperparameters unchanged, we directly apply the CircSSNN to linRNA datasets. The favorable results show that CircSSNN can be transformed simply and efficiently without task-oriented tuning. CONCLUSIONS In conclusion, CircSSNN can serve as an appealing circRNA-RBP identification tool with good identification performance, excellent scalability, and wide application scope without the need for task-oriented fine-tuning of parameters, which is expected to reduce the professional threshold required for hyperparameter tuning in bioinformatics analysis.
Affiliation(s)
- Chao Cao
  - School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, China
- Shuhong Yang
  - Key Laboratory of Guangxi Universities on Intelligent Computing and Distributed Information Processing, Guangxi University of Science and Technology, Liuzhou, China.
- Mengli Li
  - School of Technology, Guilin University, Guilin, China
- Chungui Li
  - School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, China.
9
Xu Y, Zhu J, Huang W, Xu K, Yang R, Zhang QC, Sun L. PrismNet: predicting protein-RNA interaction using in vivo RNA structural information. Nucleic Acids Res 2023:7151359. [PMID: 37140045 DOI: 10.1093/nar/gkad353] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/14/2023] [Revised: 04/13/2023] [Accepted: 04/26/2023] [Indexed: 05/05/2023] Open
Abstract
Fundamental to post-transcriptional regulation, the in vivo binding of RNA binding proteins (RBPs) on their RNA targets heavily depends on RNA structures. To date, most methods for RBP-RNA interaction prediction are based on RNA structures predicted from sequences, which do not consider the various intracellular environments and thus cannot predict cell type-specific RBP-RNA interactions. Here, we present a web server PrismNet that uses a deep learning tool to integrate in vivo RNA secondary structures measured by icSHAPE experiments with RBP binding site information from UV cross-linking and immunoprecipitation in the same cell lines to predict cell type-specific RBP-RNA interactions. Taking an RBP and an RNA region with sequential and structural information as input ('Sequence & Structure' mode), PrismNet outputs the binding probability of the RBP and this RNA region, together with a saliency map and a sequence-structure integrative motif. The web server is freely available at http://prismnetweb.zhanglab.net.
Affiliation(s)
- Yiran Xu
  - MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Jianghui Zhu
  - MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Wenze Huang
  - MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Kui Xu
  - MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Rui Yang
  - MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Qiangfeng Cliff Zhang
  - MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Lei Sun
  - Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Shandong University, Qingdao 266237, China
10
Wang X, Zhang M, Long C, Yao L, Zhu M. Self-Attention Based Neural Network for Predicting RNA-Protein Binding Sites. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1469-1479. [PMID: 36067103 DOI: 10.1109/tcbb.2022.3204661] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 05/04/2023]
Abstract
Proteins binding to Ribonucleic Acid (RNA) inside cells are called RNA-binding proteins (RBP), which play a crucial role in gene regulation. The identification of RNA-protein binding sites helps to understand the function of RBP better. Although many computational methods have been developed to predict RNA-protein binding sites, their prediction accuracy on small sample datasets needs improvement. To overcome this limitation, we propose a novel model called SA-Net, which utilizes k-mer embedding to encode RNA sequences and a self-attention-based neural network to extract sequence features. K-mer embedding assists the model to discover significant subsequence fragments associated with binding sites. The self-attention mechanism captures contextual information from the entire input sequence globally, performing well in small sample sequence learning. Experimental results demonstrate that SA-Net attains state-of-the-art results on the RBP-24 dataset. We find that 4-mer embedding aids the model to achieve optimal performance. We also show that the self-attention network outperforms the commonly used CNN and CNN-BLSTM models in sequence feature extraction.
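The k-mer step mentioned in the abstract amounts to turning a sequence into overlapping tokens that an embedding layer can look up. The snippet below is a minimal sketch of that tokenization only, assuming a stride of 1 and a fixed A/C/G/U vocabulary; the function names and the extra "unknown" id are hypothetical and not taken from SA-Net.

```python
# Hypothetical sketch of overlapping k-mer tokenization; not the SA-Net code.
from itertools import product


def build_vocab(k, alphabet="ACGU"):
    """Map every possible k-mer over the alphabet to an integer id."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def kmer_ids(sequence, k, vocab):
    """Slide a window of width k (stride 1) and return one id per window."""
    seq = sequence.upper().replace("T", "U")
    unknown = len(vocab)  # shared id for windows with ambiguous bases
    return [vocab.get(seq[i:i + k], unknown) for i in range(len(seq) - k + 1)]


# 4-mers are used here because the abstract reports them as the best setting.
vocab = build_vocab(4)
ids = kmer_ids("ACGUA", 4, vocab)
```

The resulting ids would index rows of an embedding table with `len(vocab) + 1` entries, one of which is reserved for unknown k-mers.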
11
Koo PK, Ploenzke M, Anand P, Paul S, Majdandzic A. ResidualBind: Uncovering Sequence-Structure Preferences of RNA-Binding Proteins with Deep Neural Networks. Methods Mol Biol 2023; 2586:197-215. [PMID: 36705906 DOI: 10.1007/978-1-0716-2768-6_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
Deep neural networks have demonstrated improved performance at predicting sequence specificities of DNA- and RNA-binding proteins. However, it remains unclear why they perform better than previous methods that rely on k-mers and position weight matrices. Here, we highlight a recent deep learning-based software package, called ResidualBind, that analyzes RNA-protein interactions using only RNA sequence as an input feature and performs global importance analysis for model interpretability. We discuss practical considerations for model interpretability to uncover learned sequence motifs and their secondary structure preferences.
Affiliation(s)
- Peter K Koo
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
- Matt Ploenzke
  - Department of Biostatistics, Harvard University, Cambridge, MA, USA
- Steffan Paul
  - Bioinformatics Program, Harvard Medical School, Boston, MA, USA
- Antonio Majdandzic
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
12
Huang W, Zhang QC. Prediction of Dynamic RBP-RNA Interactions Using PrismNet. Methods Mol Biol 2023; 2568:123-132. [PMID: 36227565 DOI: 10.1007/978-1-0716-2687-0_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
A capacity to detect the binding profiles of RNA targets for an RNA-binding protein (RBP) under different cellular conditions is essential to understand the functions of the RBP in posttranscriptional regulation. However, the prediction of RBP binding sites in vivo remains challenging. Tools that predict RBP-RNA interactions using sequence and/or predicted structures cannot reflect the exact state of RNA in vivo. PrismNet, which uses both sequences and in vivo RNA structure information from probing experiments, can accurately predict RBP binding under different cellular conditions by deep learning, and can be applied for functional studies of RBPs. Here, we provide a detailed protocol showing how to train a PrismNet model of RBP-RNA interactions for an RBP, and how to apply the model for predictions of the RBP binding under different conditions.
Affiliation(s)
- Wenze Huang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Qiangfeng Cliff Zhang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China.
- Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China.
- Tsinghua-Peking Center for Life Sciences, Beijing, China.
|
13
|
Bheemireddy S, Sandhya S, Srinivasan N, Sowdhamini R. Computational tools to study RNA-protein complexes. Front Mol Biosci 2022; 9:954926. [PMID: 36275618 PMCID: PMC9585174 DOI: 10.3389/fmolb.2022.954926] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/27/2022] [Accepted: 09/20/2022] [Indexed: 11/19/2022] Open
Abstract
RNA is the key player in many cellular processes such as signal transduction, replication, transport, cell division, transcription, and translation. These diverse functions are accomplished through interactions of RNA with proteins. However, protein–RNA interactions are still poorly understood in contrast to protein–protein and protein–DNA interactions. This knowledge gap can be attributed to the limited availability of protein-RNA structures along with the experimental difficulties in studying these complexes. Recent progress in computational resources has expanded the number of tools available for studying protein-RNA interactions at various molecular levels. These include tools for predicting interacting residues from primary sequences, modelling protein-RNA complexes, predicting hotspots in these complexes, and gaining insights into the dynamics of their interactions. Each of these tools has its strengths and limitations, which makes it important to select an optimal approach for the question of interest. Here we present a mini review of computational tools to study different aspects of protein-RNA interactions, with a focus on overall application, development of the field and future perspectives.
Affiliation(s)
- Sneha Bheemireddy
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Sankaran Sandhya
- Department of Biotechnology, Faculty of Life and Allied Health Sciences, M.S. Ramaiah University of Applied Sciences, Bengaluru, India
- *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
| | | | - Ramanathan Sowdhamini
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, TIFR, GKVK Campus, Bangalore, India
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
- *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
|
14
|
Pepe G, Appierdo R, Carrino C, Ballesio F, Helmer-Citterich M, Gherardini PF. Artificial intelligence methods enhance the discovery of RNA interactions. Front Mol Biosci 2022; 9:1000205. [PMID: 36275611 PMCID: PMC9585310 DOI: 10.3389/fmolb.2022.1000205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022] Open
Abstract
Understanding how RNAs interact with proteins, RNAs, or other molecules remains a challenge of main interest in biology, given the importance of these complexes in both normal and pathological cellular processes. Since experimental datasets are starting to be available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods, developed in this broad research area, highlighting feature encoding and machine learning strategies adopted. Given the magnitude of the effect that dataset size and quality have on performance, we explored the characteristics of these datasets. Additionally, we discuss multiple approaches to generate datasets of negative examples for training. Finally, we describe the best-performing methods to predict interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), and methods to predict RNA-RNA or RNA-RBP interactions independently of the RNA type.
Affiliation(s)
- G Pepe
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - R Appierdo
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - C Carrino
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - F Ballesio
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - M Helmer-Citterich
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - PF Gherardini
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
|
15
|
Laverty KU, Jolma A, Pour SE, Zheng H, Ray D, Morris Q, Hughes TR. PRIESSTESS: interpretable, high-performing models of the sequence and structure preferences of RNA-binding proteins. Nucleic Acids Res 2022; 50:e111. [PMID: 36018788 DOI: 10.1093/nar/gkac694] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/17/2021] [Revised: 07/22/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022] Open
Abstract
Modelling both primary sequence and secondary structure preferences for RNA binding proteins (RBPs) remains an ongoing challenge. Current models use varied RNA structure representations and can be difficult to interpret and evaluate. To address these issues, we present a universal RNA motif-finding/scanning strategy, termed PRIESSTESS (Predictive RBP-RNA InterpretablE Sequence-Structure moTif regrESSion), that can be applied to diverse RNA binding datasets. PRIESSTESS identifies dozens of enriched RNA sequence and/or structure motifs that are subsequently reduced to a set of core motifs by logistic regression with LASSO regularization. Importantly, these core motifs are easily visualized and interpreted, and provide a measure of RBP secondary structure specificity. We used PRIESSTESS to interrogate new HTR-SELEX data for 23 RBPs with diverse RNA binding modes and captured known primary sequence and secondary structure preferences for each. Moreover, when applying PRIESSTESS to 144 RBPs across 202 RNA binding datasets, 75% showed an RNA secondary structure preference but only 10% had a preference besides unpaired bases, suggesting that most RBPs simply recognize the accessibility of primary sequences.
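The reduction step described in this abstract, logistic regression with LASSO regularization shrinking many candidate motifs to a core set, can be sketched as follows. This is a minimal illustration using proximal gradient descent with soft-thresholding, not the PRIESSTESS implementation; the feature matrix, labels, and hyperparameters (`lam`, `lr`, `n_iter`) are hypothetical.

```python
import math
from typing import List, Tuple


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of the L1 penalty: shrink towards zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_l1_logistic(X: List[List[float]], y: List[int], lam: float = 0.1,
                    lr: float = 0.5, n_iter: int = 500) -> Tuple[List[float], float]:
    """L1-penalised logistic regression via proximal gradient descent.

    The intercept b is left unpenalised; weights hit exactly 0.0 when a
    feature does not earn its penalty.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        err = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
               for row, yi in zip(X, y)]
        grad_w = [sum(e * row[j] for e, row in zip(err, X)) / n for j in range(d)]
        grad_b = sum(err) / n
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical standardised motif-match features for four sequences:
# column 0 tracks the bound/unbound label, columns 1 and 2 are uninformative.
X = [[1, -1, -1], [1, 1, 1], [-1, -1, 1], [-1, 1, -1]]
y = [1, 1, 0, 0]
weights, intercept = fit_l1_logistic(X, y)
core_motifs = [j for j, wj in enumerate(weights) if wj != 0.0]
```

The L1 penalty is what drives uninformative coefficients to exactly zero, so the surviving indices in `core_motifs` play the role of the retained core set.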
Affiliation(s)
- Kaitlin U Laverty
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
| | - Arttu Jolma
- Department of Molecular Genetics, University of Toronto, Toronto, Canada.,Donnelly Centre, University of Toronto, Toronto, Canada
| | - Sara E Pour
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
| | - Hong Zheng
- Donnelly Centre, University of Toronto, Toronto, Canada
| | - Debashish Ray
- Donnelly Centre, University of Toronto, Toronto, Canada
| | - Quaid Morris
- Department of Molecular Genetics, University of Toronto, Toronto, Canada.,Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, USA
| | - Timothy R Hughes
- Department of Molecular Genetics, University of Toronto, Toronto, Canada.,Donnelly Centre, University of Toronto, Toronto, Canada
|
16
|
Ma H, Wen H, Xue Z, Li G, Zhang Z. RNANetMotif: Identifying sequence-structure RNA network motifs in RNA-protein binding sites. PLoS Comput Biol 2022; 18:e1010293. [PMID: 35819951 PMCID: PMC9275694 DOI: 10.1371/journal.pcbi.1010293] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/16/2021] [Accepted: 06/09/2022] [Indexed: 11/19/2022] Open
Abstract
RNA molecules can adopt stable secondary and tertiary structures, which are essential in mediating physical interactions with other partners such as RNA binding proteins (RBPs) and in carrying out their cellular functions. In vitro and in vivo experiments such as RNAcompete and eCLIP have revealed in vitro binding preferences of RBPs to RNA oligomers and in vivo binding sites in cells, respectively. Analysis of these binding data showed that the structural properties of the RNAs in these binding sites are important determinants of the binding events; however, it has been a challenge to incorporate the structure information into an interpretable model. Here we describe a new approach, RNANetMotif, which takes the predicted secondary structures of thousands of RNA sequences bound by an RBP as input and uses a graph theory approach to recognize enriched subgraphs. These enriched subgraphs are in essence shared sequence-structure elements that are important in RBP-RNA binding. To validate our approach, we performed RNA structure modeling via coarse-grained molecular dynamics folding simulations for 4 selected RBPs, and RNA-protein docking for LIN28B. The simulation results, e.g., solvent accessibility and energetics, further support the biological relevance of the discovered network subgraphs.
RNA binding proteins (RBPs) regulate every aspect of RNA biology, including splicing, translation, transportation, and degradation. High-throughput technologies such as eCLIP have identified thousands of binding sites for a given RBP throughout the genome. Earlier studies have shown that, in addition to nucleotide sequences, the structure and conformation of RNAs also play an important role in RBP-RNA interactions. Analogous to protein-protein interactions or protein-DNA interactions, it is likely that there exist intrinsic sequence-structure motifs common to these RNAs that underlie their binding specificity to specific RBPs.
It is known that RNAs form energetically favorable secondary structures, which can be represented as graphs, with nucleotides as nodes and with backbone covalent bonds and base-pairing hydrogen bonds as edges. We hypothesize that these graphs can be mined by graph theory approaches to identify sequence-structure motifs as enriched subgraphs. In this article, we describe the details of this approach, termed RNANetMotif, and the associated new concepts, namely the EKS (Extended K-mer Subgraph) and the GraphK graph algorithm. To test the utility of our approach, we conducted 3D structure modeling of selected RNA sequences through molecular dynamics (MD) folding simulation and evaluated the significance of the discovered RNA motifs by comparing their spatial exposure with other regions on the RNA. We believe that the novelty of this approach lies in treating the RNA sequence as a graph and RBP binding sites as enriched subgraphs, which has broader applications beyond RBP-RNA interactions.
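The graph representation described above, with nucleotides as nodes and backbone bonds and base pairs as edges, can be sketched from a dot-bracket secondary structure. This is an illustrative sketch only: the function name, edge labels, and example hairpin are assumptions, and it does not implement the EKS or GraphK steps of RNANetMotif.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]


def rna_graph(seq: str, dot_bracket: str) -> Tuple[Dict[int, str], Set[Edge]]:
    """Build a labelled graph from a sequence and its dot-bracket structure.

    Nodes map position -> nucleotide; edges are (i, j, kind) with i < j and
    kind either "backbone" or "pair". Pseudoknot brackets are not supported.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    nodes: Dict[int, str] = dict(enumerate(seq))
    edges: Set[Edge] = set()
    # Backbone covalent bonds join consecutive nucleotides.
    for i in range(len(seq) - 1):
        edges.add((i, i + 1, "backbone"))
    # Base pairs are recovered by matching brackets with a stack.
    stack: List[int] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unexpected ')'")
            edges.add((stack.pop(), i, "pair"))
        elif ch != ".":
            raise ValueError(f"unsupported structure symbol: {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unclosed '('")
    return nodes, edges


# Hypothetical hairpin: a 3-bp stem closing a 3-nt loop.
nodes, edges = rna_graph("GGGAAACCC", "(((...)))")
pairs = sorted((i, j) for i, j, kind in edges if kind == "pair")
```

With both edge types in one structure, subgraph mining can treat a stem and the adjacent loop as a single connected neighborhood rather than as separate sequence and structure tracks.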
Affiliation(s)
- Hongli Ma
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- School of Mathematics, Shandong University, Jinan, China
| | - Han Wen
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
| | - Zhiyuan Xue
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
| | - Guojun Li
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- School of Mathematics, Shandong University, Jinan, China
- School of Mathematical Science, Liaocheng University, Liaocheng, China
| | - Zhaolei Zhang
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
|
17
|
Du X, Zhao X, Zhang Y. DeepBtoD: Improved RNA-binding proteins prediction via integrated deep learning. J Bioinform Comput Biol 2022; 20:2250006. [PMID: 35451938 DOI: 10.1142/s0219720022500068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
RNA-binding proteins (RBPs) have crucial roles in various cellular processes such as alternative splicing and gene regulation. Therefore, the analysis and identification of RBPs is an essential issue. However, although many computational methods have been developed for predicting RBPs, few studies simultaneously consider local and global information from the perspective of the RNA sequence. To address this challenge, we present a novel method called DeepBtoD, which predicts RBPs directly from RNA sequences. First, a $k$-BtoD encoding is designed, which takes into account the composition of $k$-nucleotides and their relative positions and forms a local module. Second, we designed a multi-scale convolutional module embedded with a self-attentive mechanism, the ms-focusCNN, which is used to learn more effective, diverse, and discriminative high-level features. Finally, global information is considered to supplement local modules with ensemble learning to predict whether the target RNA binds to RBPs. Preliminary results on 24 independent test datasets show that our proposed method can classify RBPs with an area under the curve of 0.933. Remarkably, DeepBtoD shows competitive results against seven state-of-the-art methods, suggesting that RBPs can be recognized well by integrating local $k$-BtoD and global information from RNA sequences alone. Hence, our integrative method may be useful to improve the power of RBP prediction, which might be particularly useful for modeling protein-nucleic acid interactions in systems biology studies. Our DeepBtoD server can be accessed at http://175.27.228.227/DeepBtoD/.
Affiliation(s)
- XiuQuan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, Anhui, P. R. China.,School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, P. R. China
| | - XiuJuan Zhao
- School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, P. R. China
| | - YanPing Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, Anhui, P. R. China
|
18
|
Wei J, Chen S, Zong L, Gao X, Li Y. Protein-RNA interaction prediction with deep learning: structure matters. Brief Bioinform 2022; 23:bbab540. [PMID: 34929730 PMCID: PMC8790951 DOI: 10.1093/bib/bbab540] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 09/29/2021] [Revised: 11/14/2021] [Accepted: 11/22/2021] [Indexed: 12/11/2022] Open
Abstract
Protein-RNA interactions are of vital importance to a variety of cellular activities. Both experimental and computational techniques have been developed to study the interactions. Because of the limitation of the previous database, especially the lack of protein structure data, most of the existing computational methods rely heavily on the sequence data, with only a small portion of the methods utilizing the structural information. Recently, AlphaFold has revolutionized the entire protein and biology field. Foreseeably, the protein-RNA interaction prediction will also be promoted significantly in the upcoming years. In this work, we give a thorough review of this field, surveying both the binding site and binding preference prediction problems and covering the commonly used datasets, features and models. We also point out the potential challenges and opportunities in this field. This survey summarizes the development of the RNA-binding protein-RNA interaction field in the past and foresees its future development in the post-AlphaFold era.
Affiliation(s)
- Junkang Wei
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
| | - Siyuan Chen
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
| | - Licheng Zong
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
| | - Yu Li
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- The CUHK Shenzhen Research Institute, Hi-Tech Park, 518057, Shenzhen, China
|
19
|
Wang Y, Yang Y, Ma Z, Wong KC, Li X. EDCNN: identification of genome-wide RNA-binding proteins using evolutionary deep convolutional neural network. Bioinformatics 2022; 38:678-686. [PMID: 34694393 DOI: 10.1093/bioinformatics/btab739] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/12/2021] [Revised: 10/14/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: RNA-binding proteins (RBPs) are a group of proteins associated with RNA regulation and metabolism, and play an essential role in mediating the maturation, transport, localization and translation of RNA. Recently, genome-wide RNA-binding event detection methods have been developed to predict RBPs. Unfortunately, the existing computational methods usually suffer from limitations such as high dimensionality, data sparsity and low model performance. RESULTS: Deep convolutional neural networks are well suited to high-dimensional and sparse data. To further improve their performance, we propose the evolutionary deep convolutional neural network (EDCNN), which identifies protein-RNA interactions by synergizing evolutionary optimization with gradient descent to enhance the deep convolutional neural network. In particular, EDCNN combines evolutionary algorithms and different gradient descent models in a complementary algorithm, where the gradient descent and evolution steps alternately optimize the RNA-binding event search. To validate the performance of EDCNN, an experiment is conducted on two large-scale CLIP-seq datasets, and the results reveal that EDCNN provides superior performance to other state-of-the-art methods. Furthermore, time complexity analysis, parameter analysis and motif analysis are conducted to demonstrate the effectiveness of our proposed algorithm from several perspectives. AVAILABILITY AND IMPLEMENTATION: The EDCNN algorithm is available at GitHub: https://github.com/yaweiwang1232/EDCNN. Both the software and the supporting data can be downloaded from: https://figshare.com/articles/software/EDCNN/16803217. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yawei Wang
- School of Artificial Intelligence, Jilin University, Changchun, Jilin, China
| | - Yuning Yang
- School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, China
| | - Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, Changchun, Jilin, China
|
20
|
3D Modeling of Non-coding RNA Interactions. Adv Exp Med Biol 2022; 1385:281-317. [DOI: 10.1007/978-3-031-08356-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
21
|
Marques-Pereira C, Pires M, Moreira IS. Discovery of Virus-Host interactions using bioinformatic tools. Methods Cell Biol 2022; 169:169-198. [PMID: 35623701 DOI: 10.1016/bs.mcb.2022.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
22
|
|
23
|
Li X, Zhang S, Wong KC. Multiobjective Genome-Wide RNA-Binding Event Identification From CLIP-Seq Data. IEEE Trans Cybern 2021; 51:5811-5824. [PMID: 31940583 DOI: 10.1109/tcyb.2019.2960515] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
RNA-binding proteins (RBPs) are the master regulators of mRNA processing and vital players in the post-transcriptional control of gene expression. In recent years, crosslinking immunoprecipitation sequencing (CLIP-seq) technologies have enabled us to sequence massive amounts of genome-wide RNA-binding event data. Their increasing availability provides opportunities to identify protein-RNA interactions on a genome-wide scale. Genome-wide RNA-binding event detection methods have been developed to advance the understanding of the proteins' functions within cellular processes. Unfortunately, those methods often suffer from practical restrictions, such as high costs, intensive computation, high dimensionality, numerical instability, and data sparsity. We present a computational method, the multiobjective forest algorithm (MFA), to identify protein-RNA interactions from CLIP-seq data by synergizing multiobjective biogeography-based optimization (BBO) with random forest (RF). Since most of the tree-structured classifiers in RF are unnecessarily bulky, with extra time costs and memory consumption, multiobjective BBO is designed to prune the unsuitable tree-structured classifiers dynamically. Moreover, to direct the evolution dynamics of the MFA, two objective functions are formulated to balance model generality and complexity for robust performance. To validate our MFA method, we compare its performance across 31 large-scale CLIP-seq datasets. The experimental results demonstrate that MFA can obtain superior performance over the current state-of-the-art methods. Mechanistic insights are also revealed and discussed to explore the multifaceted aspects of MFA through data source importance analysis, matrix rank estimations, seeding component perturbations, and multiobjective optimization methodology comparisons.
|
24
|
Zhao S, Hamada M. Multi-resBind: a residual network-based multi-label classifier for in vivo RNA binding prediction and preference visualization. BMC Bioinformatics 2021; 22:554. [PMID: 34781902 PMCID: PMC8594109 DOI: 10.1186/s12859-021-04430-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/24/2021] [Accepted: 10/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet cross-linking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. Some existing deep-learning models have demonstrated high prediction accuracy for individual RBPs. However, it remains difficult to avoid significant bias due to the experimental protocol. The DeepRiPe method was recently developed to solve this problem by introducing multi-task or multi-label learning into this field. However, this method has not reached an ideal level of prediction power due to its weak neural network architecture. RESULTS: Compared to the DeepRiPe approach, our Multi-resBind method demonstrated substantial improvements on the same large-scale PAR-CLIP dataset, with increases in the area under the receiver operating characteristic curve and in average precision. We conducted extensive experiments to evaluate the impact of various types of input data on the final prediction accuracy. The same approach was used to evaluate the effect of loss functions. Finally, a modified integrated gradient was employed to generate attribution maps. The patterns disentangled from relative contributions according to context offer biological insights into the underlying mechanism of protein-RNA interactions. CONCLUSIONS: Here, we propose Multi-resBind as a new multi-label deep-learning approach to infer protein-RNA binding preferences and predict novel interactions. The results clearly demonstrate that Multi-resBind is a promising tool to predict unknown binding sites in vivo and gain biological insights into why the neural network makes a given prediction.
Affiliation(s)
- Shitao Zhao
- Waseda Research Institute for Science and Engineering, Waseda University, 3-4-1 Okubo Shinjuku-ku, Tokyo, 169-8555, Japan.
| | - Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 3-4-1 Okubo Shinjuku-ku, Tokyo, 169-8555, Japan.,Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, 3-4-1 Okubo Shinjuku-ku, Tokyo, 169-8555, Japan.,Graduate School of Medicine, Nippon Medical School, 1-1-5 Sendagi, Bunkyo-ku, Tokyo, 113-8602, Japan.
|
25
|
Griesemer D, Xue JR, Reilly SK, Ulirsch JC, Kukreja K, Davis JR, Kanai M, Yang DK, Butts JC, Guney MH, Luban J, Montgomery SB, Finucane HK, Novina CD, Tewhey R, Sabeti PC. Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution. Cell 2021; 184:5247-5260.e19. [PMID: 34534445 PMCID: PMC8487971 DOI: 10.1016/j.cell.2021.08.025] [Citation(s) in RCA: 103] [Impact Index Per Article: 25.8] [Received: 12/12/2020] [Revised: 05/25/2021] [Accepted: 08/19/2021] [Indexed: 12/11/2022]
Abstract
3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that simple sequences predominately explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.
Affiliation(s)
- Dustin Griesemer
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Department of Anesthesiology, Perioperative, and Pain Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
| | - James R Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department Of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA.
| | - Steven K Reilly
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department Of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA
| | - Jacob C Ulirsch
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Kalki Kukreja
- Department of Molecular and Cell Biology, Harvard University, Cambridge, MA 02138, USA
| | - Joe R Davis
- BigHat Biosciences, San Carlos, CA 94070, USA
| | - Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - David K Yang
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA
| | - John C Butts
- The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA
| | - Mehmet H Guney
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
| | - Jeremy Luban
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA; Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
| | - Stephen B Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Hilary K Finucane
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Carl D Novina
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
| | - Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA; Tufts University School of Medicine, Boston, MA 02111, USA
| | - Pardis C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department Of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
26
Tripto E, Orenstein Y. A comparative analysis of RNA-binding proteins binding models learned from RNAcompete, RNA Bind-n-Seq and eCLIP data. Brief Bioinform 2021; 22:6278600. [PMID: 34017982 DOI: 10.1093/bib/bbab149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2020] [Revised: 03/16/2021] [Accepted: 03/27/2021] [Indexed: 11/14/2022] Open
Abstract
Understanding post-transcriptional gene regulation is a key challenge in today's biology. The new technologies of RNAcompete and RNA Bind-n-Seq enable the measurement of the binding intensities of one RNA-binding protein (RBP) to numerous synthetic RNA sequences in a single experiment. Recently, Van Nostrand et al. reported the results of RNA Bind-n-Seq experiments measuring binding of 78 human RBPs. Because 31 of these RBPs were also covered by RNAcompete technology, a large-scale comparison between implementations of these two in vitro technologies is now possible. Here, we assessed the similarities and differences between binding models, represented as a list of $k$-mer scores, inferred from RNAcompete and RNA Bind-n-Seq, and also measured how well these models predict in vivo binding. Our results show that RNA Bind-n-Seq- and RNAcompete-derived models agree (Pearson correlation $> 0.5$) for most RBPs (23 out of 31). RNA Bind-n-Seq-derived $k$-mer scores predict RNAcompete binding measurements quite well (average Pearson correlation 0.26), and both technologies produce $k$-mer scores that achieve comparable results in predicting in vivo binding (average AUC 0.7). When inspecting RNA structural preferences inferred from the data of RNA Bind-n-Seq and RNAcompete, we observed high concordance in binding preferences. Through our study, we developed a new $k$-mer score for RNA Bind-n-Seq and extended it to include RNA structural preferences.
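The comparison described in this abstract, two binding models each represented as a list of $k$-mer scores and judged to agree when their Pearson correlation exceeds 0.5, can be illustrated with a minimal sketch. The function names and the toy score values below are hypothetical and are not taken from the paper; only the 0.5 threshold comes from the abstract.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def compare_kmer_models(scores_a, scores_b):
    """Correlate two k-mer score tables over the k-mers they share."""
    shared = sorted(set(scores_a) & set(scores_b))
    return pearson([scores_a[k] for k in shared],
                   [scores_b[k] for k in shared])


# Hypothetical 5-mer scores standing in for two assay-derived models.
model_a = {"UGCAU": 2.1, "GCAUG": 1.8, "AAAAA": 0.2, "CCCCC": 0.1}
model_b = {"UGCAU": 1.5, "GCAUG": 1.9, "AAAAA": 0.3, "CCCCC": 0.4,
           "GGGGG": 0.0}

r = compare_kmer_models(model_a, model_b)
print(f"Pearson r = {r:.3f}; models agree (r > 0.5): {r > 0.5}")
```

Restricting the comparison to shared $k$-mers is a design choice of this sketch; how the study handled $k$-mers missing from one model is not stated in the abstract.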
Affiliation(s)
- Eitamar Tripto
- Department of Biomedical Engineering at Ben-Gurion University of the Negev, Ben-Gurion, 8410501 Beer-Sheva, Israel
- Yaron Orenstein
- School of Electrical and Computer Engineering at Ben-Gurion University of the Negev, Ben-Gurion, 8410501 Beer-Sheva, Israel
27
Sun L, Xu K, Huang W, Yang YT, Li P, Tang L, Xiong T, Zhang QC. Predicting dynamic cellular protein-RNA interactions by deep learning using in vivo RNA structures. Cell Res 2021; 31:495-516. [PMID: 33623109 PMCID: PMC7900654 DOI: 10.1038/s41422-021-00476-y] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/23/2020] [Accepted: 01/19/2021] [Indexed: 01/31/2023] Open
Abstract
Interactions with RNA-binding proteins (RBPs) are integral to RNA function and cellular regulation, and dynamically reflect specific cellular conditions. However, presently available tools for predicting RBP-RNA interactions employ RNA sequence and/or predicted RNA structures, and therefore do not capture their condition-dependent nature. Here, after profiling transcriptome-wide in vivo RNA secondary structures in seven cell types, we developed PrismNet, a deep learning tool that integrates experimental in vivo RNA structure data and RBP binding data for matched cells to accurately predict dynamic RBP binding in various cellular conditions. PrismNet results for 168 RBPs support its utility for both understanding CLIP-seq results and largely extending such interaction data to accurately analyze additional cell types. Further, PrismNet employs an "attention" strategy to computationally identify exact RBP-binding nucleotides, and we discovered enrichment among dynamic RBP-binding sites for structure-changing variants (riboSNitches), which can link genetic diseases with dysregulated RBP bindings. Our rich profiling data and deep learning-based prediction tool provide access to a previously inaccessible layer of cell-type-specific RBP-RNA interactions, with clear utility for understanding and treating human diseases.
Affiliation(s)
- Lei Sun
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Kui Xu
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Wenze Huang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Yucheng T Yang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Pan Li
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Lei Tang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Tuanlin Xiong
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Qiangfeng Cliff Zhang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
28
Koo PK, Majdandzic A, Ploenzke M, Anand P, Paul SB. Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks. PLoS Comput Biol 2021; 17:e1008925. [PMID: 33983921 PMCID: PMC8118286 DOI: 10.1371/journal.pcbi.1008925] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 08/18/2020] [Accepted: 03/30/2021] [Indexed: 12/15/2022] Open
Abstract
Deep neural networks have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. To gain insights into why a DNN makes a given prediction, model interpretability methods, such as attribution methods, can be employed to identify motif-like representations along a given sequence. Because explanations are given on an individual sequence basis and can vary substantially across sequences, deducing generalizable trends across the dataset and quantifying their effect size remains a challenge. Here we introduce global importance analysis (GIA), a model interpretability method that quantifies the population-level effect size that putative patterns have on model predictions. GIA provides an avenue to quantitatively test hypotheses of putative patterns and their interactions with other patterns, as well as map out specific functions the network has learned. As a case study, we demonstrate the utility of GIA on the computational task of predicting RNA-protein interactions from sequence. We first introduce a convolutional network, we call ResidualBind, and benchmark its performance against previous methods on RNAcompete data. Using GIA, we then demonstrate that in addition to sequence motifs, ResidualBind learns a model that considers the number of motifs, their spacing, and sequence context, such as RNA secondary structure and GC-bias.
Affiliation(s)
- Peter K. Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Antonio Majdandzic
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Matthew Ploenzke
- Department of Biostatistics, Harvard University, Cambridge, Massachusetts, United States of America
- Praveen Anand
- Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Steffan B. Paul
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, Massachusetts, United States of America
29
Yang S, Liu X, Ng RT. ProbeRating: a recommender system to infer binding profiles for nucleic acid-binding proteins. Bioinformatics 2021; 36:4797-4804. [PMID: 32573679 PMCID: PMC7750938 DOI: 10.1093/bioinformatics/btaa580] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/12/2019] [Revised: 05/18/2020] [Accepted: 06/18/2020] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION: The interaction between proteins and nucleic acids plays a crucial role in gene regulation and cell function. Determining the binding preferences of nucleic acid-binding proteins (NBPs), namely RNA-binding proteins (RBPs) and transcription factors (TFs), is the key to decipher the protein-nucleic acids interaction code. Today, available NBP binding data from in vivo or in vitro experiments are still limited, which leaves a large portion of NBPs uncovered. Unfortunately, existing computational methods that model the NBP binding preferences are mostly protein specific: they need the experimental data for a specific protein in interest, and thus only focus on experimentally characterized NBPs. The binding preferences of experimentally unexplored NBPs remain largely unknown. RESULTS: Here, we introduce ProbeRating, a nucleic acid recommender system that utilizes techniques from deep learning and word embeddings of natural language processing. ProbeRating is developed to predict binding profiles for unexplored or poorly studied NBPs by exploiting their homologs NBPs which currently have available binding data. Requiring only sequence information as input, ProbeRating adapts FastText from Facebook AI Research to extract biological features. It then builds a neural network-based recommender system. We evaluate the performance of ProbeRating on two different tasks: one for RBP and one for TF. As a result, ProbeRating outperforms previous methods on both tasks. The results show that ProbeRating can be a useful tool to study the binding mechanism for the many NBPs that lack direct experimental evidence. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at <https://github.com/syang11/ProbeRating>. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shu Yang
- Department of Computer Science, University of British Columbia, Vancouver, BC V6T1Z4, Canada
- Xiaoxi Liu
- RIKEN Center for Integrative Medical Sciences (IMS), Yokohama 230-0045, Japan
- Raymond T Ng
- Department of Computer Science, University of British Columbia, Vancouver, BC V6T1Z4, Canada
30
Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure. BMC Genomics 2020; 21:866. [PMID: 33334313 PMCID: PMC7745412 DOI: 10.1186/s12864-020-07239-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/30/2022] Open
Abstract
Background: RNA binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. The identification of RBP binding sites is a crucial step in understanding the biological mechanism of post-transcriptional gene regulation. However, the determination of RBP binding sites on a large scale is a challenging task due to high cost of biochemical assays. Quite a number of studies have exploited machine learning methods to predict binding sites. Especially, deep learning is increasingly used in the bioinformatics field by virtue of its ability to learn generalized representations from DNA and protein sequences. Results: In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to effectively predict RBP binding sites. Specifically, we used word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., distributed representation of k-mers sequence rather than traditional one-hot encoding. The distributed representations are taken as input of convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that DeepRKE outperforms existing counterpart methods on two large-scale benchmark datasets. Conclusions: Our extensive experimental results show that DeepRKE is an efficacious tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can effectively detect the latent relationship and similarity between k-mers, and thus improve the predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/. Supplementary Information: The online version contains supplementary material available at (doi:10.1186/s12864-020-07239-w).
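The contrast drawn in this abstract between one-hot encoding and a k-mer "word" representation can be sketched as follows. This is an illustration only, not the DeepRKE implementation: the helper names, `k=3`, and `stride=1` are assumptions, and the learned embedding step (for example a word2vec-style model that maps each k-mer index to a dense vector) is deliberately left out.

```python
def one_hot(seq, alphabet="ACGU"):
    """Encode each nucleotide as a 4-dimensional indicator vector."""
    index = {base: i for i, base in enumerate(alphabet)}
    return [[1 if index[base] == j else 0 for j in range(len(alphabet))]
            for base in seq]


def kmer_tokens(seq, k=3, stride=1):
    """Split a sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(token_lists):
    """Assign each distinct k-mer an integer id in order of first use."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Map k-mer words to ids suitable for an embedding lookup."""
    return [vocab[token] for token in tokens]


tokens = kmer_tokens("AUGGC", k=3)
vocab = build_vocab([tokens])
print(one_hot("AU"))          # per-nucleotide indicator vectors
print(tokens)                 # overlapping 3-mers
print(encode(tokens, vocab))  # integer ids for an embedding layer
```

In a distributed representation, each integer id would index a row of a trained embedding matrix, so that k-mers occurring in similar contexts receive similar vectors; one-hot vectors carry no such similarity.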
31
Yang Y, Hou Z, Ma Z, Li X, Wong KC. iCircRBP-DHN: identification of circRNA-RBP interaction sites using deep hierarchical network. Brief Bioinform 2020; 22:5943796. [PMID: 33126261 DOI: 10.1093/bib/bbaa274] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 07/26/2020] [Revised: 09/07/2020] [Accepted: 09/21/2020] [Indexed: 12/19/2022] Open
Abstract
Circular RNAs (circRNAs) are widely expressed in eukaryotes. The genome-wide interactions between circRNAs and RNA-binding proteins (RBPs) can be probed from cross-linking immunoprecipitation with sequencing data. Therefore, computational methods have been developed for identifying RBP binding sites on circRNAs. Unfortunately, those computational methods often suffer from the low discriminative power of feature representations, numerical instability and poor scalability. To address those limitations, we propose a novel computational method called iCircRBP-DHN using deep hierarchical network for discriminating circRNA-RBP binding sites. The network architecture can be regarded as a deep multi-scale residual network followed by bidirectional gated recurrent units (BiGRUs) with the self-attention mechanism, which can simultaneously extract local and global contextual information. Meanwhile, we propose novel encoding schemes by integrating CircRNA2Vec and the K-tuple nucleotide frequency pattern to represent different degrees of nucleotide dependencies. To validate the effectiveness of our proposed iCircRBP-DHN, we compared its performance with other computational methods on 37 circRNAs datasets and 31 linear RNAs datasets, respectively. The experimental results reveal that iCircRBP-DHN can achieve superior performance over those state-of-the-art algorithms. Moreover, we perform motif analysis on circRNAs bound by those different RBPs, demonstrating that our proposed CircRNA2Vec encoding scheme can be promising. The iCircRBP-DHN method is made available at https://github.com/houzl3416/iCircRBP-DHN.
Affiliation(s)
- Yuning Yang
- School of Information Science and Technology, Northeast Normal University
- Zilong Hou
- School of Artificial Intelligence, Jilin University
- Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University
- Xiangtao Li
- School of Artificial Intelligence, Jilin University
- Ka-Chun Wong
- School of Artificial Intelligence, Jilin University
32
Grønning AGB, Doktor TK, Larsen SJ, Petersen USS, Holm LL, Bruun GH, Hansen MB, Hartung AM, Baumbach J, Andresen BS. DeepCLIP: predicting the effect of mutations on protein-RNA binding with deep learning. Nucleic Acids Res 2020; 48:7099-7118. [PMID: 32558887 PMCID: PMC7367176 DOI: 10.1093/nar/gkaa530] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 10/23/2019] [Revised: 05/11/2020] [Accepted: 06/10/2020] [Indexed: 02/07/2023] Open
Abstract
Nucleotide variants can cause functional changes by altering protein-RNA binding in various ways that are not easy to predict. This can affect processes such as splicing, nuclear shuttling, and stability of the transcript. Therefore, correct modeling of protein-RNA binding is critical when predicting the effects of sequence variations. Many RNA-binding proteins recognize a diverse set of motifs and binding is typically also dependent on the genomic context, making this task particularly challenging. Here, we present DeepCLIP, the first method for context-aware modeling and predicting protein binding to RNA nucleic acids using exclusively sequence data as input. We show that DeepCLIP outperforms existing methods for modeling RNA-protein binding. Importantly, we demonstrate that DeepCLIP predictions correlate with the functional outcomes of nucleotide variants in independent wet lab experiments. Furthermore, we show how DeepCLIP binding profiles can be used in the design of therapeutically relevant antisense oligonucleotides, and to uncover possible position-dependent regulation in a tissue-specific manner. DeepCLIP is freely available as a stand-alone application and as a webtool at http://deepclip.compbio.sdu.dk.
Affiliation(s)
- Alexander Gulliver Bjørnholt Grønning
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark; Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark; Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense M, Denmark
- Thomas Koed Doktor
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark; Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Simon Jonas Larsen
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense M, Denmark
- Ulrika Simone Spangsberg Petersen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark; Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Lise Lolle Holm
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark; Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Gitte Hoffmann Bruun
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark; Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Michael Birkerod Hansen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark; Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Anne-Mette Hartung
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark; Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Jan Baumbach
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense M, Denmark; Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
- Brage Storstein Andresen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark; Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
33
Abstract
Deep neural networks have been revolutionizing the field of machine learning for the past several years. They have been applied with great success in many domains of the biomedical data sciences and are outperforming extant methods by a large margin. The ability of deep neural networks to pick up local image features and model the interactions between them makes them highly applicable to regulatory genomics. Instead of an image, the networks analyze DNA and RNA sequences and additional epigenomic data. In this review, we survey the successes of deep learning in the field of regulatory genomics. We first describe the fundamental building blocks of deep neural networks, popular architectures used in regulatory genomics, and their training process on molecular sequence data. We then review several key methods in different gene regulation domains. We start with the pioneering method DeepBind and its successors, which were developed to predict protein–DNA binding. We then review methods developed to predict and model epigenetic information, such as histone marks and nucleosome occupancy. Following epigenomics, we review methods to predict protein–RNA binding with its unique challenge of incorporating RNA structure information. Finally, we provide our overall view of the strengths and weaknesses of deep neural networks and prospects for future developments.
Affiliation(s)
- Mira Barshai
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
- Eitamar Tripto
- Department of Biomedical Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
- Yaron Orenstein
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
34
RNA-centric approaches to study RNA-protein interactions in vitro and in silico. Methods 2020; 178:11-18. [DOI: 10.1016/j.ymeth.2019.09.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/19/2019] [Revised: 09/10/2019] [Accepted: 09/10/2019] [Indexed: 01/17/2023] Open
35
Sagar A, Xue B. Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019; 26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]
Abstract
The interactions between RNAs and proteins play critical roles in many biological processes. Therefore, characterizing these interactions becomes critical for mechanistic, biomedical, and clinical studies. Many experimental methods can be used to determine RNA-protein interactions in multiple aspects. However, due to the facts that RNA-protein interactions are tissue-specific and condition-specific, as well as these interactions are weak and frequently compete with each other, those experimental techniques cannot be made full use of to discover the complete spectrum of RNA-protein interactions. To moderate these issues, continuous efforts have been devoted to developing high quality computational techniques to study the interactions between RNAs and proteins. Many important progresses have been achieved with the application of novel techniques and strategies, such as machine learning techniques. Especially, with the development and application of CLIP techniques, more and more experimental data on RNA-protein interaction under specific biological conditions are available. These CLIP data altogether provide a rich source for developing advanced machine learning predictors. In this review, recent progresses on computational predictors for RNA-protein interaction were summarized in the following aspects: dataset, prediction strategies, and input features. Possible future developments were also discussed at the end of the review.
Affiliation(s)
- Amit Sagar
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
- Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
36
Su Y, Luo Y, Zhao X, Liu Y, Peng J. Integrating thermodynamic and sequence contexts improves protein-RNA binding prediction. PLoS Comput Biol 2019; 15:e1007283. [PMID: 31483777 PMCID: PMC6752863 DOI: 10.1371/journal.pcbi.1007283] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/28/2019] [Revised: 09/19/2019] [Accepted: 07/24/2019] [Indexed: 11/23/2022] Open
Abstract
Predicting RNA-binding protein (RBP) specificity is important for understanding gene expression regulation and RNA-mediated enzymatic processes. It is widely believed that RBP binding specificity is determined by both the sequence and structural contexts of RNAs. Existing approaches, including traditional machine learning algorithms and more recently, deep learning models, have been extensively applied to integrate RNA sequence and its predicted or experimental RNA structural probabilities for improving the accuracy of RBP binding prediction. Such models were trained mostly on the large-scale in vitro datasets, such as the RNAcompete dataset. However, in RNAcompete, most synthetic RNAs are unstructured, which makes machine learning methods not effectively extract RBP-binding structural preferences. Furthermore, RNA structure may be variable or multi-modal according to both theoretical and experimental evidence. In this work, we propose ThermoNet, a thermodynamic prediction model by integrating a new sequence-embedding convolutional neural network model over a thermodynamic ensemble of RNA secondary structures. First, the sequence-embedding convolutional neural network generalizes the existing k-mer based methods by jointly learning convolutional filters and k-mer embeddings to represent RNA sequence contexts. Second, the thermodynamic average of deep-learning predictions is able to explore structural variability and improves the prediction, especially for the structured RNAs. Extensive experiments demonstrate that our method significantly outperforms existing approaches, including RCK, DeepBind and several other recent state-of-the-art methods for predictions on both in vitro and in vivo data. The implementation of ThermoNet is available at https://github.com/suyufeng/ThermoNet.
RNA-binding proteins (RBPs) play a key role in modulating various cellular processes, including transcription, alternative splicing, and translational regulation. Identifying protein-RNA interactions and the binding preferences of RBPs are critical to unraveling the mechanism of post-transcriptional gene regulation. In the current study, we present a computational approach that integrates both structure and sequence contexts for protein-RNA binding prediction. We propose to incorporate the structure information using a thermodynamic ensemble of secondary structures, which effectively identifies RBP-binding structural preferences, especially for structured RNAs. Our model is further empowered by a deep neural network that combines the sequence and structure information to achieve improved protein-RNA binding prediction. Extensive experiments on both in vitro and in vivo datasets demonstrate the superior performance of our method compared to several state-of-the-art approaches. This study suggests the great potential of our method as a practical tool for identifying novel protein-RNA interactions and binding sites of RBPs.
Affiliation(s)
- Yufeng Su
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Yunan Luo
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Xiaoming Zhao
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Yang Liu
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
37
Polishchuk M, Paz I, Yakhini Z, Mandel-Gutfreund Y. SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data. Nucleic Acids Res 2019; 46:W221-W228. [PMID: 29800452 PMCID: PMC6030986 DOI: 10.1093/nar/gky453] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/01/2018] [Accepted: 05/13/2018] [Indexed: 01/24/2023] Open
Abstract
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
Affiliation(s)
- Maya Polishchuk
- Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel; Vavilov Institute of General Genetics, Russian Academy of Science, 11933 Moscow, Russia
| | - Inbal Paz
- Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
| | - Zohar Yakhini
- School of Computer Science, Herzliya Interdisciplinary Center, Herzliya 46150, Israel; Department of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel
| | - Yael Mandel-Gutfreund
- Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel; Department of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel
| |
|
38
|
Kinney JB, McCandlish DM. Massively Parallel Assays and Quantitative Sequence-Function Relationships. Annu Rev Genomics Hum Genet 2019; 20:99-127. [PMID: 31091417 DOI: 10.1146/annurev-genom-083118-014845] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Indexed: 11/09/2022]
Abstract
Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can make use of. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.
Affiliation(s)
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| |
|
39
|
Pan X, Yang Y, Xia C, Mirza AH, Shen H. Recent methodology progress of deep learning for RNA–protein interaction prediction. WILEY INTERDISCIPLINARY REVIEWS-RNA 2019; 10:e1544. [DOI: 10.1002/wrna.1544] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 02/28/2019] [Revised: 04/07/2019] [Accepted: 04/11/2019] [Indexed: 12/17/2022]
Affiliation(s)
- Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium
- BASF Agriculture Solution, Ghent, Belgium
| | - Yang Yang
- Department of Computer Science, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
| | - Chun‐Qiu Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
| | - Aashiq H. Mirza
- Department of Pharmacology, Weill Cornell Medicine, New York, New York
| | - Hong‐Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Department of Computer Science, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
| |
|
40
|
Bioinformatics Approaches to Gain Insights into cis-Regulatory Motifs Involved in mRNA Localization. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2019; 1203:165-194. [PMID: 31811635 DOI: 10.1007/978-3-030-31434-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
Abstract
Messenger RNA (mRNA) is a fundamental intermediate in the expression of proteins. As an integral part of this important process, protein production can be localized by the targeting of mRNA to a specific subcellular compartment. The subcellular destination of mRNA is suggested to be governed by a region of its primary sequence or secondary structure, which consequently dictates the recruitment of trans-acting factors, such as RNA-binding proteins or regulatory RNAs, to form a messenger ribonucleoprotein particle. This molecular ensemble is requisite for precise and spatiotemporal control of gene expression. In the context of RNA localization, the description of the binding preferences of an RNA-binding protein defines a motif, and one, or more, instance of a given motif is defined as a localization element (zip code). In this chapter, we first discuss the cis-regulatory motifs previously identified as mRNA localization elements. We then describe motif representation in terms of entropy and information content and offer an overview of motif databases and search algorithms. Finally, we provide an outline of the motif topology of asymmetrically localized mRNA molecules.
|
41
|
Ben-Bassat I, Chor B, Orenstein Y. A deep neural network approach for learning intrinsic protein-RNA binding preferences. Bioinformatics 2018; 34:i638-i646. [DOI: 10.1093/bioinformatics/bty600] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Indexed: 11/13/2022] Open
Affiliation(s)
- Ilan Ben-Bassat
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
| | - Benny Chor
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
| | - Yaron Orenstein
- Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| |
|
42
|
Kulandaisamy A, Srivastava A, Kumar P, Nagarajan R, Priya SB, Gromiha MM. Identification and Analysis of Key Residues in Protein-RNA Complexes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1436-1444. [PMID: 29993582 DOI: 10.1109/tcbb.2018.2834387] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
Protein-RNA complexes play important roles in various biological processes. The functions of protein-RNA complexes are dictated by their interactions, binding, stability, and affinity. In this work, we have identified the key residues (KRs), which are involved in both stability and binding. We found that 42 percent of considered proteins share common binding and stabilizing residues, whereas these residues are distinct in 58 percent of the proteins. Overall, 5 percent of stabilizing and 3 percent of binding residues serve as key residues. These residues are enriched with the combination of polar, charged, aliphatic, and aromatic residues. Analysis on subclasses of protein-RNA complexes based on protein structural class, function and RNA type showed that regulatory proteins, and complexes with single stranded RNA and rRNA have appreciable number of key residues. Specifically, Arg, Tyr, and Thr are preferred in most of the subclasses of protein-RNA complexes. In addition, residues with similar chemical behavior have different preferences to be KRs, such that Arg, Tyr, Val, and Thr are preferred over Lys, Trp, Ile, and Ser, respectively. Atomic level contacts revealed that charged and polar-nonpolar contacts are dominant in enzymes, polar in structural, and nonpolar in regulatory proteins. On the other hand, polar-nonpolar contacts are enriched in all these classes of protein-RNA complexes. Further, the influence of sequence and structural features such as conservation score, surrounding hydrophobicity, solvent accessibility, secondary structure, and long-range order in key residues are also discussed. We envisage that the present study provides insights to understand the structural and functional aspects of protein-RNA complexes.
|
43
|
Zagrovic B, Bartonek L, Polyansky AA. RNA-protein interactions in an unstructured context. FEBS Lett 2018; 592:2901-2916. [PMID: 29851074 PMCID: PMC6175095 DOI: 10.1002/1873-3468.13116] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 04/23/2018] [Revised: 05/12/2018] [Accepted: 05/13/2018] [Indexed: 02/02/2023]
Abstract
Despite their importance, our understanding of noncovalent RNA-protein interactions is incomplete. This especially concerns the binding between RNA and unstructured protein regions, a widespread class of such interactions. Here, we review the recent experimental and computational work on RNA-protein interactions in an unstructured context with a particular focus on how such interactions may be shaped by the intrinsic interaction affinities between individual nucleobases and protein side chains. Specifically, we articulate the claim that the universal genetic code reflects the binding specificity between nucleobases and protein side chains and that, in turn, the code may be seen as the Rosetta stone for understanding RNA-protein interactions in general.
Affiliation(s)
- Bojan Zagrovic
- Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Austria
| | - Lukas Bartonek
- Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Austria
| | - Anton A. Polyansky
- Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Austria; MM Shemyakin and Yu A Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
| |
|
44
|
Sasse A, Laverty KU, Hughes TR, Morris QD. Motif models for RNA-binding proteins. Curr Opin Struct Biol 2018; 53:115-123. [PMID: 30172081 DOI: 10.1016/j.sbi.2018.08.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/05/2018] [Accepted: 08/07/2018] [Indexed: 01/24/2023]
Abstract
Identifying the binding preferences of RNA-binding proteins (RBPs) is important in understanding their contribution to post-transcriptional regulation. Here, we review the current state of the art of RNA motif identification tools for RBPs. New in vivo and in vitro data sets provide sufficient statistical power to enable detection of relatively long and complex sequence and sequence-structure binding preferences, and recent computational methods are geared towards quantitative identification of these patterns. We classify methods by their motif model's representational power and describe the underlying considerations for RNA-protein interactions. All classical motif identification algorithms apply physically motivated architectures, consisting of a motif and an occupancy model; we call these explicit motif models. Recent methods, such as convolutional neural networks and support vector machines, abandon the classical architecture and implicitly model RNA binding without defining a motif model. Although they achieve high accuracy on held-out data, they may be unsuitable for the ultimate goal of the field: using motifs trained on in vitro data to predict in vivo binding sites. For this task, methods need to separate intrinsic binding preferences from cellular effects arising from protein and RNA concentrations, cooperativity, and competition. To tackle this problem, we advocate for the use of a 'three-layer' architecture, consisting of motif model, occupancy model, and extrinsic factor model, which enables separation and adjustment to cellular conditions.
Affiliation(s)
- Alexander Sasse
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
| | - Kaitlin U Laverty
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
| | - Timothy R Hughes
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Canadian Institute for Advanced Research, MaRS Centre, West Tower, 661 University Avenue, Suite 505, Toronto, ON M5G 1M1, Canada
| | - Quaid D Morris
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5T 3A1, Canada
| |
|
45
|
Pan X, Shen HB. Learning distributed representations of RNA sequences and its application for predicting RNA-protein binding sites with a convolutional neural network. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.04.036] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/24/2022]
|
46
|
Yang S, Wang J, Ng RT. Inferring RNA sequence preferences for poorly studied RNA-binding proteins based on co-evolution. BMC Bioinformatics 2018. [PMID: 29529991 PMCID: PMC5848454 DOI: 10.1186/s12859-018-2091-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/12/2022] Open
Abstract
Background: Characterizing the binding preference of RNA-binding proteins (RBP) is essential for us to understand the interaction between an RBP and its RNA targets, and to decipher the mechanism of post-transcriptional regulation. Experimental methods have been used to generate protein-RNA binding data for a number of RBPs in vivo and in vitro. Utilizing the binding data, a couple of computational methods have been developed to detect the RNA sequence or structure preferences of the RBPs. However, the majority of RBPs have not yet been experimentally characterized and lack RNA binding data. For these poorly studied RBPs, the identification of their binding preferences cannot be performed by most existing computational methods because the experimental binding data are prerequisite to these methods. Results: Here we propose a new method based on co-evolution to predict the sequence preferences for the poorly studied RBPs, waiving the requirement of their binding data. First, we demonstrate the co-evolutionary relationship between RBPs and their RNA partners. We then present a K-nearest neighbors (KNN) based algorithm to infer the sequence preference of an RBP using only the preference information from its homologous RBPs. By benchmarking against several in vitro and in vivo datasets, our proposed method outperforms the existing alternative which uses the closest neighbor's preference on all the datasets. Moreover, it shows comparable performance with two state-of-the-art methods that require the presence of the experimental binding data. Finally, we demonstrate the usage of this method to infer sequence preferences for novel proteins which have no binding preference information available. Conclusion: For a poorly studied RBP, the current methods used to determine its binding preference need experimental data, which is expensive and time consuming. Therefore, determining RBP's preference is not practical in many situations. This study provides an economic solution to infer the sequence preference of such protein based on the co-evolution. The source codes and related datasets are available at https://github.com/syang11/KNN.
Affiliation(s)
- Shu Yang
- Department of Computer Science, University of British Columbia, Vancouver, Canada.
| | - Junwen Wang
- Department of Health Sciences Research, Mayo Clinic Arizona, Scottsdale, USA
| | - Raymond T Ng
- Department of Computer Science, University of British Columbia, Vancouver, Canada
| |
|
47
|
Orenstein Y, Ohler U, Berger B. Finding RNA structure in the unstructured RBPome. BMC Genomics 2018; 19:154. [PMID: 29463232 PMCID: PMC5819699 DOI: 10.1186/s12864-018-4540-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/29/2017] [Accepted: 02/13/2018] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND: RNA-binding proteins (RBPs) play vital roles in many processes in the cell. Different RBPs bind RNA with different sequence and structure specificities. While sequence specificities for a large set of 205 RBPs have been reported through the RNAcompete compendium, structure specificities are known for only a small fraction. The main limitation lies in the design of the RNAcompete technology, which tests RBP binding against unstructured RNA probes, making it difficult to infer structural preferences from these data. We recently developed RCK, an algorithm to infer sequence and structural binding models from RNAcompete data. The set of binding models enables, for the first time, a large-scale assessment of RNA structure in the RBPome. RESULTS: We re-validate and uncover the role of RNA structure in the RBPome through novel analysis of the largest-scale dataset to date. First, we show that RNA structure exists in presumably unstructured RNA probes and that its variability is correlated with RNA-binding. Second, we examine the structural binding preferences of RBPs and discover an overall preference to bind RNA loops. Third, we significantly improve protein-binding prediction using RNA structure, both in vitro and in vivo. Lastly, we demonstrate that RNA structural binding preferences can be inferred for new proteins from solely their amino acid content. CONCLUSIONS: By counter-intuitively demonstrating through our analysis that we can predict both the RNA structure of and RBP binding to these putatively unstructured RNAs, we transform a compendium of RNA-binding proteins into a valuable resource for structure-based binding models. We uncover the important role RNA structure plays in protein-RNA interaction for hundreds of RNA-binding proteins.
Affiliation(s)
- Yaron Orenstein
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
| | - Uwe Ohler
- Max Delbruck Center for Molecular Medicine, Berlin-Buch, Berlin, Germany
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Mathematics Department, MIT, Cambridge, USA
| |
|
48
|
Hamada M. In silico approaches to RNA aptamer design. Biochimie 2017; 145:8-14. [PMID: 29032056 DOI: 10.1016/j.biochi.2017.10.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 08/11/2017] [Accepted: 10/09/2017] [Indexed: 10/18/2022]
Abstract
RNA aptamers are ribonucleic acids that bind to specific target molecules. An RNA aptamer for a disease-related protein has great potential for development into a new drug. However, huge time and cost investments are required to develop an RNA aptamer into a pharmaceutical. Recently, SELEX combined with high-throughput sequencers (i.e., HT-SELEX) has been widely used to select candidate RNA aptamers that bind to a target protein with high affinity and specificity. After candidate selection, further optimizations such as shortening and modifying candidate sequences are performed. In these steps, in silico approaches are expected to reduce the time and cost associated with aptamer drug development. In this article, we review existing in silico approaches to RNA aptamer development, including a method for ranking the candidates of RNA aptamers from HT-SELEX data, clustering a huge number of aptamer sequences, and finding motifs amidst a set of significant RNA aptamers. It is expected that further studies in addition to these methods will be utilized for in silico RNA aptamer design, permitting a minimal number of experiments to be performed through the utilization of sophisticated computational methods.
Affiliation(s)
- Michiaki Hamada
- Bioinformatics Laboratory, Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 63-520, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan; Institute for Medical-oriented Structural Biology, Waseda University, 2-2, Wakamatsu-cho Shinjuku-ku, Tokyo 162-8480, Japan; Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26, Aomi, Koto-ku, Tokyo 135-0064, Japan; Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan.
| |
|
49
|
Cook KB, Vembu S, Ha KCH, Zheng H, Laverty KU, Hughes TR, Ray D, Morris QD. RNAcompete-S: Combined RNA sequence/structure preferences for RNA binding proteins derived from a single-step in vitro selection. Methods 2017; 126:18-28. [PMID: 28651966 DOI: 10.1016/j.ymeth.2017.06.024] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 03/16/2017] [Revised: 06/16/2017] [Accepted: 06/21/2017] [Indexed: 12/15/2022] Open
Abstract
RNA-binding proteins recognize RNA sequences and structures, but there is currently no systematic and accurate method to derive large (>12 base) motifs de novo that reflect a combination of intrinsic preference to both sequence and structure. To address this absence, we introduce RNAcompete-S, which couples a single-step competitive binding reaction with an excess of random RNA 40-mers to a custom computational pipeline for interrogation of the bound RNA sequences and derivation of SSMs (Sequence and Structure Models). RNAcompete-S confirms that HuR, QKI, and SRSF1 prefer binding sites that are single stranded, and recapitulates known 8-10 bp sequence and structure preferences for Vts1p and RBMY. We also derive an 18-base long SSM for Drosophila SLBP, which to our knowledge has not been previously determined by selections from pure random sequence, and accurately discriminates human replication-dependent histone mRNAs. Thus, RNAcompete-S enables accurate identification of large, intrinsic sequence-structure specificities with a uniform assay.
Affiliation(s)
- Kate B Cook
- Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada
| | - Shankar Vembu
- Donnelly Centre, University of Toronto, Toronto M5S 3E1, Canada
| | - Kevin C H Ha
- Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada
| | - Hong Zheng
- Donnelly Centre, University of Toronto, Toronto M5S 3E1, Canada
| | - Kaitlin U Laverty
- Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada
| | - Timothy R Hughes
- Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada; Donnelly Centre, University of Toronto, Toronto M5S 3E1, Canada.
| | - Debashish Ray
- Donnelly Centre, University of Toronto, Toronto M5S 3E1, Canada.
| | - Quaid D Morris
- Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada; Donnelly Centre, University of Toronto, Toronto M5S 3E1, Canada; Department of Computer Science, University of Toronto, Toronto M5S 2E4, Canada; Department of Electrical and Computer Engineering, University of Toronto, Toronto M5S 3G4, Canada.
| |
|
50
|
Pei S, Slinger BL, Meyer MM. Recognizing RNA structural motifs in HT-SELEX data for ribosomal protein S15. BMC Bioinformatics 2017; 18:298. [PMID: 28587636 PMCID: PMC5461778 DOI: 10.1186/s12859-017-1704-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/04/2016] [Accepted: 05/22/2017] [Indexed: 01/30/2023] Open
Abstract
BACKGROUND: Proteins recognize many different aspects of RNA ranging from single stranded regions to discrete secondary or tertiary structures. High-throughput sequencing (HTS) of in vitro selected populations offers a large scale method to study RNA-protein interactions. However, most existing analysis methods require that the binding motifs are enriched in the population relative to earlier rounds, and that motifs are found in a loop or single stranded region of the potential RNA secondary structure. Such methods do not generalize to all RNA-protein interactions, as some RNA binding proteins specifically recognize more complex structures such as double stranded RNA. RESULTS: In this study, we use HT-SELEX derived populations to study the landscape of RNAs that interact with Geobacillus kaustophilus ribosomal protein S15. Our data show high sequence and structure diversity and proved intractable to existing methods. Conventional programs identified some sequence motifs, but these are found in less than 5-10% of the total sequence pool. Therefore, we developed a novel framework to analyze HT-SELEX data. Our process accounts for both sequence and structure components by abstracting the overall secondary structure into smaller substructures composed of a single base-pair stack, which allows us to leverage existing approaches already used in k-mer analysis to identify enriched motifs. By focusing on secondary structure motifs composed of specific two base-pair stacks, we identified significantly enriched or depleted structure motifs relative to earlier rounds. CONCLUSIONS: Discrete substructures are likely to be important to RNA-protein interactions, but they are difficult to elucidate. Substructures can help make highly diverse sequence data more tractable. The structure motifs provide limited accuracy in predicting enrichment, suggesting that G. kaustophilus S15 can either recognize many different secondary structure motifs or some aspects of the interaction are not captured by the analysis. This highlights the importance of considering secondary and tertiary structure elements and their role in RNA-protein interactions.
Affiliation(s)
- Shermin Pei
- Boston College, 140 Commonwealth Ave., 02467, Chestnut Hill, USA
| | - Betty L Slinger
- Boston College, 140 Commonwealth Ave., 02467, Chestnut Hill, USA
| | - Michelle M Meyer
- Boston College, 140 Commonwealth Ave., 02467, Chestnut Hill, USA.
| |
|