Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Orenstein Y, Wang Y, Berger B. RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data. Bioinformatics 2016;32:i351-i359. [PMID: 27307637 PMCID: PMC4908343 DOI: 10.1093/bioinformatics/btw259] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open

For:	Orenstein Y, Wang Y, Berger B. RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data. Bioinformatics 2016;32:i351-i359. [PMID: 27307637 PMCID: PMC4908343 DOI: 10.1093/bioinformatics/btw259] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open

Number

Cited by Other Article(s)

Qiao Y, Yang R, Liu Y, Chen J, Zhao L, Huo P, Wang Z, Bu D, Wu Y, Zhao Y. DeepFusion: A deep bimodal information fusion network for unraveling protein-RNA interactions using in vivo RNA structures. Comput Struct Biotechnol J 2024;23:617-625. [PMID: 38274994 PMCID: PMC10808905 DOI: 10.1016/j.csbj.2023.12.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2023] [Revised: 12/04/2023] [Accepted: 12/26/2023] [Indexed: 01/27/2024] Open

Sasse A, Ray D, Laverty KU, Tam CL, Albu M, Zheng H, Lyudovyk O, Dalal T, Nie K, Magis C, Notredame C, Weirauch MT, Hughes TR, Morris Q. Reconstructing the sequence specificities of RNA-binding proteins across eukaryotes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.15.618476. [PMID: 39464061 PMCID: PMC11507768 DOI: 10.1101/2024.10.15.618476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/29/2024]

Affiliation(s)

Alexander Sasse Department of Molecular Genetics, University of Toronto, Toronto, ON Canada Donnelly Centre, University of Toronto, Toronto, ON Canada Department of Computer Science, University of Washington, Seattle, WA, USA Vector Institute, Toronto, ON Canada
Debashish Ray Donnelly Centre, University of Toronto, Toronto, ON Canada
Kaitlin U Laverty Department of Molecular Genetics, University of Toronto, Toronto, ON Canada Donnelly Centre, University of Toronto, Toronto, ON Canada Vector Institute, Toronto, ON Canada Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Cyrus L Tam Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
Mihai Albu Donnelly Centre, University of Toronto, Toronto, ON Canada
Hong Zheng Donnelly Centre, University of Toronto, Toronto, ON Canada
Olga Lyudovyk Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
Taykhoom Dalal Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
Kate Nie Department of Molecular Genetics, University of Toronto, Toronto, ON Canada Donnelly Centre, University of Toronto, Toronto, ON Canada Vector Institute, Toronto, ON Canada
Cedrik Magis Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain Universitat Pompeu Fabra, Barcelona, Spain
Cedric Notredame Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain Universitat Pompeu Fabra, Barcelona, Spain
Matthew T Weirauch Center for Autoimmune Genomics and Etiology, Divisions of Allergy & Immunology, Human Genetics, Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital, Cincinnati, OH, USA Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
Timothy R Hughes Department of Molecular Genetics, University of Toronto, Toronto, ON Canada Donnelly Centre, University of Toronto, Toronto, ON Canada
Quaid Morris Department of Molecular Genetics, University of Toronto, Toronto, ON Canada Donnelly Centre, University of Toronto, Toronto, ON Canada Vector Institute, Toronto, ON Canada Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA Ontario Institute for Cancer Research, Toronto, ON, Canada

Collapse

Yuan L, Zhao L, Lai J, Jiang Y, Zhang Q, Shen Z, Zheng CH, Huang DS. iCRBP-LKHA: Large convolutional kernel and hybrid channel-spatial attention for identifying circRNA-RBP interaction sites. PLoS Comput Biol 2024;20:e1012399. [PMID: 39173070 PMCID: PMC11373821 DOI: 10.1371/journal.pcbi.1012399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2024] [Revised: 09/04/2024] [Accepted: 08/08/2024] [Indexed: 08/24/2024] Open

Affiliation(s)

Lin Yuan Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
Ling Zhao Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
Jinling Lai Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
Yufeng Jiang Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
Qinhu Zhang Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
Zhen Shen School of Computer and Software, Nanyang Institute of Technology, Nanyang, China
Chun-Hou Zheng Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
De-Shuang Huang Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China

Collapse

Lasantha D, Vidanagamachchi S, Nallaperuma S. CRIECNN: Ensemble convolutional neural network and advanced feature extraction methods for the precise forecasting of circRNA-RBP binding sites. Comput Biol Med 2024;174:108466. [PMID: 38615462 DOI: 10.1016/j.compbiomed.2024.108466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Revised: 03/29/2024] [Accepted: 04/08/2024] [Indexed: 04/16/2024]

Cao C, Wang C, Yang S, Zou Q. CircSI-SSL: circRNA-binding site identification based on self-supervised learning. Bioinformatics 2024;40:btae004. [PMID: 38180876 PMCID: PMC10789309 DOI: 10.1093/bioinformatics/btae004] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2023] [Revised: 11/13/2023] [Accepted: 01/03/2024] [Indexed: 01/07/2024] Open

Abstract

MOTIVATION

In recent years, circular RNAs (circRNAs), the particular form of RNA with a closed-loop structure, have attracted widespread attention due to their physiological significance (they can directly bind proteins), leading to the development of numerous protein site identification algorithms. Unfortunately, these studies are supervised and require the vast majority of labeled samples in training to produce superior performance. But the acquisition of sample labels requires a large number of biological experiments and is difficult to obtain.

RESULTS

To resolve this matter that a great deal of tags need to be trained in the circRNA-binding site prediction task, a self-supervised learning binding site identification algorithm named CircSI-SSL is proposed in this article. According to the survey, this is unprecedented in the research field. Specifically, CircSI-SSL initially combines multiple feature coding schemes and employs RNA_Transformer for cross-view sequence prediction (self-supervised task) to learn mutual information from the multi-view data, and then fine-tuning with only a few sample labels. Comprehensive experiments on six widely used circRNA datasets indicate that our CircSI-SSL algorithm achieves excellent performance in comparison to previous algorithms, even in the extreme case where the ratio of training data to test data is 1:9. In addition, the transplantation experiment of six linRNA datasets without network modification and hyperparameter adjustment shows that CircSI-SSL has good scalability. In summary, the prediction algorithm based on self-supervised learning proposed in this article is expected to replace previous supervised algorithms and has more extensive application value.

AVAILABILITY AND IMPLEMENTATION

The source code and data are available at https://github.com/cc646201081/CircSI-SSL.

Collapse

Akbari Rokn Abadi S, Tabatabaei S, Koohi S. KDeep: a new memory-efficient data extraction method for accurately predicting DNA/RNA transcription factor binding sites. J Transl Med 2023;21:727. [PMID: 37845681 PMCID: PMC10580661 DOI: 10.1186/s12967-023-04593-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2023] [Accepted: 10/04/2023] [Indexed: 10/18/2023] Open

Pan Z, Zhou S, Liu T, Liu C, Zang M, Wang Q. WVDL: Weighted Voting Deep Learning Model for Predicting RNA-Protein Binding Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:3322-3328. [PMID: 37028092 DOI: 10.1109/tcbb.2023.3252276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]

Cao C, Yang S, Li M, Li C. CircSSNN: circRNA-binding site prediction via sequence self-attention neural networks with pre-normalization. BMC Bioinformatics 2023;24:220. [PMID: 37254080 DOI: 10.1186/s12859-023-05352-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2023] [Accepted: 05/25/2023] [Indexed: 06/01/2023] Open

Abstract

BACKGROUND

Circular RNAs (circRNAs) play a significant role in some diseases by acting as transcription templates. Therefore, analyzing the interaction mechanism between circRNA and RNA-binding proteins (RBPs) has far-reaching implications for the prevention and treatment of diseases. Existing models for circRNA-RBP identification usually adopt convolution neural network (CNN), recurrent neural network (RNN), or their variants as feature extractors. Most of them have drawbacks such as poor parallelism, insufficient stability, and inability to capture long-term dependencies.

METHODS

In this paper, we propose a new method completely using the self-attention mechanism to capture deep semantic features of RNA sequences. On this basis, we construct a CircSSNN model for the cirRNA-RBP identification. The proposed model constructs a feature scheme by fusing circRNA sequence representations with statistical distributions, static local contexts, and dynamic global contexts. With a stable and efficient network architecture, the distance between any two positions in a sequence is reduced to a constant, so CircSSNN can quickly capture the long-term dependencies and extract the deep semantic features.

RESULTS

Experiments on 37 circRNA datasets show that the proposed model has overall advantages in stability, parallelism, and prediction performance. Keeping the network structure and hyperparameters unchanged, we directly apply the CircSSNN to linRNA datasets. The favorable results show that CircSSNN can be transformed simply and efficiently without task-oriented tuning.

CONCLUSIONS

In conclusion, CircSSNN can serve as an appealing circRNA-RBP identification tool with good identification performance, excellent scalability, and wide application scope without the need for task-oriented fine-tuning of parameters, which is expected to reduce the professional threshold required for hyperparameter tuning in bioinformatics analysis.

Collapse

Xu Y, Zhu J, Huang W, Xu K, Yang R, Zhang QC, Sun L. PrismNet: predicting protein-RNA interaction using in vivo RNA structural information. Nucleic Acids Res 2023:7151359. [PMID: 37140045 DOI: 10.1093/nar/gkad353] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Revised: 04/13/2023] [Accepted: 04/26/2023] [Indexed: 05/05/2023] Open

Affiliation(s)

Yiran Xu MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Jianghui Zhu MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Wenze Huang MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Kui Xu MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Rui Yang MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Qiangfeng Cliff Zhang MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Lei Sun Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Shandong University, Qingdao 266237, China

Collapse

Wang X, Zhang M, Long C, Yao L, Zhu M. Self-Attention Based Neural Network for Predicting RNA-Protein Binding Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:1469-1479. [PMID: 36067103 DOI: 10.1109/tcbb.2022.3204661] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

Koo PK, Ploenzke M, Anand P, Paul S, Majdandzic A. ResidualBind: Uncovering Sequence-Structure Preferences of RNA-Binding Proteins with Deep Neural Networks. Methods Mol Biol 2023;2586:197-215. [PMID: 36705906 DOI: 10.1007/978-1-0716-2768-6_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]

Huang W, Zhang QC. Prediction of Dynamic RBP-RNA Interactions Using PrismNet. Methods Mol Biol 2023;2568:123-132. [PMID: 36227565 DOI: 10.1007/978-1-0716-2687-0_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]

Bheemireddy S, Sandhya S, Srinivasan N, Sowdhamini R. Computational tools to study RNA-protein complexes. Front Mol Biosci 2022;9:954926. [PMID: 36275618 PMCID: PMC9585174 DOI: 10.3389/fmolb.2022.954926] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Accepted: 09/20/2022] [Indexed: 11/19/2022] Open

Pepe G, Appierdo R, Carrino C, Ballesio F, Helmer-Citterich M, Gherardini PF. Artificial intelligence methods enhance the discovery of RNA interactions. Front Mol Biosci 2022;9:1000205. [PMID: 36275611 PMCID: PMC9585310 DOI: 10.3389/fmolb.2022.1000205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022] Open

Laverty KU, Jolma A, Pour SE, Zheng H, Ray D, Morris Q, Hughes TR. PRIESSTESS: interpretable, high-performing models of the sequence and structure preferences of RNA-binding proteins. Nucleic Acids Res 2022;50:e111. [PMID: 36018788 DOI: 10.1093/nar/gkac694] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2021] [Revised: 07/22/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022] Open

Ma H, Wen H, Xue Z, Li G, Zhang Z. RNANetMotif: Identifying sequence-structure RNA network motifs in RNA-protein binding sites. PLoS Comput Biol 2022;18:e1010293. [PMID: 35819951 PMCID: PMC9275694 DOI: 10.1371/journal.pcbi.1010293] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2021] [Accepted: 06/09/2022] [Indexed: 11/19/2022] Open

Abstract

RNA molecules can adopt stable secondary and tertiary structures, which are essential in mediating physical interactions with other partners such as RNA binding proteins (RBPs) and in carrying out their cellular functions. In vivo and in vitro experiments such as RNAcompete and eCLIP have revealed in vitro binding preferences of RBPs to RNA oligomers and in vivo binding sites in cells. Analysis of these binding data showed that the structure properties of the RNAs in these binding sites are important determinants of the binding events; however, it has been a challenge to incorporate the structure information into an interpretable model. Here we describe a new approach, RNANetMotif, which takes predicted secondary structure of thousands of RNA sequences bound by an RBP as input and uses a graph theory approach to recognize enriched subgraphs. These enriched subgraphs are in essence shared sequence-structure elements that are important in RBP-RNA binding. To validate our approach, we performed RNA structure modeling via coarse-grained molecular dynamics folding simulations for selected 4 RBPs, and RNA-protein docking for LIN28B. The simulation results, e.g., solvent accessibility and energetics, further support the biological relevance of the discovered network subgraphs.

RNA binding proteins (RBPs) regulate every aspect of RNA biology, including splicing, translation, transportation, and degradation. High-throughput technologies such as eCLIP have identified thousands of binding sites for a given RBP throughout the genome. It has been shown by earlier studies that, in addition to nucleotide sequences, the structure and conformation of RNAs also play important role in RBP-RNA interactions. Analogous to protein-protein interactions or protein-DNA interactions, it is likely that there exist intrinsic sequence-structure motifs common to these RNAs that underlie their binding specificity to specific RBPs. It is known that RNAs form energetically favorable secondary structures, which can be represented as graphs, with nucleotides being nodes and backbone covalent bonds and base-pairing hydrogen bonds representing edges. We hypothesize that these graphs can be mined by graph theory approaches to identify sequence-structure motifs as enriched sub-graphs. In this article, we described the details of this approach, termed RNANetMotif and associated new concepts, namely EKS (Extended K-mer Subgraph) and GraphK graph algorithm. To test the utility of our approach, we conducted 3D structure modeling of selected RNA sequences through molecular dynamics (MD) folding simulation and evaluated the significance of the discovered RNA motifs by comparing their spatial exposure with other regions on the RNA. We believe that this approach has the novelty of treating the RNA sequence as a graph and RBP binding sites as enriched subgraph, which has broader applications beyond RBP-RNA interactions.

Collapse

Du X, Zhao X, Zhang Y. DeepBtoD: Improved RNA-binding proteins prediction via integrated deep learning. J Bioinform Comput Biol 2022;20:2250006. [PMID: 35451938 DOI: 10.1142/s0219720022500068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Wei J, Chen S, Zong L, Gao X, Li Y. Protein-RNA interaction prediction with deep learning: structure matters. Brief Bioinform 2022;23:bbab540. [PMID: 34929730 PMCID: PMC8790951 DOI: 10.1093/bib/bbab540] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2021] [Revised: 11/14/2021] [Accepted: 11/22/2021] [Indexed: 12/11/2022] Open

Wang Y, Yang Y, Ma Z, Wong KC, Li X. EDCNN: identification of genome-wide RNA-binding proteins using evolutionary deep convolutional neural network. Bioinformatics 2022;38:678-686. [PMID: 34694393 DOI: 10.1093/bioinformatics/btab739] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2021] [Revised: 10/14/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023] Open

3D Modeling of Non-coding RNA Interactions. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022;1385:281-317. [DOI: 10.1007/978-3-031-08356-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Marques-Pereira C, Pires M, Moreira IS. Discovery of Virus-Host interactions using bioinformatic tools. Methods Cell Biol 2022;169:169-198. [PMID: 35623701 DOI: 10.1016/bs.mcb.2022.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]

Deep multi-scale attention network for RNA-binding proteins prediction. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.09.025] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Li X, Zhang S, Wong KC. Multiobjective Genome-Wide RNA-Binding Event Identification From CLIP-Seq Data. IEEE TRANSACTIONS ON CYBERNETICS 2021;51:5811-5824. [PMID: 31940583 DOI: 10.1109/tcyb.2019.2960515] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Abstract

RNA-binding proteins (RBPs) are the master regulators of mRNA processing, which are vital players for the post-transcriptional control of gene expression. In recent years, crosslinking immunoprecipitation sequencing (CLIP-seq) technologies have enabled us to sequence massive amounts of genome-wide RNA-binding event data. Its increasing availability provides opportunities to identify protein-RNA interactions on a genome-wide scale. Genome-wide RNA-binding event detection methods have been developed to the understanding of the proteins' functions within cellular processes. Unfortunately, those methods often suffer from realistic restrictions, such as high costs, intensive computation, high dimensionality, numerical instability, and data sparsity. We present a computational method [multiobjective forest algorithm (MFA)] to identify protein-RNA interactions from CLIP-seq data by synergizing multiobjective biogeography-based optimization (BBO) with random forest (RF). Since most of the tree-structured classifiers in RF are unnecessarily bulky with extra time costs and memory consumption, multiobjective BBO is designed to prune the unsuitable tree-structured classifiers dynamically. Moreover, to direct the evolution dynamics of the MFA, two objective functions are formulated to balance model generality and complexity for robust performance. To validate our MFA method, we compare its performance across 31 large-scale CLIP-seq datasets. The experimental results demonstrate that MFA can obtain superior performance over the current state-of-the-art methods. Mechanistic insights are also revealed and discussed to explore the multifaceted aspects of MFA through data source importance analysis, matrix rank estimations, seeding component perturbations, and multiobjective optimization methodology comparisons.

Collapse

Zhao S, Hamada M. Multi-resBind: a residual network-based multi-label classifier for in vivo RNA binding prediction and preference visualization. BMC Bioinformatics 2021;22:554. [PMID: 34781902 PMCID: PMC8594109 DOI: 10.1186/s12859-021-04430-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2021] [Accepted: 10/06/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet cross-linking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. Some existing deep-learning models have demonstrated high prediction accuracy for individual RBPs. However, it remains difficult to avoid significant bias due to the experimental protocol. The DeepRiPe method was recently developed to solve this problem via introducing multi-task or multi-label learning into this field. However, this method has not reached an ideal level of prediction power due to the weak neural network architecture.

RESULTS

Compared to the DeepRiPe approach, our Multi-resBind method demonstrated substantial improvements using the same large-scale PAR-CLIP dataset with respect to an increase in the area under the receiver operating characteristic curve and average precision. We conducted extensive experiments to evaluate the impact of various types of input data on the final prediction accuracy. The same approach was used to evaluate the effect of loss functions. Finally, a modified integrated gradient was employed to generate attribution maps. The patterns disentangled from relative contributions according to context offer biological insights into the underlying mechanism of protein-RNA interactions.

CONCLUSIONS

Here, we propose Multi-resBind as a new multi-label deep-learning approach to infer protein-RNA binding preferences and predict novel interactions. The results clearly demonstrate that Multi-resBind is a promising tool to predict unknown binding sites in vivo and gain biology insights into why the neural network makes a given prediction.

Collapse

Griesemer D, Xue JR, Reilly SK, Ulirsch JC, Kukreja K, Davis JR, Kanai M, Yang DK, Butts JC, Guney MH, Luban J, Montgomery SB, Finucane HK, Novina CD, Tewhey R, Sabeti PC. Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution. Cell 2021;184:5247-5260.e19. [PMID: 34534445 PMCID: PMC8487971 DOI: 10.1016/j.cell.2021.08.025] [Citation(s) in RCA: 103] [Impact Index Per Article: 25.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2020] [Revised: 05/25/2021] [Accepted: 08/19/2021] [Indexed: 12/11/2022]

Affiliation(s)

Dustin Griesemer Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Department of Anesthesiology, Perioperative, and Pain Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
James R Xue Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department Of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA.
Steven K Reilly Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department Of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA
Jacob C Ulirsch Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
Kalki Kukreja Department of Molecular and Cell Biology, Harvard University, Cambridge, MA 02138, USA
Joe R Davis BigHat Biosciences, San Carlos, CA 94070, USA
Masahiro Kanai Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
David K Yang Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA
John C Butts The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA
Mehmet H Guney Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
Jeremy Luban Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA; Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
Stephen B Montgomery Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
Hilary K Finucane Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
Carl D Novina Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
Ryan Tewhey The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA; Tufts University School of Medicine, Boston, MA 02111, USA
Pardis C Sabeti Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department Of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA

Collapse

Tripto E, Orenstein Y. A comparative analysis of RNA-binding proteins binding models learned from RNAcompete, RNA Bind-n-Seq and eCLIP data. Brief Bioinform 2021;22:6278600. [PMID: 34017982 DOI: 10.1093/bib/bbab149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Revised: 03/16/2021] [Accepted: 03/27/2021] [Indexed: 11/14/2022] Open

Sun L, Xu K, Huang W, Yang YT, Li P, Tang L, Xiong T, Zhang QC. Predicting dynamic cellular protein-RNA interactions by deep learning using in vivo RNA structures. Cell Res 2021;31:495-516. [PMID: 33623109 PMCID: PMC7900654 DOI: 10.1038/s41422-021-00476-y] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Accepted: 01/19/2021] [Indexed: 01/31/2023] Open

Affiliation(s)

Lei Sun MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Kui Xu MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Wenze Huang MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Yucheng T Yang Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Pan Li MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Lei Tang MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Tuanlin Xiong MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Qiangfeng Cliff Zhang MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China. Tsinghua-Peking Center for Life Sciences, Beijing 100084, China.

Collapse

Koo PK, Majdandzic A, Ploenzke M, Anand P, Paul SB. Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks. PLoS Comput Biol 2021;17:e1008925. [PMID: 33983921 PMCID: PMC8118286 DOI: 10.1371/journal.pcbi.1008925] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2020] [Accepted: 03/30/2021] [Indexed: 12/15/2022] Open

Yang S, Liu X, Ng RT. ProbeRating: a recommender system to infer binding profiles for nucleic acid-binding proteins. Bioinformatics 2021;36:4797-4804. [PMID: 32573679 PMCID: PMC7750938 DOI: 10.1093/bioinformatics/btaa580] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2019] [Revised: 05/18/2020] [Accepted: 06/18/2020] [Indexed: 12/15/2022] Open

Abstract

MOTIVATION

The interaction between proteins and nucleic acids plays a crucial role in gene regulation and cell function. Determining the binding preferences of nucleic acid-binding proteins (NBPs), namely RNA-binding proteins (RBPs) and transcription factors (TFs), is the key to decipher the protein-nucleic acids interaction code. Today, available NBP binding data from in vivo or in vitro experiments are still limited, which leaves a large portion of NBPs uncovered. Unfortunately, existing computational methods that model the NBP binding preferences are mostly protein specific: they need the experimental data for a specific protein in interest, and thus only focus on experimentally characterized NBPs. The binding preferences of experimentally unexplored NBPs remain largely unknown.

RESULTS

Here, we introduce ProbeRating, a nucleic acid recommender system that utilizes techniques from deep learning and word embeddings of natural language processing. ProbeRating is developed to predict binding profiles for unexplored or poorly studied NBPs by exploiting their homologs NBPs which currently have available binding data. Requiring only sequence information as input, ProbeRating adapts FastText from Facebook AI Research to extract biological features. It then builds a neural network-based recommender system. We evaluate the performance of ProbeRating on two different tasks: one for RBP and one for TF. As a result, ProbeRating outperforms previous methods on both tasks. The results show that ProbeRating can be a useful tool to study the binding mechanism for the many NBPs that lack direct experimental evidence. and implementation.

AVAILABILITY AND IMPLEMENTATION

The source code is freely available at <https://github.com/syang11/ProbeRating>.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure. BMC Genomics 2020;21:866. [PMID: 33334313 PMCID: PMC7745412 DOI: 10.1186/s12864-020-07239-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open

Abstract

Background

RNA binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. The identification of RBP binding sites is a crucial step in understanding the biological mechanism of post-transcriptional gene regulation. However, the determination of RBP binding sites on a large scale is a challenging task due to high cost of biochemical assays. Quite a number of studies have exploited machine learning methods to predict binding sites. Especially, deep learning is increasingly used in the bioinformatics field by virtue of its ability to learn generalized representations from DNA and protein sequences.

Results

In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to effectively predict RBP binding sites. Specifically, we used word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., distributed representation of k-mers sequence rather than traditional one-hot encoding. The distributed representations are taken as input of convolutional neural networks (CNN) and bidirectional long-term short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that deepRKE outperforms existing counterpart methods on two large-scale benchmark datasets.

Conclusions

Our extensive experimental results show that DeepRKE is an efficacious tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can effectively detect the latent relationship and similarity between k-mers, and thus improve the predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/.

Supplementary Information

The online version contains supplementary material available at (doi:10.1186/s12864-020-07239-w).

Collapse

Yang Y, Hou Z, Ma Z, Li X, Wong KC. iCircRBP-DHN: identification of circRNA-RBP interaction sites using deep hierarchical network. Brief Bioinform 2020;22:5943796. [PMID: 33126261 DOI: 10.1093/bib/bbaa274] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2020] [Revised: 09/07/2020] [Accepted: 09/21/2020] [Indexed: 12/19/2022] Open

Grønning AGB, Doktor TK, Larsen SJ, Petersen USS, Holm LL, Bruun GH, Hansen MB, Hartung AM, Baumbach J, Andresen BS. DeepCLIP: predicting the effect of mutations on protein-RNA binding with deep learning. Nucleic Acids Res 2020;48:7099-7118. [PMID: 32558887 PMCID: PMC7367176 DOI: 10.1093/nar/gkaa530] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2019] [Revised: 05/11/2020] [Accepted: 06/10/2020] [Indexed: 02/07/2023] Open

Affiliation(s)

Alexander Gulliver Bjørnholt Grønning Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark.,Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark.,Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense M, Denmark
Thomas Koed Doktor Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark.,Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
Simon Jonas Larsen Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense M, Denmark
Ulrika Simone Spangsberg Petersen Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark.,Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
Lise Lolle Holm Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark.,Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
Gitte Hoffmann Bruun Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark.,Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
Michael Birkerod Hansen Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark.,Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
Anne-Mette Hartung Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark.,Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
Jan Baumbach Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense M, Denmark.,Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
Brage Storstein Andresen Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark.,Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark

Collapse

Barshai M, Tripto E, Orenstein Y. Identifying Regulatory Elements via Deep Learning. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-022020-021940] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

RNA-centric approaches to study RNA-protein interactions in vitro and in silico. Methods 2020;178:11-18. [DOI: 10.1016/j.ymeth.2019.09.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2019] [Revised: 09/10/2019] [Accepted: 09/10/2019] [Indexed: 01/17/2023] Open

Sagar A, Xue B. Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019;26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]

Su Y, Luo Y, Zhao X, Liu Y, Peng J. Integrating thermodynamic and sequence contexts improves protein-RNA binding prediction. PLoS Comput Biol 2019;15:e1007283. [PMID: 31483777 PMCID: PMC6752863 DOI: 10.1371/journal.pcbi.1007283] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 09/19/2019] [Accepted: 07/24/2019] [Indexed: 11/23/2022] Open

Abstract

Predicting RNA-binding protein (RBP) specificity is important for understanding gene expression regulation and RNA-mediated enzymatic processes. It is widely believed that RBP binding specificity is determined by both the sequence and structural contexts of RNAs. Existing approaches, including traditional machine learning algorithms and more recently, deep learning models, have been extensively applied to integrate RNA sequence and its predicted or experimental RNA structural probabilities for improving the accuracy of RBP binding prediction. Such models were trained mostly on the large-scale in vitro datasets, such as the RNAcompete dataset. However, in RNAcompete, most synthetic RNAs are unstructured, which makes machine learning methods not effectively extract RBP-binding structural preferences. Furthermore, RNA structure may be variable or multi-modal according to both theoretical and experimental evidence. In this work, we propose ThermoNet, a thermodynamic prediction model by integrating a new sequence-embedding convolutional neural network model over a thermodynamic ensemble of RNA secondary structures. First, the sequence-embedding convolutional neural network generalizes the existing k-mer based methods by jointly learning convolutional filters and k-mer embeddings to represent RNA sequence contexts. Second, the thermodynamic average of deep-learning predictions is able to explore structural variability and improves the prediction, especially for the structured RNAs. Extensive experiments demonstrate that our method significantly outperforms existing approaches, including RCK, DeepBind and several other recent state-of-the-art methods for predictions on both in vitro and in vivo data. The implementation of ThermoNet is available at https://github.com/suyufeng/ThermoNet.

RNA-binding proteins (RBPs) play a key role in modulating various cellular processes, including transcription, alternative splicing, and translational regulation. Identifying protein-RNA interactions and the binding preferences of RBPs are critical to unraveling the mechanism of post-transcriptional gene regulation. In the current study, we present a computational approach that integrates both structure and sequence contexts for protein-RNA binding prediction. We propose to incorporate the structure information using a thermodynamic ensemble of secondary structures, which effectively identifies RBP-binding structural preferences, especially for structured RNAs. Our model is further empowered by a deep neural network that combines the sequence and structure information to achieve improved protein-RNA binding prediction. Extensive experiments on both in vitro and in vivo datasets demonstrate the superior performance of our method compared to several state-of-the-art approaches. This study suggests the great potential of our method as a practical tool for identifying novel protein-RNA interactions and binding sites of RBPs.

Collapse

Polishchuk M, Paz I, Yakhini Z, Mandel-Gutfreund Y. SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data. Nucleic Acids Res 2019;46:W221-W228. [PMID: 29800452 PMCID: PMC6030986 DOI: 10.1093/nar/gky453] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2018] [Accepted: 05/13/2018] [Indexed: 01/24/2023] Open

Kinney JB, McCandlish DM. Massively Parallel Assays and Quantitative Sequence-Function Relationships. Annu Rev Genomics Hum Genet 2019;20:99-127. [PMID: 31091417 DOI: 10.1146/annurev-genom-083118-014845] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Pan X, Yang Y, Xia C, Mirza AH, Shen H. Recent methodology progress of deep learning for RNA–protein interaction prediction. WILEY INTERDISCIPLINARY REVIEWS-RNA 2019;10:e1544. [DOI: 10.1002/wrna.1544] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/28/2019] [Revised: 04/07/2019] [Accepted: 04/11/2019] [Indexed: 12/17/2022]

Bioinformatics Approaches to Gain Insights into cis-Regulatory Motifs Involved in mRNA Localization. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2019;1203:165-194. [PMID: 31811635 DOI: 10.1007/978-3-030-31434-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]

Ben-Bassat I, Chor B, Orenstein Y. A deep neural network approach for learning intrinsic protein-RNA binding preferences. Bioinformatics 2018;34:i638-i646. [DOI: 10.1093/bioinformatics/bty600] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Kulandaisamy A, Srivastava A, Kumar P, Nagarajan R, Priya SB, Gromiha MM. Identification and Analysis of Key Residues in Protein-RNA Complexes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;15:1436-1444. [PMID: 29993582 DOI: 10.1109/tcbb.2018.2834387] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Zagrovic B, Bartonek L, Polyansky AA. RNA-protein interactions in an unstructured context. FEBS Lett 2018;592:2901-2916. [PMID: 29851074 PMCID: PMC6175095 DOI: 10.1002/1873-3468.13116] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2018] [Revised: 05/12/2018] [Accepted: 05/13/2018] [Indexed: 02/02/2023]

Sasse A, Laverty KU, Hughes TR, Morris QD. Motif models for RNA-binding proteins. Curr Opin Struct Biol 2018;53:115-123. [PMID: 30172081 DOI: 10.1016/j.sbi.2018.08.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2018] [Accepted: 08/07/2018] [Indexed: 01/24/2023]

Pan X, Shen HB. Learning distributed representations of RNA sequences and its application for predicting RNA-protein binding sites with a convolutional neural network. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.04.036] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]

Yang S, Wang J, Ng RT. Inferring RNA sequence preferences for poorly studied RNA-binding proteins based on co-evolution. BMC Bioinformatics 2018. [PMID: 29529991 PMCID: PMC5848454 DOI: 10.1186/s12859-018-2091-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open

Abstract

Background

Characterizing the binding preference of RNA-binding proteins (RBP) is essential for us to understand the interaction between an RBP and its RNA targets, and to decipher the mechanism of post-transcriptional regulation. Experimental methods have been used to generate protein-RNA binding data for a number of RBPs in vivo and in vitro. Utilizing the binding data, a couple of computational methods have been developed to detect the RNA sequence or structure preferences of the RBPs. However, the majority of RBPs have not yet been experimentally characterized and lack RNA binding data. For these poorly studied RBPs, the identification of their binding preferences cannot be performed by most existing computational methods because the experimental binding data are prerequisite to these methods.

Results

Here we propose a new method based on co-evolution to predict the sequence preferences for the poorly studied RBPs, waiving the requirement of their binding data. First, we demonstrate the co-evolutionary relationship between RBPs and their RNA partners. We then present a K-nearest neighbors (KNN) based algorithm to infer the sequence preference of an RBP using only the preference information from its homologous RBPs. By benchmarking against several in vitro and in vivo datasets, our proposed method outperforms the existing alternative which uses the closest neighbor’s preference on all the datasets. Moreover, it shows comparable performance with two state-of-the-art methods that require the presence of the experimental binding data. Finally, we demonstrate the usage of this method to infer sequence preferences for novel proteins which have no binding preference information available.

Conclusion

For a poorly studied RBP, the current methods used to determine its binding preference need experimental data, which is expensive and time consuming. Therefore, determining RBP’s preference is not practical in many situations. This study provides an economic solution to infer the sequence preference of such protein based on the co-evolution. The source codes and related datasets are available at https://github.com/syang11/KNN.

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2091-8) contains supplementary material, which is available to authorized users.

Collapse

Orenstein Y, Ohler U, Berger B. Finding RNA structure in the unstructured RBPome. BMC Genomics 2018;19:154. [PMID: 29463232 PMCID: PMC5819699 DOI: 10.1186/s12864-018-4540-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2017] [Accepted: 02/13/2018] [Indexed: 11/22/2022] Open

Hamada M. In silico approaches to RNA aptamer design. Biochimie 2017;145:8-14. [PMID: 29032056 DOI: 10.1016/j.biochi.2017.10.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2017] [Accepted: 10/09/2017] [Indexed: 10/18/2022]

Cook KB, Vembu S, Ha KCH, Zheng H, Laverty KU, Hughes TR, Ray D, Morris QD. RNAcompete-S: Combined RNA sequence/structure preferences for RNA binding proteins derived from a single-step in vitro selection. Methods 2017;126:18-28. [PMID: 28651966 DOI: 10.1016/j.ymeth.2017.06.024] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2017] [Revised: 06/16/2017] [Accepted: 06/21/2017] [Indexed: 12/15/2022] Open

Pei S, Slinger BL, Meyer MM. Recognizing RNA structural motifs in HT-SELEX data for ribosomal protein S15. BMC Bioinformatics 2017;18:298. [PMID: 28587636 PMCID: PMC5461778 DOI: 10.1186/s12859-017-1704-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2016] [Accepted: 05/22/2017] [Indexed: 01/30/2023] Open

Abstract

BACKGROUND

Proteins recognize many different aspects of RNA ranging from single stranded regions to discrete secondary or tertiary structures. High-throughput sequencing (HTS) of in vitro selected populations offers a large scale method to study RNA-proteins interactions. However, most existing analysis methods require that the binding motifs are enriched in the population relative to earlier rounds, and that motifs are found in a loop or single stranded region of the potential RNA secondary structure. Such methods do not generalize to all RNA-protein interaction as some RNA binding proteins specifically recognize more complex structures such as double stranded RNA.

RESULTS

In this study, we use HT-SELEX derived populations to study the landscape of RNAs that interact with Geobacillus kaustophilus ribosomal protein S15. Our data show high sequence and structure diversity and proved intractable to existing methods. Conventional programs identified some sequence motifs, but these are found in less than 5-10% of the total sequence pool. Therefore, we developed a novel framework to analyze HT-SELEX data. Our process accounts for both sequence and structure components by abstracting the overall secondary structure into smaller substructures composed of a single base-pair stack, which allows us to leverage existing approaches already used in k-mer analysis to identify enriched motifs. By focusing on secondary structure motifs composed of specific two base-pair stacks, we identified significantly enriched or depleted structure motifs relative to earlier rounds.

CONCLUSIONS

Discrete substructures are likely to be important to RNA-protein interactions, but they are difficult to elucidate. Substructures can help make highly diverse sequence data more tractable. The structure motifs provide limited accuracy in predicting enrichment suggesting that G. kaustophilus S15 can either recognize many different secondary structure motifs or some aspects of the interaction are not captured by the analysis. This highlights the importance of considering secondary and tertiary structure elements and their role in RNA-protein interactions.

Collapse