1
Wang X, Li J, Zhang C, Guan X, Li X, Jia W, Chen A. Old players and new insights: unraveling the role of RNA-binding proteins in brain tumors. Theranostics 2025; 15:5238-5257. [PMID: 40303323 PMCID: PMC12036871 DOI: 10.7150/thno.113312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2025] [Accepted: 03/27/2025] [Indexed: 05/02/2025] Open
Abstract
The human genome harbors >1,600 evolutionarily conserved RNA-binding proteins (RBPs), and extensive multi-omics investigations have documented their pervasive dysregulation in malignancies ranging from glioblastoma to melanoma. These RBPs are integral to the complex regulatory networks governing hallmark cancer processes. Recent studies have investigated the multifaceted contributions of RBPs to tumorigenesis, tumor metabolism, the tumor-immune microenvironment, and resistance to therapy. This complexity is further compounded by the intricate regulation of RNA function at various levels by RBPs, as well as by the post-translational modifications of RBPs, which improve their functional capacity. Moreover, numerous RBP-based therapeutics have emerged, each underpinned by distinct molecular mechanisms that extend from genomic analysis to interference with RBP function. This review aims to provide a comprehensive overview of recent progress in understanding the roles of RBPs in brain tumors and to explore potential therapeutic interventions targeting these RBPs, complemented by a discussion of innovative techniques emerging in this research field. Advances in deciphering RNA-RBP interactomes and refining targeted therapeutic strategies are revealing the transformative potential of RBP-centric approaches in brain tumor treatment, establishing them as pivotal agents for overcoming current clinical challenges.
Affiliation(s)
- Xu Wang
  - Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Fengtai, Beijing, China
  - Department of Neurosurgery, Qilu Hospital, Cheeloo College of Medicine and Institute of Brain and Brain-Inspired Science, Shandong University, Jinan, 250012, China
  - Jinan Microecological Biomedicine Shandong Laboratory, Jinan, 250117, China and Shandong Key Laboratory of Brain Health and Function Remodeling, Jinan 250012, China
- Jiang Li
  - Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Fengtai, Beijing, China
- Chengkai Zhang
  - Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Fengtai, Beijing, China
- Xiudong Guan
  - Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Fengtai, Beijing, China
- Xingang Li
  - Department of Neurosurgery, Qilu Hospital, Cheeloo College of Medicine and Institute of Brain and Brain-Inspired Science, Shandong University, Jinan, 250012, China
  - Jinan Microecological Biomedicine Shandong Laboratory, Jinan, 250117, China and Shandong Key Laboratory of Brain Health and Function Remodeling, Jinan 250012, China
- Wang Jia
  - Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Fengtai, Beijing, China
- Anjing Chen
  - Department of Neurosurgery, Qilu Hospital, Cheeloo College of Medicine and Institute of Brain and Brain-Inspired Science, Shandong University, Jinan, 250012, China
  - Jinan Microecological Biomedicine Shandong Laboratory, Jinan, 250117, China and Shandong Key Laboratory of Brain Health and Function Remodeling, Jinan 250012, China

2
Bertacchi M, Theiß S, Ahmed A, Eibl M, Loubat A, Maharaux G, Phromkrasae W, Chakrabandhu K, Camgöz A, Antonaci M, Schaaf CP, Studer M, Laugsch M. Unravelling the conundrum of nucleolar NR2F1 localization using antibody-based approaches in vitro and in vivo. Commun Biol 2025; 8:594. [PMID: 40204944 PMCID: PMC11982218 DOI: 10.1038/s42003-025-07985-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2024] [Accepted: 03/21/2025] [Indexed: 04/11/2025] Open
Abstract
As a transcription factor, NR2F1 regulates spatiotemporal gene expression in the nucleus particularly during development. Aberrant NR2F1 causes the rare neurodevelopmental disorder Bosch-Boonstra-Schaaf Optic Atrophy Syndrome. In addition, altered NR2F1 expression is frequently observed in various cancers and is considered a prognostic marker or potential therapeutic target. NR2F1 has been found in both the nucleus and nucleoli, suggesting a non-canonical and direct role in the latter compartment. Hence, we studied this phenomenon employing various in vitro and in vivo models using different antibody-dependent approaches. Examination of seven commonly used anti-NR2F1 antibodies in different human cancer and stem cells as well as in wild type and null mice revealed that NR2F1 nucleolar localization is artificial and has no functional role. Our subsequent comparative analysis demonstrated which anti-NR2F1 antibody best fits which approach. The data allow for correct data interpretation and underline the need to optimize any antibody-mediated technique.
Affiliation(s)
- Michele Bertacchi
  - Université Côte d'Azur, CNRS, Inserm, Institute of Biology Valrose (iBV), 06108, Nice, France.
- Susanne Theiß
  - Institute of Human Genetics, Heidelberg University, Heidelberg, Germany
- Ayat Ahmed
  - Institute of Human Genetics, Heidelberg University, Heidelberg, Germany
- Michael Eibl
  - Institute of Human Genetics, Heidelberg University, Heidelberg, Germany
- Agnès Loubat
  - Université Côte d'Azur, CNRS, Inserm, Institute of Biology Valrose (iBV), 06108, Nice, France
- Gwendoline Maharaux
  - Université Côte d'Azur, CNRS, Inserm, Institute of Biology Valrose (iBV), 06108, Nice, France
- Wanchana Phromkrasae
  - Université Côte d'Azur, CNRS, Inserm, Institute of Biology Valrose (iBV), 06108, Nice, France
- Krittalak Chakrabandhu
  - Université Côte d'Azur, CNRS, Inserm, Institute of Biology Valrose (iBV), 06108, Nice, France
- Aylin Camgöz
  - Hopp Children's Cancer Center (KITZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
  - Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, Germany
- Marco Antonaci
  - Institute of Human Genetics, Heidelberg University, Heidelberg, Germany
- Michèle Studer
  - Université Côte d'Azur, CNRS, Inserm, Institute of Biology Valrose (iBV), 06108, Nice, France
- Magdalena Laugsch
  - Institute of Human Genetics, Heidelberg University, Heidelberg, Germany.

3
Pan X, Fang Y, Liu X, Guo X, Shen HB. RBPsuite 2.0: an updated RNA-protein binding site prediction suite with high coverage on species and proteins based on deep learning. BMC Biol 2025; 23:74. [PMID: 40069726 PMCID: PMC11899677 DOI: 10.1186/s12915-025-02182-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Accepted: 03/03/2025] [Indexed: 03/14/2025] Open
Abstract
BACKGROUND RNA-binding proteins (RBPs) play crucial roles in many biological processes, and computationally identifying RNA-RBP interactions provides insights into the biological mechanism of diseases associated with RBPs. RESULTS To make the RBP-specific deep learning-based RBP binding sites prediction methods easily accessible, we developed an updated easy-to-use webserver, RBPsuite 2.0, with an updated web interface for predicting RBP binding sites from linear and circular RNA sequences. RBPsuite 2.0 has a higher coverage on the number of supported RBPs and species compared to the original RBPsuite, supporting an increased number of RBPs from 154 to 353 and expanding the supported species from one to seven. Additionally, RBPsuite 2.0 replaces the CRIP built into RBPsuite 1.0 with iDeepC, a more accurate RBP binding site predictor for circular RNAs. Furthermore, RBPsuite 2.0 estimates the contribution score of individual nucleotides on the input sequences as potential binding motifs and links to the UCSC browser track for better visualization of the prediction results. CONCLUSIONS RBPsuite 2.0 is an updated, more comprehensive webserver for predicting RBP binding sites in both linear and circular RNA sequences. It supports more RBPs and species and provides more accurate predictions for circular RNAs. The tool is freely available at http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/ .
Affiliation(s)
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
- Yi Fang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Xiaojian Liu
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Xiaoyu Guo
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.

4
Wang Y, Zhu H, Wang Y, Yang Y, Huang Y, Zhang J, Wong KC, Li X. EnrichRBP: an automated and interpretable computational platform for predicting and analysing RNA-binding protein events. Bioinformatics 2024; 41:btaf018. [PMID: 39804669 PMCID: PMC11783304 DOI: 10.1093/bioinformatics/btaf018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Revised: 12/18/2024] [Accepted: 01/10/2025] [Indexed: 02/01/2025] Open
Abstract
MOTIVATION Predicting RNA-binding proteins (RBPs) is central to understanding post-transcriptional regulatory mechanisms. Here, we introduce EnrichRBP, an automated and interpretable computational platform specifically designed for the comprehensive analysis of RBP interactions with RNA. RESULTS EnrichRBP is a web service that enables researchers to develop original deep learning and machine learning architectures to explore the complex dynamics of RBPs. The platform supports 70 deep learning algorithms, covering feature representation, selection, model training, comparison, optimization, and evaluation, all integrated within an automated pipeline. EnrichRBP is adept at providing comprehensive visualizations, enhancing model interpretability, and facilitating the discovery of functionally significant sequence regions crucial for RBP interactions. In addition, EnrichRBP supports base-level functional annotation tasks, offering explanations and graphical visualizations that confirm the reliability of the predicted RNA-binding sites. Leveraging high-performance computing, EnrichRBP provides ultra-fast predictions ranging from seconds to hours, applicable to both pre-trained and custom model scenarios, thus proving its utility in real-world applications. Case studies highlight that EnrichRBP provides robust and interpretable predictions, demonstrating the power of deep learning in the functional analysis of RBP interactions. Finally, EnrichRBP aims to enhance the reproducibility of computational method analyses for RBP sequences, as well as reduce the programming and hardware requirements for biologists, thereby offering meaningful functional insights. AVAILABILITY AND IMPLEMENTATION EnrichRBP is available at https://airbp.aibio-lab.com/. The source code is available at https://github.com/wangyb97/EnrichRBP, and detailed online documentation can be found at https://enrichrbp.readthedocs.io/en/latest/.
Affiliation(s)
- Yubo Wang
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Haoran Zhu
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Yansong Wang
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Yuning Yang
  - Information Science and Technology, Northeast Normal University, Changchun 130024, China
- Yujian Huang
  - College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu 610059, China
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Ka-chun Wong
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR 999077, China
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China

5
Azizian S, Cui J. DeepMiRBP: a hybrid model for predicting microRNA-protein interactions based on transfer learning and cosine similarity. BMC Bioinformatics 2024; 25:381. [PMID: 39695955 DOI: 10.1186/s12859-024-05985-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 11/12/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND Interactions between microRNAs and RNA-binding proteins are crucial for microRNA-mediated gene regulation and sorting. Despite their significance, the molecular mechanisms governing these interactions remain underexplored, apart from sequence motifs identified on microRNAs. To date, only a limited number of microRNA-binding proteins have been confirmed, typically through labor-intensive experimental procedures. Advanced bioinformatics tools are urgently needed to facilitate this research. METHODS We present DeepMiRBP, a novel hybrid deep learning model specifically designed to predict microRNA-binding proteins by modeling molecular interactions. This innovative approach is the first to target the direct interactions between small RNAs and proteins. DeepMiRBP consists of two main components. The first component employs bidirectional long short-term memory (Bi-LSTM) neural networks to capture sequential dependencies and context within RNA sequences, attention mechanisms to enhance the model's focus on the most relevant features, and transfer learning to apply knowledge gained from a large dataset of RNA-protein binding sites to the specific task of predicting microRNA-protein interactions. Cosine similarity is applied to assess RNA similarities. The second component utilizes Convolutional Neural Networks (CNNs) to process the spatial data inherent in protein structures based on Position-Specific Scoring Matrices (PSSM) and contact maps to generate detailed and accurate representations of potential microRNA-binding sites and assess protein similarities. RESULTS DeepMiRBP achieved a prediction accuracy of 87.4% during training and 85.4% during testing, with an F score of 0.860. Additionally, we validated our method using three case studies, focusing on microRNAs such as miR-451, -19b, -23a, -21, -223, and let-7d. DeepMiRBP successfully predicted known miRNA interactions with recently discovered RNA-binding proteins, including AGO, YBX1, and FXR2, identified in various exosomes. CONCLUSIONS Our proposed DeepMiRBP strategy represents the first of its kind designed for microRNA-protein interaction prediction. Its promising performance underscores the model's potential to uncover novel interactions critical for small RNA sorting and packaging, as well as to infer new RNA transporter proteins. The methodologies and insights from DeepMiRBP offer a scalable template for future small RNA research, from mechanistic discovery to modeling disease-related cell-to-cell communication, emphasizing its adaptability and potential for developing novel small RNA-centric therapeutic interventions and personalized medicine.
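The abstract above mentions cosine similarity as the measure used to compare RNAs. As a minimal sketch of that measure only (not the DeepMiRBP implementation), the Python below ranks candidate vectors by cosine similarity to a query vector. The vectors, the names `rna_a` and `rna_b`, and the zero-vector convention are illustrative assumptions; in the real model the vectors would presumably be learned embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embedding vectors, invented for illustration.
query = [0.2, 0.8, 0.1]
candidates = {"rna_a": [0.1, 0.9, 0.0], "rna_b": [0.9, 0.1, 0.3]}

# Rank candidates from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)
```

Cosine similarity ignores vector magnitude and compares direction only, which is one reason it is a common choice for comparing embeddings.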
Affiliation(s)
- Sasan Azizian
  - School of Computing, University of Nebraska-Lincoln, 1400 R St, Lincoln, NE, 68588-0115, USA
- Juan Cui
  - School of Computing, University of Nebraska-Lincoln, 1400 R St, Lincoln, NE, 68588-0115, USA.

6
Krautwurst S, Lamkiewicz K. RNA-protein interaction prediction without high-throughput data: An overview and benchmark of in silico tools. Comput Struct Biotechnol J 2024; 23:4036-4046. [PMID: 39610906 PMCID: PMC11603007 DOI: 10.1016/j.csbj.2024.11.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 11/05/2024] [Accepted: 11/05/2024] [Indexed: 11/30/2024] Open
Abstract
RNA-protein interactions (RPIs) are crucial for the accurate operation of various processes in and between organisms across kingdoms of life. Mutual detection of RPI partner molecules depends on distinct sequential, structural, or thermodynamic features, which can be determined via experimental and bioinformatic methods. Still, the underlying molecular mechanisms of many RPIs are poorly understood. It is further hypothesized that many RPIs are not even described yet. Computational RPI prediction is continuously challenged by the lack of data and detailed research of very specific examples. With the discovery of novel RPI complexes in all kingdoms of life, adaptations of existing RPI prediction methods are necessary. Continuously improving computational RPI prediction is key in advancing the understanding of RPIs in detail and supplementing experimental RPI determination. The growing amount of data covering more species and detailed mechanisms supports the accuracy of prediction tools, which in turn support specific experimental research on RPIs. Here, we give an overview of RPI prediction tools that do not use high-throughput data as the user's input. We review the tools according to their input, usability, and output. We then apply the tools to known RPI examples across different kingdoms of life. Our comparison shows that the investigated prediction tools do not favor a certain species and equip the user with results varying in degree of information, from an overall RPI score to detailed interacting residues. Furthermore, we provide a guide tree to help users decide which RPI prediction tool is appropriate for their available input data and desired output.
Affiliation(s)
- Sarah Krautwurst
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
  - European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Kevin Lamkiewicz
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
  - European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr. 4, 04103 Leipzig, Germany

7
Zhou Y, Cui H, Liu D, Wang W. MSTCRB: Predicting circRNA-RBP interaction by extracting multi-scale features based on transformer and attention mechanism. Int J Biol Macromol 2024; 278:134805. [PMID: 39153682 DOI: 10.1016/j.ijbiomac.2024.134805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/14/2024] [Accepted: 08/14/2024] [Indexed: 08/19/2024]
Abstract
CircRNAs play vital roles in biological systems mainly by binding RNA-binding proteins (RBPs), which is essential for regulating physiological processes in vivo and for identifying causal disease variants. Therefore, predicting interactions between circRNAs and RBPs is a critical step in the discovery of new therapeutic agents. The application of various deep-learning models in bioinformatics has significantly improved prediction and classification performance. However, most existing prediction models are applicable only to a specific type of RNA or to RNA with simple characteristics. In this study, we proposed a deep learning model, MSTCRB, based on a transformer and attention mechanism for extracting multi-scale features to predict circRNA-RBP interactions. K-mer and KNF encodings are employed to capture the global sequence features of circRNA, NCP and DPCP encodings are utilized to extract local sequence features, and the CDPfold method is applied to extract structural features. To improve prediction performance, an optimized transformer framework and attention mechanism were used to integrate these multi-scale features. We compared our model's performance with five other state-of-the-art methods on 37 circRNA datasets and 31 linear RNA datasets. The results show that the average AUC value of MSTCRB reaches 98.45%, which is better than the other methods compared. All of the above datasets are deposited at https://github.com/chy001228/MSTCRB_database.git and the source code is available from https://github.com/chy001228/MSTCRB.git.
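To make two of the encodings named in this abstract concrete, here is a minimal Python sketch of k-mer frequency encoding and nucleotide chemical property (NCP) encoding. It is not taken from the MSTCRB code base. The function names are hypothetical, and the NCP table is the assignment commonly used in the literature (ring structure, functional group, hydrogen bonding), which is assumed here rather than confirmed from the paper.

```python
from collections import Counter
from itertools import product

# Commonly used NCP assignment (assumption): (ring structure, functional group, hydrogen bond).
NCP_TABLE = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper().replace("T", "U")
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    # Windows containing symbols outside the alphabet count toward the total only.
    return [
        counts["".join(kmer)] / total if total else 0.0
        for kmer in product(alphabet, repeat=k)
    ]


def ncp_encode(seq):
    """Map each nucleotide to its 3-bit chemical property vector."""
    seq = seq.upper().replace("T", "U")
    # Unknown symbols (e.g. N) are encoded as all zeros in this sketch.
    return [NCP_TABLE.get(base, (0, 0, 0)) for base in seq]


example = "ACGUACGU"
global_features = kmer_frequencies(example, k=3)  # 4**3 = 64 values
local_features = ncp_encode(example)              # one triple per position
print(len(global_features), local_features[:2])
```

The k-mer vector has a fixed length regardless of sequence length, which is why it suits global features, while NCP keeps one vector per position and so preserves local context.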
Affiliation(s)
- Yun Zhou
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China.
- Haoyu Cui
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Dong Liu
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China.
- Wei Wang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China.

8
He C, Duan L, Zheng H, Wang X, Guan L, Xu J. A Representation Learning Approach for Predicting circRNA Back-Splicing Event via Sequence-Interaction-Aware Dual Encoder. IEEE Trans Nanobioscience 2024; 23:603-611. [PMID: 39226209 DOI: 10.1109/tnb.2024.3454079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
Circular RNAs (circRNAs) play a crucial role in gene regulation and are associated with diseases because of their unique closed continuous loop structure, which is more stable and conserved than that of ordinary linear RNAs. As fundamental work to clarify their functions, a large number of computational approaches for identifying circRNA formation have been proposed. However, these methods fail to fully utilize the important characteristics of back-splicing events, i.e., the positional information of the splice sites and the interaction features of their flanking sequences, for predicting circRNAs. To this end, we propose a novel approach called SIDE for predicting circRNA back-splicing events using only raw RNA sequences. Technically, SIDE employs a dual encoder to capture global and interactive features of the RNA sequence, and then a decoder designed with contrastive learning to fuse discriminative features, improving the prediction of circRNA formation. Empirical results on three real-world datasets show the effectiveness of SIDE. Further analysis also reveals the effectiveness of SIDE.

9
Cao X, Zhang Y, Ding Y, Wan Y. Identification of RNA structures and their roles in RNA functions. Nat Rev Mol Cell Biol 2024; 25:784-801. [PMID: 38926530 DOI: 10.1038/s41580-024-00748-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2024] [Indexed: 06/28/2024]
Abstract
The development of high-throughput RNA structure profiling methods in the past decade has greatly facilitated our ability to map and characterize different aspects of RNA structures transcriptome-wide in cell populations, single cells and single molecules. The resulting high-resolution data have provided insights into the static and dynamic nature of RNA structures, revealing their complexity as they perform their respective functions in the cell. In this Review, we discuss recent technical advances in the determination of RNA structures, and the roles of RNA structures in RNA biogenesis and functions, including in transcription, processing, translation, degradation, localization and RNA structure-dependent condensates. We also discuss the current understanding of how RNA structures could guide drug design for treating genetic diseases and battling pathogenic viruses, and highlight existing challenges and future directions in RNA structure research.
Affiliation(s)
- Xinang Cao
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, Singapore, Singapore
- Yueying Zhang
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, UK
- Yiliang Ding
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, UK.
- Yue Wan
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, Singapore, Singapore.
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.

10
Miyake H, Kawaguchi RK, Kiryu H. RNAelem: an algorithm for discovering sequence-structure motifs in RNA bound by RNA-binding proteins. Bioinformatics Advances 2024; 4:vbae144. [PMID: 39399375 PMCID: PMC11471262 DOI: 10.1093/bioadv/vbae144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 09/08/2024] [Accepted: 09/26/2024] [Indexed: 10/15/2024]
Abstract
Motivation RNA-binding proteins (RBPs) play a crucial role in the post-transcriptional regulation of RNA. Given their importance, analyzing the specific RNA patterns recognized by RBPs has become a significant research focus in bioinformatics. Deep Neural Networks have enhanced the accuracy of prediction for RBP-binding sites, yet understanding the structural basis of RBP-binding specificity from these models is challenging due to their limited interpretability. To address this, we developed RNAelem, which combines profile context-free grammar and the Turner energy model for RNA secondary structure to predict sequence-structure motifs in RBP-binding regions. Results RNAelem exhibited superior detection accuracy compared to existing tools for RNA sequences with structural motifs. Upon applying RNAelem to the eCLIP database, we were not only able to reproduce many known primary sequence motifs in the absence of secondary structures, but also discovered many secondary structural motifs that contained sequence-nonspecific insertion regions. Furthermore, the high interpretability of RNAelem yielded insightful findings such as long-range base-pairing interactions in the binding region of the U2AF protein. Availability and implementation The code is available at https://github.com/iyak/RNAelem.
Affiliation(s)
- Hiroshi Miyake
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8561, Japan
- Risa Karakida Kawaguchi
  - Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Sakyo-ku 606-8507, Japan
- Hisanori Kiryu
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8561, Japan

11
Wang Z, Liu Z, Zhang W, Li Y, Feng Y, Lv S, Diao H, Luo Z, Yan P, He M, Li X. AptaDiff: de novo design and optimization of aptamers based on diffusion models. Brief Bioinform 2024; 25:bbae517. [PMID: 39431516 PMCID: PMC11491854 DOI: 10.1093/bib/bbae517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 07/05/2024] [Accepted: 10/05/2024] [Indexed: 10/22/2024] Open
Abstract
Aptamers are single-stranded nucleic acid ligands, featuring high affinity and specificity to target molecules. Traditionally they are identified from large DNA/RNA libraries using in vitro methods, like Systematic Evolution of Ligands by Exponential Enrichment (SELEX). However, these libraries capture only a small fraction of theoretical sequence space, and various aptamer candidates are constrained by actual sequencing capabilities from the experiment. Addressing this, we proposed AptaDiff, the first in silico aptamer design and optimization method based on the diffusion model. AptaDiff can generate aptamers beyond the constraints of high-throughput sequencing data, leveraging motif-dependent latent embeddings from a variational autoencoder, and can optimize aptamers by affinity-guided aptamer generation according to Bayesian optimization. Comparative evaluations revealed AptaDiff's superiority over existing aptamer generation methods in terms of quality and fidelity across four high-throughput screening datasets targeting distinct proteins. Moreover, surface plasmon resonance experiments were conducted to validate the binding affinity of aptamers generated through Bayesian optimization for two target proteins. The results unveiled a significant boost of 87.9% and 60.2% in RU values, along with a 3.6-fold and 2.4-fold decrease in KD values for the respective target proteins. Notably, the optimized aptamers demonstrated superior binding affinity compared to top experimental candidates selected through SELEX, underscoring the promising outcomes of AptaDiff in accelerating the discovery of superior aptamers.
Affiliation(s)
- Zhen Wang
  - Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
  - College of Electrical and Information Engineering, Hunan University, Changsha, 410082 Hunan, China
- Ziqi Liu
  - Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
  - School of Molecular Medicine, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024 Zhejiang, China
- Wei Zhang
  - Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- Yanjun Li
  - Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, United States
- Yizhen Feng
  - Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310014 Zhejiang, China
- Shaokang Lv
  - Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
  - Department of Chemical Biology, Zhejiang University of Technology, Huzhou, 313200 Zhejiang, China
- Han Diao
  - Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
  - Department of Chemical Biology, Zhejiang University of Technology, Huzhou, 313200 Zhejiang, China
- Zhaofeng Luo
  - Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
- Pengju Yan
  - Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
  - ElasticMind Inc, Hangzhou, 310018 Zhejiang, China
- Min He
  - Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
  - College of Electrical and Information Engineering, Hunan University, Changsha, 410082 Hunan, China
- Xiaolin Li
  - Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou, 310018 Zhejiang, China
  - ElasticMind Inc, Hangzhou, 310018 Zhejiang, China

12
Zuo Y, Chen H, Yang L, Chen R, Zhang X, Deng Z. Research progress on prediction of RNA-protein binding sites in the past five years. Anal Biochem 2024; 691:115535. [PMID: 38643894 DOI: 10.1016/j.ab.2024.115535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/23/2024]
Abstract
Accurately predicting RNA-protein binding sites is essential for a deeper understanding of protein-RNA interactions and their regulatory mechanisms, which are fundamental to gene expression and regulation. However, conventional biological approaches to detecting these sites are often costly and time-consuming. In contrast, computational methods for predicting RNA-protein binding sites are both cost-effective and fast. This review synthesizes existing computational methods and summarizes commonly used databases for predicting RNA-protein binding sites. In addition, applications and innovations of computational methods using traditional machine learning and deep learning for RNA-protein binding site prediction during 2018-2023 are presented. These methods cover a wide range of aspects, such as effective database utilization, feature selection and encoding, innovative classification algorithms, and evaluation strategies. After exploring the limitations of existing computational methods, this paper discusses potential directions for future development. DeepRKE, RDense, and DeepDW all employ convolutional neural networks and long short-term memory networks to construct prediction models, yet their algorithm design and feature encoding differ, resulting in different prediction performance.
Affiliation(s)
- Yun Zuo
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Huixian Chen
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Lele Yang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Ruoyan Chen
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Xiaoyao Zhang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Zhaohong Deng
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
13
Li F, Ma C, Lei S, Pan Y, Lin L, Pan C, Li Q, Geng F, Min D, Tang X. Gingipains may be one of the key virulence factors of Porphyromonas gingivalis to impair cognition and enhance blood-brain barrier permeability: An animal study. J Clin Periodontol 2024; 51:818-839. [PMID: 38414291 DOI: 10.1111/jcpe.13966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 01/24/2024] [Accepted: 02/08/2024] [Indexed: 02/29/2024]
Abstract
AIM: Blood-brain barrier (BBB) disorder is one of the early findings in cognitive impairment. We have recently found that Porphyromonas gingivalis bacteraemia can cause cognitive impairment and increased BBB permeability. This study aimed to identify the possible key virulence factors of P. gingivalis contributing to this pathological process. MATERIALS AND METHODS: C57BL/6 mice were administered P. gingivalis, gingipains, or P. gingivalis lipopolysaccharide (P. gingivalis LPS group) by tail vein injection for 8 weeks. The changes in cognitive behaviour in mice, the histopathological changes in the hippocampus and cerebral cortex, the alterations in BBB permeability, and the changes in Mfsd2a and Cav-1 levels were measured. The mechanisms of Ddx3x-induced regulation of Mfsd2a by arginine-specific gingipain A (RgpA) in BMECs were explored. RESULTS: P. gingivalis and gingipains significantly promoted cognitive impairment in mice, caused pathological changes in the hippocampus and cerebral cortex, increased BBB permeability, inhibited Mfsd2a expression, and up-regulated Cav-1 expression. After RgpA stimulation, the permeability of the in vitro BBB model increased, and the Ddx3x/Mfsd2a/Cav-1 regulatory axis was activated. CONCLUSIONS: Gingipains may be one of the key virulence factors of P. gingivalis that impair cognition and enhance BBB permeability via the Ddx3x/Mfsd2a/Cav-1 axis.
Affiliation(s)
- Fulong Li
  - Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
  - Center of Implantology, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Chunliang Ma
  - Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Shuang Lei
  - Department of Pediatric Dentistry, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Yaping Pan
  - Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Li Lin
  - Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Chunling Pan
  - Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Qian Li
  - Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Fengxue Geng
  - Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
- Dongyu Min
  - Traditional Chinese Medicine Experimental Center, Affiliated Hospital of Liaoning University of Traditional Chinese Medicine, Shenyang, China
  - Key Laboratory of Ministry of Education for TCM Viscera State Theory and Applications, Liaoning University of Traditional Chinese Medicine, Shenyang, China
- Xiaolin Tang
  - Department of Periodontics, School and Hospital of Stomatology, Liaoning Provincial Key Laboratory of Oral Disease, China Medical University, Shenyang, China
14
Ali M, Shah D, Qazi S, Khan IA, Abrar M, Zahir S. An effective deep learning-based approach for splice site identification in gene expression. Sci Prog 2024; 107:368504241266588. [PMID: 39051530 PMCID: PMC11273556 DOI: 10.1177/00368504241266588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
A crucial stage in eukaryotic gene expression involves mRNA splicing by a protein assembly known as the spliceosome. This step contributes significantly to generating a properly functioning final gene product. Since non-coding introns interrupt eukaryotic genes, splicing entails removing introns and joining exons to create a functional mRNA molecule. Nevertheless, accurately finding splice sites using molecular biology techniques and other biological approaches is complex and time-consuming. This paper presents a precise and reliable computer-aided diagnosis (CAD) technique for the rapid and correct identification of splice site sequences. The proposed deep learning-based framework uses long short-term memory (LSTM) to extract distinct patterns from RNA sequences, enabling rapid and accurate point mutation sequence mapping. The proposed network employs one-hot encodings to find sequential patterns that effectively identify splicing sites. A thorough ablation study of traditional machine learning, one-dimensional convolutional neural network (1D-CNN), and recurrent neural network (RNN) models was conducted. The proposed LSTM network outperformed existing state-of-the-art approaches, improving accuracy by 3% and 2% on the acceptor and donor site datasets, respectively.
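The one-hot encoding mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the alphabet order, the handling of ambiguous bases, and the function name `one_hot_encode` are assumptions made for illustration only.

```python
# Illustrative sketch only: the alphabet order and the all-zero row used for
# unknown bases are assumptions, not details taken from the paper.
ALPHABET = "ACGU"


def one_hot_encode(seq):
    """Return one 4-element row per nucleotide, ready to feed a sequence model."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    rows = []
    for base in seq:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:  # ambiguous symbols such as N stay all-zero
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for base, row in zip("GUAN", one_hot_encode("GUAN")):
        print(base, row)
```

Each sequence becomes a length-by-4 matrix, which is the kind of input a recurrent layer such as an LSTM would read one position at a time.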
Affiliation(s)
- Mohsin Ali
  - Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan
- Dilawar Shah
  - Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan
- Shahid Qazi
  - Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan
- Izaz Ahmad Khan
  - Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan
- Mohammad Abrar
  - Faculty of Computer Science, Arab Open University, Muscat, Sultanate of Oman
- Sana Zahir
  - Institute of Computer Sciences and Information Technology, The University of Agriculture Peshawar, Peshawar, KP, Pakistan
15
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hyeonseong Jeon
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Genome4me Inc., Seoul, Republic of Korea
- Nagyeong Yeo
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Genome4me Inc., Seoul, Republic of Korea
16
Rennie S. Deep Learning for Elucidating Modifications to RNA-Status and Challenges Ahead. Genes (Basel) 2024; 15:629. [PMID: 38790258 PMCID: PMC11121098 DOI: 10.3390/genes15050629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/11/2024] [Accepted: 05/11/2024] [Indexed: 05/26/2024] Open
Abstract
RNA-binding proteins and chemical modifications to RNA play vital roles in the co- and post-transcriptional regulation of genes. In order to fully decipher their biological roles, it is an essential task to catalogue their precise target locations along with their preferred contexts and sequence-based determinants. Recently, deep learning approaches have significantly advanced in this field. These methods can predict the presence or absence of modification at specific genomic regions based on diverse features, particularly sequence and secondary structure, allowing us to decipher the highly non-linear sequence patterns and structures that underlie site preferences. This article provides an overview of how deep learning is being applied to this area, with a particular focus on the problem of mRNA-RBP binding, while also considering other types of chemical modification to RNA. It discusses how different types of model can handle sequence-based and/or secondary-structure-based inputs, the process of model training, including choice of negative regions and separating sets for testing and training, and offers recommendations for developing biologically relevant models. Finally, it highlights four key areas that are crucial for advancing the field.
Affiliation(s)
- Sarah Rennie
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
17
Park JH, Prasad V, Newsom S, Najar F, Rajan R. IdMotif: An Interactive Motif Identification in Protein Sequences. IEEE Computer Graphics and Applications 2024; 44:114-125. [PMID: 38127603 DOI: 10.1109/mcg.2023.3345742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
This article presents a visual analytics framework, idMotif, to support domain experts in identifying motifs in protein sequences. A motif is a short sequence of amino acids usually associated with distinct functions of a protein, and identifying similar motifs in protein sequences helps us to predict certain types of disease or infection. idMotif can be used to explore, analyze, and visualize such motifs in protein sequences. We introduce a deep-learning-based method for grouping protein sequences and allow users to discover motif candidates of protein groups based on local explanations of the decisions of a deep-learning model. idMotif provides several interactive linked views for analysis between and within protein clusters/groups and for sequence analysis. Through a case study and experts' feedback, we demonstrate how the framework helps domain experts analyze protein sequences and identify motifs.
18
Wu H, Liu X, Fang Y, Yang Y, Huang Y, Pan X, Shen HB. Decoding protein binding landscape on circular RNAs with base-resolution transformer models. Comput Biol Med 2024; 171:108175. [PMID: 38402841 DOI: 10.1016/j.compbiomed.2024.108175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 01/16/2024] [Accepted: 02/18/2024] [Indexed: 02/27/2024]
Abstract
Circular RNAs (circRNAs), a class of endogenous RNA with a covalent loop structure, can regulate gene expression by serving as sponges for microRNAs and RNA-binding proteins (RBPs). To date, most computational methods for predicting RBP binding sites on circRNAs focus on circRNA fragments rather than full circRNAs. These methods detect whether a circRNA fragment contains binding sites, but cannot determine where the binding sites are or how many there are on the circRNA transcript. We report a hybrid deep learning-based tool, CircSite, to predict RBP binding sites at single-nucleotide resolution and detect key contributing nucleotides on circRNA transcripts. CircSite takes advantage of convolutional neural networks (CNNs) and a Transformer for learning local and global representations, respectively, of circRNAs binding to RBPs. We construct 37 datasets of circRNAs interacting with proteins for benchmarking, and the experimental results show that CircSite offers accurate predictions of RBP binding nucleotides and detects key subsequences that align well with known binding motifs. CircSite is an easy-to-use online webserver for predicting RBP binding sites on circRNA transcripts and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/CircSite/.
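To make the contrast between fragment-level and base-resolution prediction concrete, the sketch below extracts one context window per nucleotide of a circular sequence, wrapping around the junction rather than truncating at the ends. This is an assumption about how per-nucleotide inputs could be prepared; the abstract does not describe CircSite's actual preprocessing, and the name `circular_windows` and the `flank` parameter are hypothetical.

```python
def circular_windows(seq, flank):
    """Yield (position, window) for every nucleotide of a circular sequence.

    Each window spans `flank` bases on either side of the position and wraps
    around the ends of the string, treating the sequence as a closed loop.
    This is an illustrative assumption, not CircSite's documented procedure.
    """
    n = len(seq)
    for i in range(n):
        window = "".join(seq[(i + d) % n] for d in range(-flank, flank + 1))
        yield i, window


if __name__ == "__main__":
    for pos, win in circular_windows("ACGUA", 1):
        print(pos, win)
```

A per-position model would then emit one score per window, which is what allows the number and location of binding sites on the whole transcript to be read off.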
Affiliation(s)
- Hehe Wu
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Xiaojian Liu
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Yi Fang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Yang Yang
  - Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Yan Huang
  - State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics Chinese Academy of Sciences, 500 Yutian Road, Shanghai, 200083, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
19
Zhang S, Li YD, Cai YR, Kang XP, Feng Y, Li YC, Chen YH, Li J, Bao LL, Jiang T. Compositional features analysis by machine learning in genome represents linear adaptation of monkeypox virus. Front Genet 2024; 15:1361952. [PMID: 38495668 PMCID: PMC10940399 DOI: 10.3389/fgene.2024.1361952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 02/21/2024] [Indexed: 03/19/2024] Open
Abstract
Introduction: Global headlines have been dominated by the sudden and widespread outbreak of monkeypox, a rare and endemic zoonotic disease caused by the monkeypox virus (MPXV). Machine learning (ML) methods based on genomic composition have recently shown promise in identifying the host adaptability and evolutionary patterns of viruses. Our study aimed to analyze the genomic characteristics and evolutionary patterns of MPXV using ML methods. Methods: The open reading frame (ORF) regions of full-length MPXV genomes were filtered, and 165 ORFs were selected as the clusters with the highest homology. The unsupervised machine learning methods t-distributed stochastic neighbor embedding (t-SNE), Principal Component Analysis (PCA), and hierarchical clustering were used to observe the DCR characteristics of the selected ORF clusters. Results: Post-2022 MPXV sequences showed an obvious linear adaptive evolution, indicating that the virus has become more adapted to the human host after accumulating mutations. For a more accurate analysis, the ORF regions with larger variations were filtered out based on the ranking of homology difference to narrow down the key ORF clusters, which led to the same conclusion of linear adaptability. Key differential protein structures were then predicted with AlphaFold 2, suggesting that differences in the main domains might be one of the underlying reasons for linear adaptive evolution. Discussion: Understanding the process of linear adaptation is critical in the constant evolutionary struggle between viruses and their hosts and plays a significant role in crafting effective measures to tackle viral diseases. The present study therefore provides valuable insights into the evolutionary patterns of MPXV in 2022 from the perspective of genomic composition analysis through ML methods.
Affiliation(s)
- Sen Zhang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Ya-Dan Li
  - College of Basic Medical Sciences, Anhui Medical University, Hefei, China
- Yu-Rong Cai
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
  - College of the First Clinical Medical, Inner Mongolia Medical University, Hohhot, China
- Xiao-Ping Kang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Ye Feng
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Yu-Chang Li
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Yue-Hong Chen
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
- Jing Li
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
  - College of Basic Medical Sciences, Anhui Medical University, Hefei, China
- Li-Li Bao
  - College of Basic Medical Sciences, Inner Mongolia Medical University, Hohhot, China
- Tao Jiang
  - College of Basic Medical Sciences, Anhui Medical University, Hefei, China
20
Kwak IY, Kim BC, Lee J, Kang T, Garry DJ, Zhang J, Gong W. Proformer: a hybrid macaron transformer model predicts expression values from promoter sequences. BMC Bioinformatics 2024; 25:81. [PMID: 38378442 PMCID: PMC10877777 DOI: 10.1186/s12859-024-05645-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024] Open
Abstract
The breakthrough high-throughput measurement of the cis-regulatory activity of millions of randomly generated promoters provides an unprecedented opportunity to systematically decode the cis-regulatory logic that determines expression values. We developed an end-to-end transformer encoder architecture named Proformer to predict expression values from DNA sequences. Proformer uses a Macaron-like Transformer encoder architecture, in which two half-step feed-forward network (FFN) layers are placed at the beginning and the end of each encoder block, and a separable 1D convolution layer is inserted after the first FFN layer and before the multi-head attention layer. Sliding k-mers from one-hot encoded sequences are mapped onto a continuous embedding and combined with a learned positional embedding and a strand embedding (forward strand vs. reverse-complemented strand) as the sequence input. Moreover, Proformer introduces multiple expression heads with mask filling to prevent the transformer models from collapsing when trained on a relatively small amount of data. We empirically determined that this design performed significantly better than conventional designs, such as using a global pooling layer as the output layer for the regression task. These analyses support the notion that Proformer provides a novel method of learning and enhances our understanding of how cis-regulatory sequences determine expression values.
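The input preparation described in this abstract (sliding k-mers plus a forward or reverse-complement strand label) can be sketched in a few lines of Python. The value of `K`, the integer vocabulary layout, and all function names here are hypothetical choices for illustration; the abstract does not specify them, and the embedding layers themselves are omitted.

```python
from itertools import product

# Hypothetical settings for illustration; the paper's k and vocabulary layout
# are not given in the abstract.
K = 3
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_tokens(seq):
    """Map each overlapping k-mer (stride 1) to an integer token id.

    Raises KeyError for symbols outside A, C, G, T.
    """
    return [KMER_INDEX[seq[i:i + K]] for i in range(len(seq) - K + 1)]


def encode_both_strands(seq):
    """Return (token_ids, strand_id) pairs: 0 = forward, 1 = reverse complement."""
    return [
        (kmer_tokens(seq), 0),
        (kmer_tokens(reverse_complement(seq)), 1),
    ]


if __name__ == "__main__":
    for tokens, strand in encode_both_strands("ACGTA"):
        print(strand, tokens)
```

In a model of this kind, the token ids would index a learned k-mer embedding table and the strand id a separate strand embedding, with both added to a positional embedding before the encoder blocks.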
Affiliation(s)
- Il-Youp Kwak
  - Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea
- Byeong-Chan Kim
  - Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea
- Juhyun Lee
  - Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea
- Taein Kang
  - Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea
- Daniel J Garry
  - Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA
  - Stem Cell Institute, University of Minnesota, Minneapolis, MN, 55455, USA
  - Paul and Sheila Wellstone Muscular Dystrophy Center, University of Minnesota, Minneapolis, MN, 55455, USA
- Jianyi Zhang
  - Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, 35233, USA
- Wuming Gong
  - Cardiovascular Division, Department of Medicine, Lillehei Heart Institute, University of Minnesota, 2231 6th St SE, Minneapolis, MN, 55455, USA
21
Lim D, Baek C, Blanchette M. Graphylo: A deep learning approach for predicting regulatory DNA and RNA sites from whole-genome multiple alignments. iScience 2024; 27:109002. [PMID: 38362268 PMCID: PMC10867641 DOI: 10.1016/j.isci.2024.109002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 12/17/2023] [Accepted: 01/19/2024] [Indexed: 02/17/2024] Open
Abstract
This study focuses on enhancing the prediction of regulatory functional sites in DNA and RNA sequences, a crucial aspect of gene regulation. Current methods, such as motif overrepresentation and machine learning, often lack specificity. To address this issue, the study leverages evolutionary information and introduces Graphylo, a deep-learning approach for predicting transcription factor binding sites in the human genome. Graphylo combines Convolutional Neural Networks for DNA sequences with Graph Convolutional Networks on phylogenetic trees, using information from placental mammals' genomes and evolutionary history. The research demonstrates that Graphylo consistently outperforms both single-species deep learning techniques and methods that incorporate inter-species conservation scores on a wide range of datasets. It achieves this by utilizing a species-based attention model for evolutionary insights and an integrated gradient approach for nucleotide-level model interpretability. This innovative approach offers a promising avenue for improving the accuracy of regulatory site prediction in genomics.
22
Knudsen JE, Rich JM, Ma R. Artificial Intelligence in Pathomics and Genomics of Renal Cell Carcinoma. Urol Clin North Am 2024; 51:47-62. [PMID: 37945102 DOI: 10.1016/j.ucl.2023.06.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 11/12/2023]
Abstract
The integration of artificial intelligence (AI) with histopathology images and gene expression patterns has led to the emergence of the dynamic fields of pathomics and genomics. These fields have revolutionized renal cell carcinoma (RCC) diagnosis and subtyping and improved survival prediction models. Machine learning has identified unique gene patterns across RCC subtypes and grades, providing insights into RCC origins and potential treatments, such as targeted therapies. The combination of pathomics and genomics using AI opens new avenues in RCC research, promising future breakthroughs and innovations that patients and physicians can anticipate.
Affiliation(s)
- J Everett Knudsen
  - Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, Center for Robotic Simulation & Education, University of Southern California, Los Angeles, CA, USA
- Joseph M Rich
  - Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, Center for Robotic Simulation & Education, University of Southern California, Los Angeles, CA, USA
- Runzhuo Ma
  - Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, Center for Robotic Simulation & Education, University of Southern California, Los Angeles, CA, USA
23
Cao C, Wang C, Yang S, Zou Q. CircSI-SSL: circRNA-binding site identification based on self-supervised learning. Bioinformatics 2024; 40:btae004. [PMID: 38180876 PMCID: PMC10789309 DOI: 10.1093/bioinformatics/btae004] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/26/2023] [Revised: 11/13/2023] [Accepted: 01/03/2024] [Indexed: 01/07/2024] Open
Abstract
MOTIVATION: In recent years, circular RNAs (circRNAs), a particular form of RNA with a closed-loop structure, have attracted widespread attention due to their physiological significance (they can directly bind proteins), leading to the development of numerous protein binding site identification algorithms. Unfortunately, these methods are supervised and require a large number of labeled samples during training to achieve good performance, and sample labels are difficult to obtain because they require many biological experiments. RESULTS: To reduce the number of labels needed for training in the circRNA-binding site prediction task, a self-supervised learning binding site identification algorithm named CircSI-SSL is proposed in this article. To our knowledge, this is unprecedented in the research field. Specifically, CircSI-SSL first combines multiple feature coding schemes and employs RNA_Transformer for cross-view sequence prediction (the self-supervised task) to learn mutual information from the multi-view data, and is then fine-tuned with only a few sample labels. Comprehensive experiments on six widely used circRNA datasets indicate that CircSI-SSL achieves excellent performance in comparison with previous algorithms, even in the extreme case where the ratio of training data to test data is 1:9. In addition, a transfer experiment on six linRNA datasets, without network modification or hyperparameter adjustment, shows that CircSI-SSL has good scalability. In summary, the prediction algorithm based on self-supervised learning proposed in this article is expected to replace previous supervised algorithms and has more extensive application value. AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/cc646201081/CircSI-SSL.
Affiliation(s)
- Chao Cao
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
- Chunyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Shuhong Yang
  - Faculty of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang, Guangdong 524088, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
|
24
|
Proft S, Leiz J, Heinemann U, Seelow D, Schmidt-Ott KM, Rutkiewicz M. Discovery of a non-canonical GRHL1 binding site using deep convolutional and recurrent neural networks. BMC Genomics 2023; 24:736. [PMID: 38049725 PMCID: PMC10696883 DOI: 10.1186/s12864-023-09830-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 11/22/2023] [Indexed: 12/06/2023] Open
Abstract
BACKGROUND: Transcription factors regulate gene expression by binding to transcription factor binding sites (TFBSs). Most models for predicting TFBSs are based on position weight matrices (PWMs), which require a specific motif to be present in the DNA sequence and do not consider interdependencies of nucleotides. Novel approaches such as Transcription Factor Flexible Models or recurrent neural networks consequently provide higher accuracies. However, it is unclear whether such approaches can uncover novel non-canonical, hitherto unexpected TFBSs relevant to human transcriptional regulation.
RESULTS: In this study, we trained a convolutional recurrent neural network with HT-SELEX data for GRHL1 binding and applied it to a set of GRHL1 binding sites obtained from ChIP-Seq experiments from human cells. We identified 46 non-canonical GRHL1 binding sites, which were not found by a conventional PWM approach. Unexpectedly, some of the newly predicted binding sequences lacked the CNNG core motif, so far considered obligatory for GRHL1 binding. Using isothermal titration calorimetry, we experimentally confirmed binding between the GRHL1-DNA binding domain and predicted GRHL1 binding sites, including a non-canonical GRHL1 binding site. Mutagenesis of individual nucleotides revealed a correlation between predicted binding strength and experimentally validated binding affinity across representative sequences. This correlation was observed with neither a PWM-based approach nor another deep learning approach.
CONCLUSIONS: Our results show that convolutional recurrent neural networks may uncover unanticipated binding sites and facilitate quantitative transcription factor binding predictions.
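To make the PWM baseline mentioned in this abstract concrete, the following is a minimal sketch of log-odds PWM scanning. Each window is scored by summing independent per-position weights, which is why a PWM cannot represent dependencies between nucleotides. The count matrix, pseudocount, and uniform background are illustrative assumptions and are not taken from the study.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# These counts are made up for illustration; they are not GRHL1 data.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 1, "T": 8},
]


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Return (offset, score) for every window of the sequence.

    Each position contributes independently to the window score.
    """
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]


def best_hit(sequence, pwm):
    """Return the (offset, score) pair with the highest score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])
```

Calling `best_hit("TTACGTTT", build_pwm(COUNTS))` returns the offset and score of the window that best matches the toy matrix. A sequence lacking the consensus at any position is penalized at that position regardless of context, which illustrates the limitation the abstract describes.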
Affiliation(s)
- Sebastian Proft
  - Exploratory Diagnostic Sciences, Berlin Institute of Health, Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353, Berlin, Germany
- Janna Leiz
  - Department of Nephrology and Hypertension, Hannover Medical School, 30625, Hannover, Germany
  - Department of Nephrology and Intensive Care Medicine, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, 12203, Berlin, Germany
  - Molecular and Translational Kidney Research, Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Udo Heinemann
  - Macromolecular Structure and Interaction, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Dominik Seelow
  - Exploratory Diagnostic Sciences, Berlin Institute of Health, Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353, Berlin, Germany
- Kai M Schmidt-Ott
  - Department of Nephrology and Hypertension, Hannover Medical School, 30625, Hannover, Germany
  - Department of Nephrology and Intensive Care Medicine, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, 12203, Berlin, Germany
  - Molecular and Translational Kidney Research, Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Maria Rutkiewicz
  - Macromolecular Structure and Interaction, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
  - Department of Structural Biology of Eukaryotes, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznań, 61-704, Poland
|
25
|
Klie A, Laub D, Talwar JV, Stites H, Jores T, Solvason JJ, Farley EK, Carter H. Predictive analyses of regulatory sequences with EUGENe. Nature Computational Science 2023; 3:946-956. [PMID: 38177592 PMCID: PMC10768637 DOI: 10.1038/s43588-023-00544-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 09/27/2023] [Indexed: 01/06/2024]
Abstract
Deep learning has become a popular tool to study cis-regulatory function. Yet efforts to design software for deep-learning analyses in regulatory genomics that are findable, accessible, interoperable and reusable (FAIR) have fallen short of fully meeting these criteria. Here we present elucidating the utility of genomic elements with neural nets (EUGENe), a FAIR toolkit for the analysis of genomic sequences with deep learning. EUGENe consists of a set of modules and subpackages for executing the key functionality of a genomics deep learning workflow: (1) extracting, transforming and loading sequence data from many common file formats; (2) instantiating, initializing and training diverse model architectures; and (3) evaluating and interpreting model behavior. We designed EUGENe as a simple, flexible and extensible interface for streamlining and customizing end-to-end deep-learning sequence analyses, and illustrate these principles through application of the toolkit to three predictive modeling tasks. We hope that EUGENe represents a springboard towards a collaborative ecosystem for deep-learning applications in genomics research.
Affiliation(s)
- Adam Klie
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- David Laub
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- James V Talwar
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Tobias Jores
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Joe J Solvason
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
  - Department of Molecular Biology, University of California San Diego, La Jolla, CA, USA
- Emma K Farley
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
  - Department of Molecular Biology, University of California San Diego, La Jolla, CA, USA
- Hannah Carter
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
|
26
|
Akbari Rokn Abadi S, Tabatabaei S, Koohi S. KDeep: a new memory-efficient data extraction method for accurately predicting DNA/RNA transcription factor binding sites. J Transl Med 2023; 21:727. [PMID: 37845681 PMCID: PMC10580661 DOI: 10.1186/s12967-023-04593-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Accepted: 10/04/2023] [Indexed: 10/18/2023] Open
Abstract
This paper addresses the crucial task of identifying DNA/RNA binding sites, which has implications in drug/vaccine design, protein engineering, and cancer research. Existing methods utilize complex neural network structures, diverse input types, and machine learning techniques for feature extraction. However, the growing volume of sequences poses processing challenges. This study introduces KDeep, employing a CNN-LSTM architecture with a novel encoding method called 2Lk. 2Lk enhances prediction accuracy, reduces memory consumption by up to 84%, reduces trainable parameters, and improves interpretability by approximately 79% compared to state-of-the-art approaches. KDeep offers a promising solution for accurate and efficient binding site prediction.
Affiliation(s)
- Somayyeh Koohi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
|
27
|
Vaculík O, Chalupová E, Grešová K, Majtner T, Alexiou P. Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes. Biology 2023; 12:1276. [PMID: 37886986 PMCID: PMC10604046 DOI: 10.3390/biology12101276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/19/2023] [Accepted: 09/21/2023] [Indexed: 10/28/2023]
Abstract
RNA-binding proteins are vital regulators in numerous biological processes. Their dysfunction can result in diverse diseases, such as cancer or neurodegenerative disorders, making the prediction of their binding sites of high importance. Deep learning (DL) has brought about a revolution in various biological domains, including the field of protein-RNA interactions. Nonetheless, several challenges persist, such as the limited availability of experimentally validated binding sites to train well-performing DL models for the majority of proteins. Here, we present a novel training approach based on transfer learning (TL) to address the issue of limited data. Employing a sophisticated and interpretable architecture, we compare the performance of our method trained using two distinct approaches: training from scratch (SCR) and utilizing TL. Additionally, we benchmark our results against the current state-of-the-art methods. Furthermore, we tackle the challenges associated with selecting appropriate input features and determining optimal interval sizes. Our results show that TL enhances model performance, particularly in datasets with minimal training data, where satisfactory results can be achieved with just a few hundred RNA binding sites. Moreover, we demonstrate that integrating both sequence and evolutionary conservation information leads to superior performance. Additionally, we showcase how incorporating an attention layer into the model facilitates the interpretation of predictions within a biologically relevant context.
Affiliation(s)
- Ondřej Vaculík
  - Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
  - Faculty of Science, National Centre for Biomolecular Research, Masaryk University, 625 00 Brno, Czech Republic
- Eliška Chalupová
  - Faculty of Science, National Centre for Biomolecular Research, Masaryk University, 625 00 Brno, Czech Republic
- Katarína Grešová
  - Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
  - Faculty of Science, National Centre for Biomolecular Research, Masaryk University, 625 00 Brno, Czech Republic
- Tomáš Majtner
  - Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
  - Department of Molecular Sociology, Max Planck Institute of Biophysics, 60439 Frankfurt am Main, Germany
- Panagiotis Alexiou
  - Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
  - Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, MSD 2080 Msida, Malta
  - Centre for Molecular Medicine & Biobanking, University of Malta, MSD 2080 Msida, Malta
|
28
|
Horlacher M, Cantini G, Hesse J, Schinke P, Goedert N, Londhe S, Moyon L, Marsico A. A systematic benchmark of machine learning methods for protein-RNA interaction prediction. Brief Bioinform 2023; 24:bbad307. [PMID: 37635383 PMCID: PMC10516373 DOI: 10.1093/bib/bbad307] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/31/2023] [Revised: 06/15/2023] [Accepted: 07/18/2023] [Indexed: 08/29/2023] Open
Abstract
RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation. Experiments to profile binding sites of RBPs in vivo are limited to transcripts expressed in the experimental cell type, creating the need for computational methods to infer missing binding information. While numerous machine learning-based methods have been developed for this task, their use of heterogeneous training and evaluation datasets across different sets of RBPs and CLIP-seq protocols makes a direct comparison of their performance difficult. Here, we compile a set of 37 machine learning (primarily deep learning) methods for in vivo RBP-RNA interaction prediction and systematically benchmark a subset of 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two negative-class sample generation strategies, we evaluate methods in terms of predictive performance and assess the impact of neural network architectures and input modalities on model performance. We believe that this study will not only enable researchers to choose the optimal prediction method for their tasks at hand, but also aid method developers in developing novel, high-performing methods by introducing a standardized framework for their evaluation.
Affiliation(s)
- Marc Horlacher
  - Computational Health Center, Helmholtz Center Munich, Germany
  - School of Computation, Information and Technology, Technical University Munich (TUM), Germany
- Giulia Cantini
  - Computational Health Center, Helmholtz Center Munich, Germany
- Julian Hesse
  - Computational Health Center, Helmholtz Center Munich, Germany
- Patrick Schinke
  - Computational Health Center, Helmholtz Center Munich, Germany
- Nicolas Goedert
  - Computational Health Center, Helmholtz Center Munich, Germany
- Lambert Moyon
  - Computational Health Center, Helmholtz Center Munich, Germany
|
29
|
Jiang L, Xiao M, Liao QQ, Zheng L, Li C, Liu Y, Yang B, Ren A, Jiang C, Feng XH. High-sensitivity profiling of SARS-CoV-2 noncoding region-host protein interactome reveals the potential regulatory role of negative-sense viral RNA. mSystems 2023; 8:e0013523. [PMID: 37314180 PMCID: PMC10469612 DOI: 10.1128/msystems.00135-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 04/11/2023] [Indexed: 06/15/2023] Open
Abstract
A deep understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-host interactions is crucial to developing effective therapeutics and addressing the threat of emerging coronaviruses. The role of noncoding regions of viral RNA (ncrRNAs) has yet to be systematically scrutinized. We developed a method using MS2 affinity purification coupled with liquid chromatography-mass spectrometry and designed a diverse set of bait ncrRNAs to systematically map the interactome of SARS-CoV-2 ncrRNA in Calu-3, Huh7, and HEK293T cells. Integration of the results defined the core ncrRNA-host protein interactomes among cell lines. The 5' UTR interactome is enriched with proteins in the small nuclear ribonucleoproteins family and is a target for the regulation of viral replication and transcription. The 3' UTR interactome is enriched with proteins involved in the stress granules and heterogeneous nuclear ribonucleoproteins family. Intriguingly, compared with the positive-sense ncrRNAs, the negative-sense ncrRNAs, especially the negative sense of the 3' UTR, interacted with a large array of host proteins across all cell lines. These proteins are involved in the regulation of the viral production process, host cell apoptosis, and immune response. Taken together, our study depicts the comprehensive landscape of the SARS-CoV-2 ncrRNA-host protein interactome and unveils the potential regulatory role of the negative-sense ncrRNAs, providing a new perspective on virus-host interactions and the design of future therapeutics. Given the highly conserved nature of UTRs in positive-strand viruses, the regulatory role of negative-sense ncrRNAs should not be exclusive to SARS-CoV-2.
IMPORTANCE: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes COVID-19, a pandemic affecting millions of lives. During replication and transcription, noncoding regions of the viral RNA (ncrRNAs) may play an important role in the virus-host interactions. Understanding which of these ncrRNAs interact with host proteins, and how, is crucial for understanding the mechanism of SARS-CoV-2 pathogenesis. We developed the MS2 affinity purification coupled with liquid chromatography-mass spectrometry method and designed a diverse set of ncrRNAs to identify the SARS-CoV-2 ncrRNA interactome comprehensively in different cell lines and found that the 5' UTR binds to proteins involved in U1 small nuclear ribonucleoprotein, while the 3' UTR interacts with proteins involved in stress granules and the heterogeneous nuclear ribonucleoprotein family. Interestingly, negative-sense ncrRNAs showed interactions with a large number of diverse host proteins, indicating a crucial role in infection. The results demonstrate that ncrRNAs could serve diverse regulatory functions.
Affiliation(s)
- Liuyiqi Jiang
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
  - Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Mu Xiao
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
  - Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
  - The MOE Key Laboratory of Biosystems Homeostasis & Protection and Innovation Center for Cell Signaling Network, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Qing-Qing Liao
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Luqian Zheng
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Chunyan Li
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Yuemei Liu
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
  - Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
  - The MOE Key Laboratory of Biosystems Homeostasis & Protection and Innovation Center for Cell Signaling Network, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Bing Yang
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Aiming Ren
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Chao Jiang
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
  - Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Xin-Hua Feng
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
  - Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
  - The MOE Key Laboratory of Biosystems Homeostasis & Protection and Innovation Center for Cell Signaling Network, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
|
30
|
Horlacher M, Wagner N, Moyon L, Kuret K, Goedert N, Salvatore M, Ule J, Gagneur J, Winther O, Marsico A. Towards in silico CLIP-seq: predicting protein-RNA interaction via sequence-to-signal learning. Genome Biol 2023; 24:180. [PMID: 37542318 PMCID: PMC10403857 DOI: 10.1186/s13059-023-03015-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/23/2022] [Accepted: 07/17/2023] [Indexed: 08/06/2023] Open
Abstract
We present RBPNet, a novel deep learning method, which predicts CLIP-seq crosslink count distribution from RNA sequence at single-nucleotide resolution. By training on up to a million regions, RBPNet achieves high generalization on eCLIP, iCLIP and miCLIP assays, outperforming state-of-the-art classifiers. RBPNet performs bias correction by modeling the raw signal as a mixture of the protein-specific and background signal. Through model interrogation via Integrated Gradients, RBPNet identifies predictive sub-sequences that correspond to known and novel binding motifs and enables variant-impact scoring via in silico mutagenesis. Together, RBPNet improves imputation of protein-RNA interactions, as well as mechanistic interpretation of predictions.
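The variant-impact scoring via in silico mutagenesis mentioned in this abstract can be sketched generically as follows. This is not RBPNet's code: `toy_score` is a hypothetical stand-in for a trained model, and the motif it counts is chosen only for illustration. The idea is to substitute every position with every alternative base and record the change in the model output relative to the reference sequence.

```python
def toy_score(sequence):
    """Placeholder scorer (hypothetical): counts occurrences of an example motif.

    A real workflow would call a trained model here instead.
    """
    motif = "UGCAUG"  # illustrative motif only
    return sum(
        sequence[i:i + len(motif)] == motif
        for i in range(len(sequence) - len(motif) + 1)
    )


def in_silico_mutagenesis(sequence, score_fn, alphabet="ACGU"):
    """Score every single-nucleotide substitution against the reference.

    Returns a dict mapping (position, reference base, alternative base)
    to the score difference (mutant minus reference).
    """
    reference = score_fn(sequence)
    effects = {}
    for pos, ref_base in enumerate(sequence):
        for alt in alphabet:
            if alt == ref_base:
                continue
            mutant = sequence[:pos] + alt + sequence[pos + 1:]
            effects[(pos, ref_base, alt)] = score_fn(mutant) - reference
    return effects
```

With this toy scorer, substitutions inside the motif lower the score while substitutions elsewhere leave it unchanged, which is the pattern one would inspect to locate positions the model relies on.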
Affiliation(s)
- Marc Horlacher
  - Computational Health Center, Helmholtz Center Munich, Munich, Germany
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
  - Department of Informatics, Technical University of Munich, Garching, Germany
  - Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Nils Wagner
  - Department of Informatics, Technical University of Munich, Garching, Germany
  - Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Lambert Moyon
  - Computational Health Center, Helmholtz Center Munich, Munich, Germany
- Klara Kuret
  - National Institute of Chemistry, Ljubljana, Slovenia
  - The Francis Crick Institute, London, UK
  - Jozef Stefan International Postgraduate School, Jamova cesta 39, 1000, Ljubljana, Slovenia
- Nicolas Goedert
  - Computational Health Center, Helmholtz Center Munich, Munich, Germany
- Marco Salvatore
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jernej Ule
  - National Institute of Chemistry, Ljubljana, Slovenia
  - The Francis Crick Institute, London, UK
- Julien Gagneur
  - Computational Health Center, Helmholtz Center Munich, Munich, Germany
  - Department of Informatics, Technical University of Munich, Garching, Germany
  - Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Ole Winther
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Annalisa Marsico
  - Computational Health Center, Helmholtz Center Munich, Munich, Germany
  - Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
|
31
|
Li Q, Kang C. Targeting RNA-binding proteins with small molecules: Perspectives, pitfalls and bifunctional molecules. FEBS Lett 2023; 597:2031-2047. [PMID: 37519019 DOI: 10.1002/1873-3468.14710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 06/26/2023] [Accepted: 06/28/2023] [Indexed: 08/01/2023]
Abstract
RNA-binding proteins (RBPs) play vital roles in organisms through binding with RNAs to regulate their functions. Small molecules affecting the function of RBPs have been developed, providing new avenues for drug discovery. Herein, we describe the perspectives on developing small molecule regulators of RBPs. The following types of small molecule modulators are of great interest in drug discovery: small molecules binding to RBPs to affect interactions with RNA molecules, bifunctional molecules binding to RNA or RBP to influence their interactions, and other types of molecules that affect the stability of RNA or RBPs. Moreover, we emphasize that the bifunctional molecules may play important roles in small molecule development to overcome the challenges encountered in the process of drug discovery.
Affiliation(s)
- Qingxin Li
  - Guangdong Provincial Engineering Laboratory of Biomass High Value Utilization, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, China
- Congbao Kang
  - Experimental Drug Development Centre, Agency for Science, Technology and Research, Singapore, Singapore
|
32
|
Liu D, Lin Z, Jia C. NeuroCNN_GNB: an ensemble model to predict neuropeptides based on a convolution neural network and Gaussian naive Bayes. Front Genet 2023; 14:1226905. [PMID: 37576553 PMCID: PMC10414792 DOI: 10.3389/fgene.2023.1226905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 06/30/2023] [Indexed: 08/15/2023] Open
Abstract
Neuropeptides contain more chemical information than other classical neurotransmitters and have multiple receptor recognition sites. These characteristics allow neuropeptides to have a correspondingly higher selectivity for nerve receptors and fewer side effects. Traditional experimental methods, such as mass spectrometry and liquid chromatography technology, still need the support of a complete neuropeptide precursor database and the basic characteristics of neuropeptides. Incomplete neuropeptide precursor and information databases will lead to false-positives or reduce the sensitivity of recognition. In recent years, studies have proven that machine learning methods can rapidly and effectively predict neuropeptides. In this work, we have made a systematic attempt to create an ensemble tool based on four convolution neural network models. These baseline models were separately trained on one-hot encoding, AAIndex, G-gap dipeptide encoding and word2vec and integrated using Gaussian Naive Bayes (NB) to construct our predictor designated NeuroCNN_GNB. Both 5-fold cross-validation tests using benchmark datasets and independent tests showed that NeuroCNN_GNB outperformed other state-of-the-art methods. Furthermore, this novel framework provides essential interpretations that aid the understanding of model success by leveraging the powerful Shapley Additive exPlanation (SHAP) algorithm, thereby highlighting the most important features relevant for predicting neuropeptides.
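Two of the encodings named in this abstract, one-hot and G-gap dipeptide, can be illustrated with a short sketch. The definitions below follow common usage (pair frequencies for residues separated by `gap` intervening positions) and are an assumption, since the abstract does not spell out the exact variant used. The function names are hypothetical, and the input is assumed to contain only the 20 standard amino acids.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def one_hot(peptide):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1 if aa == residue else 0 for aa in AMINO_ACIDS] for residue in peptide]


def g_gap_dipeptide(peptide, gap=1):
    """Return a 400-dimensional vector of gapped residue-pair frequencies.

    A pair is (peptide[i], peptide[i + gap + 1]); frequencies are normalized
    by the number of such pairs. Peptides too short for any pair yield zeros.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    n_pairs = len(peptide) - gap - 1
    for i in range(max(n_pairs, 0)):
        counts[peptide[i] + peptide[i + gap + 1]] += 1
    return [counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in pairs]
```

One-hot encoding keeps positional information but grows with peptide length, whereas the gapped dipeptide vector has a fixed length of 400 regardless of input size. That contrast is one reason an ensemble might train separate base models per encoding before combining their outputs.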
Affiliation(s)
- Di Liu
  - Information Science and Technology College, Dalian Maritime University, Dalian, China
- Zhengkui Lin
  - Information Science and Technology College, Dalian Maritime University, Dalian, China
- Cangzhi Jia
  - School of Science, Dalian Maritime University, Dalian, China
|
33
|
Farhadi F, Allahbakhsh M, Maghsoudi A, Armin N, Amintoosi H. DiMo: discovery of microRNA motifs using deep learning and motif embedding. Brief Bioinform 2023; 24:bbad182. [PMID: 37165972 DOI: 10.1093/bib/bbad182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 04/17/2023] [Accepted: 04/21/2023] [Indexed: 05/12/2023] Open
Abstract
MicroRNAs are small regulatory RNAs that decrease gene expression after transcription in a variety of biological contexts. In bioinformatics, identifying microRNAs and predicting their functions is critical. Motif finding is one of the best-known and most important methods for identifying the functions of microRNAs. Several motif discovery techniques have been proposed, some of which rely on artificial intelligence. However, when little or no training data is available, their accuracy is low. In this research, we propose a new computational approach, called DiMo, for identifying motifs in microRNAs and, more generally, in short macromolecules. We employ word embedding techniques and deep learning models to improve the accuracy of motif discovery. We also use transfer learning to pre-train a model that can be applied when training data are scarce or absent. We compare our approach with five state-of-the-art methods using three real-world datasets. DiMo outperforms the selected methods in terms of precision, recall, accuracy and F1-score.
Affiliation(s)
- Fatemeh Farhadi
  - Department of Bioinformatics, University of Zabol, Zabol, Iran
- Ali Maghsoudi
  - Department of Bioinformatics, University of Zabol, Zabol, Iran
- Nadieh Armin
  - Computer Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran
- Haleh Amintoosi
  - Computer Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran
|
34
|
Dunham AS, Beltrao P, AlQuraishi M. High-throughput deep learning variant effect prediction with Sequence UNET. Genome Biol 2023; 24:110. [PMID: 37161576 PMCID: PMC10169183 DOI: 10.1186/s13059-023-02948-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/10/2022] [Accepted: 04/20/2023] [Indexed: 05/11/2023] Open
Abstract
Understanding coding mutations is important for many applications in biology and medicine but the vast mutation space makes comprehensive experimental characterisation impossible. Current predictors are often computationally intensive and difficult to scale, including recent deep learning models. We introduce Sequence UNET, a highly scalable deep learning architecture that classifies and predicts variant frequency from sequence alone using multi-scale representations from a fully convolutional compression/expansion architecture. It achieves comparable pathogenicity prediction to recent methods. We demonstrate scalability by analysing 8.3B variants in 904,134 proteins detected through large-scale proteomics. Sequence UNET runs on modest hardware with a simple Python package.
Affiliation(s)
- Alistair S Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1RQ, UK
- Pedro Beltrao
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093, Zurich, Switzerland
|
35
|
Xu Y, Zhu J, Huang W, Xu K, Yang R, Zhang QC, Sun L. PrismNet: predicting protein-RNA interaction using in vivo RNA structural information. Nucleic Acids Res 2023:7151359. [PMID: 37140045 DOI: 10.1093/nar/gkad353] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/14/2023] [Revised: 04/13/2023] [Accepted: 04/26/2023] [Indexed: 05/05/2023] Open
Abstract
Fundamental to post-transcriptional regulation, the in vivo binding of RNA binding proteins (RBPs) on their RNA targets heavily depends on RNA structures. To date, most methods for RBP-RNA interaction prediction are based on RNA structures predicted from sequences, which do not consider the various intracellular environments and thus cannot predict cell type-specific RBP-RNA interactions. Here, we present a web server PrismNet that uses a deep learning tool to integrate in vivo RNA secondary structures measured by icSHAPE experiments with RBP binding site information from UV cross-linking and immunoprecipitation in the same cell lines to predict cell type-specific RBP-RNA interactions. Taking an RBP and an RNA region with sequential and structural information as input ('Sequence & Structure' mode), PrismNet outputs the binding probability of the RBP and this RNA region, together with a saliency map and a sequence-structure integrative motif. The web server is freely available at http://prismnetweb.zhanglab.net.
Affiliation(s)
- Yiran Xu
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Jianghui Zhu
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Wenze Huang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Kui Xu
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Rui Yang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Qiangfeng Cliff Zhang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
- Lei Sun
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Shandong University, Qingdao 266237, China
|
36
|
Du X, Zhou P, Zhang H, Peng H, Mao X, Liu S, Xu W, Feng K, Zhang Y. Downregulated liver-elevated long intergenic noncoding RNA (LINC02428) is a tumor suppressor that blocks KDM5B/IGF2BP1 positive feedback loop in hepatocellular carcinoma. Cell Death Dis 2023; 14:301. [PMID: 37137887 PMCID: PMC10156739 DOI: 10.1038/s41419-023-05831-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/13/2022] [Revised: 03/30/2023] [Accepted: 04/24/2023] [Indexed: 05/05/2023]
Abstract
Hepatocellular carcinoma (HCC) is a common malignant tumor with high mortality and poor prognosis worldwide. Many studies have reported that long noncoding RNAs (lncRNAs) are related to the progression and prognosis of HCC. However, the functions of downregulated liver-elevated (LE) lncRNAs in HCC remain elusive. Here we report the roles and mechanisms of downregulated LE LINC02428 in HCC. Downregulated LE lncRNAs played significant roles in HCC genesis and development. LINC02428 was upregulated in liver tissues compared with other normal tissues and showed low expression in HCC. Low expression of LINC02428 was associated with poor HCC prognosis. Overexpressed LINC02428 suppressed the proliferation and metastasis of HCC in vitro and in vivo. LINC02428 was predominantly located in the cytoplasm and bound to insulin-like growth factor-2 mRNA-binding protein 1 (IGF2BP1) to prevent it from binding to lysine demethylase 5B (KDM5B) mRNA, which decreased the stability of KDM5B mRNA. KDM5B was found to preferentially bind to the promoter region of IGF2BP1 to upregulate its transcription. Therefore, LINC02428 interrupts the KDM5B/IGF2BP1 positive feedback loop to inhibit HCC progression. The KDM5B/IGF2BP1 positive feedback loop is involved in tumorigenesis and progression of HCC.
Affiliation(s)
- Xuanlong Du
- School of Medicine, Southeast University, Nanjing, 210009, China
- Pengcheng Zhou
- School of Medicine, Southeast University, Nanjing, 210009, China
- Haidong Zhang
- School of Medicine, Southeast University, Nanjing, 210009, China
- Hao Peng
- School of Medicine, Southeast University, Nanjing, 210009, China
- Xinyu Mao
- Hepatopancreatobiliary Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, 210011, China
- Shiwei Liu
- School of Medicine, Southeast University, Nanjing, 210009, China
- Wenjing Xu
- School of Medicine, Southeast University, Nanjing, 210009, China
- Kun Feng
- Hepatopancreatobiliary Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, 210011, China
- Yewei Zhang
- Hepatopancreatobiliary Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, 210011, China
|
37
|
Chadha A, Dara R, Pearl DL, Sharif S, Poljak Z. Predictive analysis for pathogenicity classification of H5Nx avian influenza strains using machine learning techniques. Prev Vet Med 2023; 216:105924. [PMID: 37224663 DOI: 10.1016/j.prevetmed.2023.105924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 03/17/2023] [Accepted: 04/21/2023] [Indexed: 05/26/2023]
Abstract
Over the past decades, avian influenza (AI) outbreaks have been reported across different parts of the globe, resulting in large-scale economic and livestock loss and, in some cases, raising concerns about their zoonotic potential. The virulence and pathogenicity of H5Nx (e.g., H5N1, H5N2) AI strains for poultry can be inferred through various approaches, most frequently by detecting certain pathogenicity markers in their haemagglutinin (HA) gene. Predictive modeling methods represent a possible approach to exploring this genotypic-phenotypic relationship and assisting experts in determining the pathogenicity of circulating AI viruses. Therefore, the main objective of this study was to evaluate the predictive performance of different machine learning (ML) techniques for in-silico prediction of the pathogenicity of H5Nx viruses in poultry, using complete genetic sequences of the HA gene. We annotated 2137 H5Nx HA gene sequences based on the presence of the polybasic HA cleavage site (HACS), with 46.33% and 53.67% of sequences previously identified as highly pathogenic (HP) and low pathogenic (LP), respectively. We compared the performance of different ML classifiers (e.g., logistic regression (LR) with lasso and ridge regularization, random forest (RF), K-nearest neighbor (KNN), Naïve Bayes (NB), support vector machine (SVM), and convolutional neural network (CNN)) for pathogenicity classification of raw H5Nx nucleotide and protein sequences using 10-fold cross-validation. We found that different ML techniques can be successfully used for the pathogenicity classification of H5 sequences with ∼99% classification accuracy. Our results indicate that for pathogenicity classification of (1) aligned deoxyribonucleic acid (DNA) and protein sequences, the NB classifier had the lowest accuracies of 98.41% (+/-0.89) and 98.31% (+/-1.06), respectively; (2) aligned DNA and protein sequences, the LR (L1/L2), KNN, SVM (radial basis function (RBF)) and CNN classifiers had the highest accuracies of 99.20% (+/-0.54) and 99.20% (+/-0.38), respectively; (3) unaligned DNA and protein sequences, CNNs achieved accuracies of 98.54% (+/-0.68) and 99.20% (+/-0.50), respectively. ML methods show potential for regular classification of H5Nx virus pathogenicity for poultry species, particularly when sequences containing regular markers were frequently present in the training dataset.
Affiliation(s)
- Akshay Chadha
- School of Computer Science, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- Rozita Dara
- School of Computer Science, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- David L Pearl
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- Shayan Sharif
- Department of Pathobiology, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- Zvonimir Poljak
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario N1G 2W1, Canada
|
38
|
Pan Z, Zhou S, Zou H, Liu C, Zang M, Liu T, Wang Q. MCNN: Multiple Convolutional Neural Networks for RNA-Protein Binding Sites Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1180-1187. [PMID: 35471886 DOI: 10.1109/tcbb.2022.3170367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
Computational prediction of RBP binding sites using features learned from existing annotations is an effective approach because high-throughput experiments are complex, expensive and time-consuming. Many methods have been proposed to predict RNA-protein binding sites. However, the partial information of the RNA sequence is not fully used. In this study, we propose the multiple convolutional neural networks (MCNN) method, which predicts RNA-protein binding sites by integrating multiple convolutional neural networks built on RNA sequence information extracted from windows of different lengths. First, MCNN trains multiple CNNs based on RNA sequences extracted with different window lengths. Second, MCNN can extract more binding patterns of RBPs by combining these trained CNNs. Third, MCNN uses only RNA base sequence information for RNA-protein binding site prediction, extracting sequence binding features and predicting the result with the same architecture. This avoids the information loss of a separate feature extraction step. Our proposed MCNN demonstrates competitive performance compared with other methods on a large-scale dataset derived from CLIP-seq, indicating that it is an effective method for RNA-protein binding site prediction. The source code of our proposed MCNN method can be found at https://github.com/biomg/MCNN.
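The idea of viewing one candidate site through windows of several lengths can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MCNN implementation: the function names, the window lengths (5, 9, 13) and the `N` padding symbol are hypothetical choices, since the abstract does not specify them.

```python
def extract_window(seq: str, center: int, length: int, pad: str = "N") -> str:
    """Return `length` bases centred on index `center`, padding past the ends."""
    half = length // 2
    start = center - half
    end = start + length
    left_pad = pad * max(0, -start)            # bases missing before the sequence
    right_pad = pad * max(0, end - len(seq))   # bases missing after the sequence
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad


def multi_window_views(seq: str, center: int, lengths=(5, 9, 13)) -> dict:
    """Build one view of the same site per window length (lengths are assumed)."""
    return {n: extract_window(seq, center, n) for n in lengths}


views = multi_window_views("ACGUACGUAC", center=4)
for length, window in views.items():
    print(length, window)
```

In a multi-network setup, each view would feed a separate model trained for that window length, and the per-model scores would then be combined.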
|
39
|
Wang X, Zhang M, Long C, Yao L, Zhu M. Self-Attention Based Neural Network for Predicting RNA-Protein Binding Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1469-1479. [PMID: 36067103 DOI: 10.1109/tcbb.2022.3204661] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 05/04/2023]
Abstract
Proteins that bind to ribonucleic acid (RNA) inside cells are called RNA-binding proteins (RBPs) and play a crucial role in gene regulation. Identifying RNA-protein binding sites helps to better understand the function of RBPs. Although many computational methods have been developed to predict RNA-protein binding sites, their prediction accuracy on small sample datasets needs improvement. To overcome this limitation, we propose a novel model called SA-Net, which utilizes k-mer embedding to encode RNA sequences and a self-attention-based neural network to extract sequence features. K-mer embedding helps the model discover significant subsequence fragments associated with binding sites. The self-attention mechanism captures contextual information from the entire input sequence globally, performing well in small sample sequence learning. Experimental results demonstrate that SA-Net attains state-of-the-art results on the RBP-24 dataset. We find that 4-mer embedding helps the model achieve optimal performance. We also show that the self-attention network outperforms the commonly used CNN and CNN-BLSTM models in sequence feature extraction.
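As a rough illustration of the k-mer step that precedes an embedding layer, the Python sketch below splits an RNA sequence into overlapping 4-mers and maps each to an integer index. It is not SA-Net code: the function names, the stride of 1 and the `ACGU` alphabet ordering are assumptions made for the example.

```python
from itertools import product


def kmer_tokens(seq: str, k: int = 4) -> list:
    """Split a sequence into overlapping k-mers with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vocabulary(k: int = 4, alphabet: str = "ACGU") -> dict:
    """Map every possible k-mer over the alphabet to an integer index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq: str, k: int = 4) -> list:
    """Turn a sequence into k-mer indices suitable for an embedding lookup."""
    vocab = kmer_vocabulary(k)
    # Raises KeyError for symbols outside the assumed alphabet.
    return [vocab[token] for token in kmer_tokens(seq, k)]


print(kmer_tokens("ACGUAC"))   # overlapping 4-mers
print(encode("ACGUAC"))        # their integer indices
```

With a four-letter alphabet, the 4-mer vocabulary has 4^4 = 256 entries, so the embedding table that consumes these indices stays small.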
|
40
|
Patiyal S, Dhall A, Bajaj K, Sahu H, Raghava GPS. Prediction of RNA-interacting residues in a protein using CNN and evolutionary profile. Brief Bioinform 2023; 24:6901899. [PMID: 36516298 DOI: 10.1093/bib/bbac538] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/08/2022] [Revised: 09/28/2022] [Accepted: 11/08/2022] [Indexed: 12/15/2022] Open
Abstract
This paper describes Pprint2, an improved version of Pprint developed for predicting RNA-interacting residues in a protein. The training and independent/validation datasets used in this study comprise 545 and 161 non-redundant RNA-binding proteins, respectively. All models were trained on the training dataset and evaluated on the validation dataset. The preliminary analysis reveals that positively charged amino acids such as H, R and K are more prominent among the RNA-interacting residues. Initially, machine learning-based models were developed using a binary profile and obtained a maximum area under the curve (AUC) of 0.68 on the validation dataset. Performance improved significantly, from AUC 0.68 to 0.76, when an evolutionary profile was used instead of the binary profile. The performance of our evolutionary profile-based model improved further, from AUC 0.76 to 0.82, when a convolutional neural network was used to develop the model. Our final model, based on a convolutional neural network using evolutionary information, achieved an AUC of 0.82 with a Matthews correlation coefficient of 0.49 on the validation dataset. Our best model outperforms existing methods when evaluated on the independent/validation dataset. A user-friendly standalone software package and web server named 'Pprint2' have been developed for predicting RNA-interacting residues (https://webs.iiitd.edu.in/raghava/pprint2 and https://github.com/raghavagps/pprint2).
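For readers unfamiliar with the Matthews correlation coefficient reported above, a minimal Python sketch of its standard definition from a binary confusion matrix follows. The function name and the counts in the usage line are invented for illustration and are not the paper's data; returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical residue-level counts: interacting vs non-interacting residues.
print(matthews_corrcoef(tp=40, tn=120, fp=20, fn=20))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside AUC when binding residues are much rarer than non-binding ones.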
Affiliation(s)
- Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Khushboo Bajaj
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Harshita Sahu
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
|
41
|
He Y, Zhang Q, Wang S, Chen Z, Cui Z, Guo ZH, Huang DS. Predicting the Sequence Specificities of DNA-Binding Proteins by DNA Fine-Tuned Language Model With Decaying Learning Rates. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:616-624. [PMID: 35389869 DOI: 10.1109/tcbb.2022.3165592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
DNA-binding proteins (DBPs) play vital roles in the regulation of biological systems. Although there are already many deep learning methods for predicting the sequence specificities of DBPs, they face two challenges. Classic deep learning methods for DBP prediction usually fail to capture the dependencies between genomic sequences, since their commonly used one-hot codes are mutually orthogonal. In addition, these methods usually perform poorly when samples are inadequate. To address these two challenges, we developed a novel language model for mining DBPs using human genomic data and ChIP-seq datasets with decaying learning rates, named DNA Fine-tuned Language Model (DFLM). It can capture the dependencies between genome sequences based on the context of human genomic data and then fine-tune the features for DBP tasks using different ChIP-seq datasets. First, we compared DFLM with existing widely used methods on 69 datasets and achieved excellent performance. Moreover, we conducted comparative experiments on complex DBPs and small datasets. The results show that DFLM still achieved a significant improvement. Finally, through visualization analysis of one-hot encoding and DFLM, we found that one-hot encoding completely cuts off the dependencies within DNA sequences, while DFLM, using a language model, can represent these dependencies well. Source code is available at: https://github.com/Deep-Bioinfo/DFLM.
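The remark that one-hot codes are mutually orthogonal can be checked directly. The short Python sketch below is illustrative only (the helper names are hypothetical): it encodes each base as a one-hot vector and shows that the dot product between any two distinct bases is zero, so the encoding by itself carries no notion of similarity or context between symbols.

```python
def one_hot(base: str, alphabet: str = "ACGT") -> list:
    """Encode a single base as a one-hot vector over the alphabet."""
    return [1 if base == symbol else 0 for symbol in alphabet]


def dot(u: list, v: list) -> int:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


for x in "ACGT":
    # Each row prints 1 on the diagonal and 0 elsewhere.
    print(x, [dot(one_hot(x), one_hot(y)) for y in "ACGT"])
```

Learned embeddings, such as those produced by a language model, are not constrained in this way, which is the contrast the abstract draws.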
|
42
|
Roca-Martínez J, Dhondge H, Sattler M, Vranken WF. Deciphering the RRM-RNA recognition code: A computational analysis. PLoS Comput Biol 2023; 19:e1010859. [PMID: 36689472 PMCID: PMC9894542 DOI: 10.1371/journal.pcbi.1010859] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/12/2022] [Revised: 02/02/2023] [Accepted: 01/07/2023] [Indexed: 01/24/2023] Open
Abstract
RNA recognition motifs (RRM) are the most prevalent class of RNA binding domains in eucaryotes. Their RNA binding preferences have been investigated for almost two decades, and even though some RRM domains are now very well described, their RNA recognition code has remained elusive. An increasing number of experimental structures of RRM-RNA complexes has become available in recent years. Here, we perform an in-depth computational analysis to derive an RNA recognition code for canonical RRMs. We present and validate a computational scoring method to estimate the binding between an RRM and a single stranded RNA, based on structural data from a carefully curated multiple sequence alignment, which can predict RRM binding RNA sequence motifs based on the RRM protein sequence. Given the importance and prevalence of RRMs in humans and other species, this tool could help design RNA binding motifs with uses in medical or synthetic biology applications, leading towards the de novo design of RRMs with specific RNA recognition.
Affiliation(s)
- Joel Roca-Martínez
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Structural biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Michael Sattler
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich, Neuherberg, Germany
- Bavarian NMR Center, Department of Bioscience, School of Natural Sciences, Technical University of Munich, Garching, Germany
- Wim F. Vranken
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Structural biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
|
43
|
Noreldeen HAA, Huang KY, Wu GW, Zhang Q, Peng HP, Deng HH, Chen W. Feature Selection Assists BLSTM for the Ultrasensitive Detection of Bioflavonoids in Different Biological Matrices Based on the 3D Fluorescence Spectra of Gold Nanoclusters. Anal Chem 2022; 94:17533-17540. [PMID: 36473730 DOI: 10.1021/acs.analchem.2c03814] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
Rapid and on-site qualitative and quantitative analysis of small molecules (including bioflavonoids) in biofluids is of great importance in biomedical applications. Herein, we have developed two deep learning models based on the 3D fluorescence spectra of gold nanoclusters as a single probe for rapid qualitative and quantitative analysis of eight bioflavonoids in serum. The results proved the efficiency and stability of the random forest-bidirectional long short-term memory (RF-BLSTM) model, which was used only with the most important features after deleting the unimportant features that might hinder the performance of the model in identifying the selected bioflavonoids in serum at very low concentrations. The optimized model achieves excellent overall accuracy (98-100%) in the qualitative analysis of the selected bioflavonoids. Next, the optimized model was transferred to quantify the selected bioflavonoids in serum at nanoscale concentrations. The transferred model achieved excellent accuracy, and the overall determination coefficient (R²) value range was 99-100%. Furthermore, the optimized model achieved excellent accuracies in other applications, including multiplex detection in serum and model applicability in urine. Also, LOD in serum at nanoscale concentration was considered. Therefore, this approach opens the window for qualitative and quantitative analysis of small molecules in biofluids at nanoscale concentrations, which may help in the rapid inclusion of sensor arrays in biomedical and other applications.
Affiliation(s)
- Hamada A A Noreldeen
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350004, China; National Institute of Oceanography and Fisheries, NIOF, Cairo 4262110, Egypt
- Kai-Yuan Huang
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350004, China
- Gang-Wei Wu
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350004, China; Department of Pharmacy, Fujian Provincial Hospital, Fuzhou 350001, China
- Qi Zhang
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350004, China
- Hua-Ping Peng
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350004, China
- Hao-Hua Deng
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350004, China
- Wei Chen
- Fujian Key Laboratory of Drug Target Discovery and Structural and Functional Research, School of Pharmacy, Fujian Medical University, Fuzhou 350004, China
|
44
|
Involvement of circRNAs in the Development of Heart Failure. Int J Mol Sci 2022; 23:ijms232214129. [PMID: 36430607 PMCID: PMC9697219 DOI: 10.3390/ijms232214129] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/25/2022] [Revised: 11/05/2022] [Accepted: 11/14/2022] [Indexed: 11/18/2022] Open
Abstract
In recent years, interest in non-coding RNAs as important physiological regulators has grown significantly. Their participation in the pathophysiology of cardiovascular diseases is extremely important. Circular RNA (circRNA) has been shown to be important in the development of heart failure. CircRNA is a closed circular structure of non-coding RNA fragments. They are formed in the nucleus, from where they are transported to the cytoplasm in a still unclear mechanism. They are mainly located in the cytoplasm or contained in exosomes. CircRNA expression varies according to the type of tissue. In the brain, almost 12% of genes produce circRNA, while in the heart it is only 9%. Recent studies indicate a key role of circRNA in cardiomyocyte hypertrophy, fibrosis, autophagy and apoptosis. CircRNAs act mainly by interacting with miRNAs through a "sponge effect" mechanism. The involvement of circRNA in the development of heart failure leads to the suggestion that they may be promising biomarkers and useful targets in the treatment of cardiovascular diseases. In this review, we will provide a brief introduction to circRNA and up-to-date understanding of their role in the mechanisms leading to the development of heart failure.
|
45
|
Bheemireddy S, Sandhya S, Srinivasan N, Sowdhamini R. Computational tools to study RNA-protein complexes. Front Mol Biosci 2022; 9:954926. [PMID: 36275618 PMCID: PMC9585174 DOI: 10.3389/fmolb.2022.954926] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/27/2022] [Accepted: 09/20/2022] [Indexed: 11/19/2022] Open
Abstract
RNA is a key player in many cellular processes such as signal transduction, replication, transport, cell division, transcription, and translation. These diverse functions are accomplished through interactions of RNA with proteins. However, protein–RNA interactions are still poorly understood in contrast to protein–protein and protein–DNA interactions. This knowledge gap can be attributed to the limited availability of protein–RNA structures along with the experimental difficulties in studying these complexes. Recent progress in computational resources has expanded the number of tools available for studying protein–RNA interactions at various molecular levels. These include tools for predicting interacting residues from primary sequences, modelling protein–RNA complexes, predicting hotspots in these complexes, and gaining insight into the dynamics of their interactions. Each of these tools has its strengths and limitations, which makes it important to select an optimal approach for the question of interest. Here we present a mini review of computational tools to study different aspects of protein–RNA interactions, with a focus on overall application, development of the field and future perspectives.
Affiliation(s)
- Sneha Bheemireddy
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Sankaran Sandhya
- Department of Biotechnology, Faculty of Life and Allied Health Sciences, M.S. Ramaiah University of Applied Sciences, Bengaluru, India
- *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
- Ramanathan Sowdhamini
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, TIFR, GKVK Campus, Bangalore, India
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
- *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
|
46
|
Pepe G, Appierdo R, Carrino C, Ballesio F, Helmer-Citterich M, Gherardini PF. Artificial intelligence methods enhance the discovery of RNA interactions. Front Mol Biosci 2022; 9:1000205. [PMID: 36275611 PMCID: PMC9585310 DOI: 10.3389/fmolb.2022.1000205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022] Open
Abstract
Understanding how RNAs interact with proteins, RNAs, or other molecules remains a challenge of main interest in biology, given the importance of these complexes in both normal and pathological cellular processes. Since experimental datasets are starting to be available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods, developed in this broad research area, highlighting feature encoding and machine learning strategies adopted. Given the magnitude of the effect that dataset size and quality have on performance, we explored the characteristics of these datasets. Additionally, we discuss multiple approaches to generate datasets of negative examples for training. Finally, we describe the best-performing methods to predict interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), and methods to predict RNA-RNA or RNA-RBP interactions independently of the RNA type.
Affiliation(s)
- G Pepe
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
- R Appierdo
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- C Carrino
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- F Ballesio
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- M Helmer-Citterich
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
- PF Gherardini
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
|
47
|
Protein-Specific Prediction of RNA-Binding Sites Based on Information Entropy. Computational Intelligence and Neuroscience 2022; 2022:8626628. [PMID: 36225547 PMCID: PMC9550406 DOI: 10.1155/2022/8626628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Revised: 09/15/2022] [Accepted: 09/20/2022] [Indexed: 11/25/2022]
Abstract
Understanding the protein-RNA interaction mechanism can help us to further explore various biological processes. Experimental techniques still have limitations, such as high cost in money and time. Predicting protein-RNA binding sites with computational methods is therefore a valuable research tool. Here, we developed a universal method for predicting protein-specific RNA-binding sites, in which one general model for a given protein is constructed on a fixed dataset by fusing the data from different experimental techniques. At the same time, information theory was employed to characterize the sequence conservation of RNA-binding segments. Conservation difference profiles between binding and nonbinding segments were constructed using information entropy (IE) and indicate a significant difference. Finally, 19 protein-specific models based on random forest (RF) were built using IE encoding. The performance on the independent datasets demonstrates that our method obtains competitive results when compared with the current best prediction model.
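To make the entropy-based conservation idea concrete, the Python sketch below computes a per-position Shannon entropy profile for a set of equal-length segments and the difference between two such profiles. It is a generic illustration, not the paper's encoding: the function names and the toy segments are hypothetical, and the abstract does not specify how the published IE features are derived.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (bits) of the symbols observed at one aligned position."""
    counts = Counter(column)
    total = len(column)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_profile(segments) -> list:
    """Per-position entropy for equal-length segments; low values mean conserved."""
    return [column_entropy(col) for col in zip(*segments)]


def difference_profile(binding, nonbinding) -> list:
    """Position-wise entropy difference between two sets of segments."""
    return [b - n for b, n in zip(entropy_profile(binding), entropy_profile(nonbinding))]


binding = ["UGCA", "UGCU", "UGCG", "UGCC"]      # toy data, conserved UGC core
nonbinding = ["ACGU", "CGUA", "GUAC", "UACG"]   # toy data, no conservation
print(entropy_profile(binding))
print(difference_profile(binding, nonbinding))
```

Negative values in the difference profile mark positions that are more conserved in the binding set than in the nonbinding set.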
|
48
|
JLCRB: A unified multi-view-based joint representation learning for CircRNA binding sites prediction. J Biomed Inform 2022; 136:104231. [DOI: 10.1016/j.jbi.2022.104231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 10/14/2022] [Accepted: 10/14/2022] [Indexed: 11/07/2022]
|
49
|
Laverty KU, Jolma A, Pour SE, Zheng H, Ray D, Morris Q, Hughes TR. PRIESSTESS: interpretable, high-performing models of the sequence and structure preferences of RNA-binding proteins. Nucleic Acids Res 2022; 50:e111. [PMID: 36018788 DOI: 10.1093/nar/gkac694] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/17/2021] [Revised: 07/22/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022] Open
Abstract
Modelling both primary sequence and secondary structure preferences for RNA binding proteins (RBPs) remains an ongoing challenge. Current models use varied RNA structure representations and can be difficult to interpret and evaluate. To address these issues, we present a universal RNA motif-finding/scanning strategy, termed PRIESSTESS (Predictive RBP-RNA InterpretablE Sequence-Structure moTif regrESSion), that can be applied to diverse RNA binding datasets. PRIESSTESS identifies dozens of enriched RNA sequence and/or structure motifs that are subsequently reduced to a set of core motifs by logistic regression with LASSO regularization. Importantly, these core motifs are easily visualized and interpreted, and provide a measure of RBP secondary structure specificity. We used PRIESSTESS to interrogate new HTR-SELEX data for 23 RBPs with diverse RNA binding modes and captured known primary sequence and secondary structure preferences for each. Moreover, when applying PRIESSTESS to 144 RBPs across 202 RNA binding datasets, 75% showed an RNA secondary structure preference but only 10% had a preference besides unpaired bases, suggesting that most RBPs simply recognize the accessibility of primary sequences.
Affiliation(s)
- Kaitlin U Laverty
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
| | - Arttu Jolma
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Donnelly Centre, University of Toronto, Toronto, Canada
| | - Sara E Pour
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
| | - Hong Zheng
- Donnelly Centre, University of Toronto, Toronto, Canada
| | - Debashish Ray
- Donnelly Centre, University of Toronto, Toronto, Canada
| | - Quaid Morris
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, USA
| | - Timothy R Hughes
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Donnelly Centre, University of Toronto, Toronto, Canada
|
50
|
Sadée C, Hagler LD, Becker WR, Jarmoskaite I, Vaidyanathan PP, Denny SK, Greenleaf WJ, Herschlag D. A comprehensive thermodynamic model for RNA binding by the Saccharomyces cerevisiae Pumilio protein PUF4. Nat Commun 2022; 13:4522. [PMID: 35927243 PMCID: PMC9352680 DOI: 10.1038/s41467-022-31968-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/01/2021] [Accepted: 07/07/2022] [Indexed: 11/12/2022] Open
Abstract
Genomic methods have been valuable for identifying RNA-binding proteins (RBPs) and the genes, pathways, and processes they regulate. Nevertheless, standard motif descriptions cannot be used to predict all RNA targets or test quantitative models for cellular interactions and regulation. We present a complete thermodynamic model for RNA binding to the S. cerevisiae Pumilio protein PUF4 derived from direct binding data for 6180 RNAs measured using the RNA on a massively parallel array (RNA-MaP) platform. The PUF4 model is highly similar to that of the related RBPs, human PUM2 and PUM1, with one marked exception: a single favorable site of base flipping for PUF4, such that PUF4 preferentially binds to a non-contiguous series of residues. These results are foundational for developing and testing cellular models of RNA-RBP interactions and function, for engineering RBPs, for understanding the biophysical nature of RBP binding and the evolutionary landscape of RNAs and RBPs.
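To make the idea of a thermodynamic binding model concrete, the sketch below uses the standard relation between dissociation constant and binding free energy, ΔG = RT ln(Kd), together with a simple two-state fraction-bound expression. The additive, independent per-position penalties are an illustrative simplification assumed here, not the published PUF4 model (which also accounts for base flipping), and every function name and number is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from Kd in molar (1 M standard state)."""
    return R_KCAL * temp_k * math.log(kd_molar)


def kd_from_delta_g(delta_g, temp_k=298.15):
    """Inverse of delta_g_from_kd: Kd in molar from free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temp_k))


def additive_kd(consensus_kd, penalties, temp_k=298.15):
    """Kd of a variant RNA, ASSUMING independent, additive penalties.

    penalties: free-energy penalties (kcal/mol), one per mismatched position.
    """
    dg = delta_g_from_kd(consensus_kd, temp_k) + sum(penalties)
    return kd_from_delta_g(dg, temp_k)


def fraction_bound(protein_conc, kd):
    """Fraction of RNA bound at free protein concentration protein_conc."""
    return protein_conc / (protein_conc + kd)


# Hypothetical numbers: a 1 nM consensus site and one mismatch whose
# penalty equals RT*ln(10), which weakens binding exactly tenfold.
rt_ln10 = R_KCAL * 298.15 * math.log(10)
variant_kd = additive_kd(1e-9, [rt_ln10])
```

With these definitions, each RT ln(10) (about 1.36 kcal/mol at 25 °C) of penalty corresponds to a tenfold increase in Kd, which is why quantitative models of this kind can rank many RNA targets that a single consensus motif cannot distinguish.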
Affiliation(s)
- Christoph Sadée
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
| | - Lauren D Hagler
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
| | - Winston R Becker
- Biophysics Program, Stanford University School of Medicine, Stanford, CA, USA
| | - Inga Jarmoskaite
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Pavanapuresan P Vaidyanathan
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Protillion Biosciences, Burlingame, CA, USA
| | - Sarah K Denny
- Biophysics Program, Stanford University School of Medicine, Stanford, CA, USA
- Scribe Therapeutics, Alameda, CA, USA
| | - William J Greenleaf
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Department of Applied Physics, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Daniel Herschlag
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Department of Chemical Engineering, Stanford University, Stanford, CA, USA
- ChEM-H Institute, Stanford University, Stanford, CA, USA
|