1
Yi H, Zhang S, Swinderman J, Wang Y, Kanakaveti V, Hung KL, Wong ITL, Srinivasan S, Curtis EJ, Bhargava-Shah A, Li R, Jones MG, Luebeck J, Zhao Y, Belk JA, Kraft K, Shi Q, Yan X, Pritchard SK, Liang FM, Felsher DW, Gilbert LA, Bafna V, Mischel PS, Chang HY. EcDNA-borne PVT1 fusion stabilizes oncogenic mRNAs. bioRxiv 2025:2025.04.01.646515. [PMID: 40236070 PMCID: PMC11996508 DOI: 10.1101/2025.04.01.646515] [Indexed: 04/17/2025]
Abstract
Extrachromosomal DNA (ecDNA) amplifications are prevalent drivers of human cancers. We show that ecDNAs exhibit elevated structural variants leading to gene fusions that produce oncogene fusion transcripts. The long noncoding RNA (lncRNA) gene PVT1 is the most recurrent structural variant across cancer genomes, with PVT1-MYC fusions arising most frequently on ecDNA. PVT1 exon 1 is the predominant 5' partner fused to MYC or other oncogenes on the 3' end. Mechanistic studies demonstrate that PVT1 exon 1 confers enhanced RNA stability for fusion transcripts, which requires PVT1 exon 1 interaction with SRSF1 protein. Genetic rescue of MYC-addicted cancer models and isoform-specific single-cell RNA sequencing of tumors reveal that PVT1-MYC better supports MYC dependency and better activates MYC target genes in vivo. Thus, the mutagenic landscape of ecDNA contributes to genome instability and generates chimeric fusions of lncRNA and mRNA genes, selecting PVT1 5' region as a stabilizer of oncogene mRNAs.
2
You Q, Liu J, Zhang R, Wang Z, Zhang B, Guo W, Xu N, Bottillo I, Shao L. Splicing Analysis of Exonic TSC1 and TSC2 Gene Variants Causing Tuberous Sclerosis Complex. Hum Mutat 2025; 2025:1497712. [PMID: 40226305 PMCID: PMC11978479 DOI: 10.1155/humu/1497712] [Received: 06/11/2024] [Accepted: 03/17/2025] [Indexed: 04/15/2025]
Abstract
Tuberous sclerosis complex (TSC) is characterized by abnormalities in cell proliferation and migration, leading to the development of hamartomas, benign tumors, or malignant cancers, affecting both the skin and brain, as well as potentially impacting the heart, kidneys, lungs, and eyes, with varying patterns of involvement over a lifetime. It is primarily caused by mutations in the TSC1 and TSC2 genes. Aberrant splicing is a crucial factor in hereditary diseases. Alternative splicing is a key mechanism for expanding the diversity of the human proteome. Mutations disrupting canonical splice sites or splicing regulatory elements impede the utilization of splice sites, leading to exon skipping and intron retention. We comprehensively analyzed missense and nonsense mutations of TSC1 and TSC2 genes using bioinformatics tools and identified 10 candidate mutations affecting pre-mRNA splicing through minigene analysis. Mutations in TSC genes can lead to partial or complete exon skipping and/or intron retention through complex mechanisms. This study emphasizes the importance of evaluating their roles in the splicing of suspected pathogenic variants in TSC.
Affiliation(s)
- Qingqing You
- Department of Nephrology, Qingdao Municipal Hospital (Group), Qingdao Hospital of University of Health and Rehabilitation Sciences, Qingdao, China
- Jingwei Liu
- Department of Cardiac Surgery, Qingdao Municipal Hospital (Group), Qingdao Hospital of University of Health and Rehabilitation Sciences, Qingdao, China
- Ran Zhang
- Department of Nephrology, Qingdao Municipal Hospital (Group), Qingdao Hospital of University of Health and Rehabilitation Sciences, Qingdao, China
- Zhi Wang
- School of Clinical Medicine, Shandong Second Medical University, Weifang, China
- Bingying Zhang
- School of Clinical Medicine, Shandong Second Medical University, Weifang, China
- Wencong Guo
- Institute of Nephrology, Zhongda Hospital, Southeast University School of Medicine, Nanjing, China
- Ning Xu
- Department of Nephrology, Qingdao Municipal Hospital (Group), Qingdao Hospital of University of Health and Rehabilitation Sciences, Qingdao, China
- Irene Bottillo
- Division of Medical Genetics, Department of Experimental Medicine, San Camillo-Forlanini Hospital, Sapienza University, Rome, Italy
- Leping Shao
- Department of Nephrology, (Fujian Provincial Clinical Research Center for Glomerular Nephritis), The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China
3
Pan X, Fang Y, Liu X, Guo X, Shen HB. RBPsuite 2.0: an updated RNA-protein binding site prediction suite with high coverage on species and proteins based on deep learning. BMC Biol 2025; 23:74. [PMID: 40069726 PMCID: PMC11899677 DOI: 10.1186/s12915-025-02182-2] [Received: 05/20/2024] [Accepted: 03/03/2025] [Indexed: 03/14/2025]
Abstract
BACKGROUND: RNA-binding proteins (RBPs) play crucial roles in many biological processes, and computationally identifying RNA-RBP interactions provides insights into the biological mechanism of diseases associated with RBPs. RESULTS: To make the RBP-specific deep learning-based RBP binding sites prediction methods easily accessible, we developed an updated easy-to-use webserver, RBPsuite 2.0, with an updated web interface for predicting RBP binding sites from linear and circular RNA sequences. RBPsuite 2.0 has a higher coverage on the number of supported RBPs and species compared to the original RBPsuite, supporting an increased number of RBPs from 154 to 353 and expanding the supported species from one to seven. Additionally, RBPsuite 2.0 replaces the CRIP built into RBPsuite 1.0 with iDeepC, a more accurate RBP binding site predictor for circular RNAs. Furthermore, RBPsuite 2.0 estimates the contribution score of individual nucleotides on the input sequences as potential binding motifs and links to the UCSC browser track for better visualization of the prediction results. CONCLUSIONS: RBPsuite 2.0 is an updated, more comprehensive webserver for predicting RBP binding sites in both linear and circular RNA sequences. It supports more RBPs and species and provides more accurate predictions for circular RNAs. The tool is freely available at http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/.
Affiliation(s)
- Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
- Yi Fang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Xiaojian Liu
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Xiaoyu Guo
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
4
Shen X, Hou Y, Wang X, Zhang C, Liu J, Shen H, Wang W, Yang Y, Yang M, Li Y, Zhang J, Sun Y, Chen K, Shi L, Li X. A deep learning model for characterizing protein-RNA interactions from sequences at single-base resolution. Patterns (N Y) 2025; 6:101150. [PMID: 39896261 PMCID: PMC11783876 DOI: 10.1016/j.patter.2024.101150] [Received: 07/29/2024] [Revised: 09/18/2024] [Accepted: 12/11/2024] [Indexed: 02/04/2025]
Abstract
Protein-RNA interactions play pivotal roles in regulating transcription, translation, and RNA metabolism. Characterizing these interactions offers key insights into RNA dysregulation mechanisms. Here, we introduce Reformer, a deep learning model that predicts protein-RNA binding affinity from sequence data. Trained on 225 enhanced cross-linking and immunoprecipitation sequencing (eCLIP-seq) datasets encompassing 155 RNA-binding proteins across three cell lines, Reformer achieves high accuracy in predicting binding affinity at single-base resolution. The model uncovers binding motifs that are often undetectable through traditional eCLIP-seq methods. Notably, the motifs learned by Reformer are shown to correlate with RNA processing functions. Validation via electrophoretic mobility shift assays confirms the model's precision in quantifying the impact of mutations on RNA regulation. In summary, Reformer improves the resolution of RNA-protein interaction predictions and aids in prioritizing mutations that influence RNA regulation.
Affiliation(s)
- Xilin Shen
- Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Department of Pathology, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- State Key Laboratory of Experimental Hematology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Breast Cancer Prevention and Therapy (Ministry of Education), Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Yayan Hou
- Department of Pharmacy, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
- State Key Laboratory of Experimental Hematology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Breast Cancer Prevention and Therapy (Ministry of Education), Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Xueer Wang
- The Third Department of Breast Cancer, Tianjin’s Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin 300070, China
- Chunyong Zhang
- State Key Laboratory of Experimental Hematology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Breast Cancer Prevention and Therapy (Ministry of Education), Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Jilei Liu
- Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Hongru Shen
- Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Wei Wang
- Department of Epidemiology and Biostatistics, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Yichen Yang
- Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Meng Yang
- Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Yang Li
- Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Jin Zhang
- The Third Department of Breast Cancer, Tianjin’s Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin 300070, China
- Yan Sun
- Department of Pathology, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Kexin Chen
- Department of Epidemiology and Biostatistics, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Lei Shi
- State Key Laboratory of Experimental Hematology, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Breast Cancer Prevention and Therapy (Ministry of Education), Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Xiangchun Li
- Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
5
Harini K, Sekijima M, Gromiha MM. Bioinformatics Approaches for Understanding the Binding Affinity of Protein-Nucleic Acid Complexes. Methods Mol Biol 2025; 2867:315-330. [PMID: 39576589 DOI: 10.1007/978-1-0716-4196-5_18] [Indexed: 11/24/2024]
Abstract
Protein-nucleic acid interactions are involved in various biological processes such as gene expression, replication, transcription, translation, and packaging. Understanding the recognition mechanism of the protein-nucleic acid complexes has been investigated from different perspectives, including the binding affinities of protein-DNA and protein-RNA complexes. Experimentally, protein-nucleic acid interactions are analyzed using X-ray crystallography, Isothermal Titration Calorimetry (ITC), DNA/RNA pull-down assays, DNA/RNA footprinting, and systematic evolution of ligands by exponential enrichment (SELEX). On the other hand, numerous databases and computational tools have been developed to study protein-nucleic acid complexes based on their binding sites, specific interactions between them, and binding affinity. In this chapter, we discuss various databases for protein-nucleic acid complex structures and the tools available to extract features from them. Further, we provide details on databases and prediction methods reported for exploring the binding affinity of protein-nucleic acid complexes along with important structure-based parameters, which govern the binding affinity.
Affiliation(s)
- K Harini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- Masakazu Sekijima
- Department of Computer Science, Tokyo Institute of Technology, Yokohama, Japan
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India.
- International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama, Japan.
6
Qiao Y, Yang R, Liu Y, Chen J, Zhao L, Huo P, Wang Z, Bu D, Wu Y, Zhao Y. DeepFusion: A deep bimodal information fusion network for unraveling protein-RNA interactions using in vivo RNA structures. Comput Struct Biotechnol J 2024; 23:617-625. [PMID: 38274994 PMCID: PMC10808905 DOI: 10.1016/j.csbj.2023.12.040] [Received: 09/06/2023] [Revised: 12/04/2023] [Accepted: 12/26/2023] [Indexed: 01/27/2024]
Abstract
RNA-binding proteins (RBPs) are key post-transcriptional regulators, and the malfunctions of RBP-RNA binding lead to diverse human diseases. However, prediction of RBP binding sites is largely based on RNA sequence features, whereas in vivo RNA structural features based on high-throughput sequencing are rarely incorporated. Here, we designed a deep bimodal information fusion network called DeepFusion for unraveling protein-RNA interactions by incorporating structural features derived from DMS-seq data. DeepFusion integrates two sub-models to extract local motif-like information and long-term context information. We show that DeepFusion performs best compared with other cutting-edge methods with only sequence inputs on two datasets. DeepFusion's performance is further improved with bimodal input after adding in vivo DMS-seq structural features. Furthermore, DeepFusion can be used for analyzing RNA degradation, demonstrating significantly different RBP-binding scores in genes with slow degradation rates versus those with rapid degradation rates. DeepFusion thus provides enhanced abilities for further analysis of functional RNAs. DeepFusion's code and data are available at http://bioinfo.org/deepfusion/.
Affiliation(s)
- Yixuan Qiao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Rui Yang
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yang Liu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Jiaxin Chen
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Lianhe Zhao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Peipei Huo
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Zhihao Wang
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Dechao Bu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Yang Wu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Yi Zhao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
7
Krautwurst S, Lamkiewicz K. RNA-protein interaction prediction without high-throughput data: An overview and benchmark of in silico tools. Comput Struct Biotechnol J 2024; 23:4036-4046. [PMID: 39610906 PMCID: PMC11603007 DOI: 10.1016/j.csbj.2024.11.015] [Received: 08/20/2024] [Revised: 11/05/2024] [Accepted: 11/05/2024] [Indexed: 11/30/2024]
Abstract
RNA-protein interactions (RPIs) are crucial for accurately operating various processes in and between organisms across kingdoms of life. Mutual detection of RPI partner molecules depends on distinct sequential, structural, or thermodynamic features, which can be determined via experimental and bioinformatic methods. Still, the underlying molecular mechanisms of many RPIs are poorly understood. It is further hypothesized that many RPIs are not even described yet. Computational RPI prediction is continuously challenged by the lack of data and detailed research of very specific examples. With the discovery of novel RPI complexes in all kingdoms of life, adaptations of existing RPI prediction methods are necessary. Continuously improving computational RPI prediction is key in advancing the understanding of RPIs in detail and supplementing experimental RPI determination. The growing amount of data covering more species and detailed mechanisms support the accuracy of prediction tools, which in turn support specific experimental research on RPIs. Here, we give an overview of RPI prediction tools that do not use high-throughput data as the user's input. We review the tools according to their input, usability, and output. We then apply the tools to known RPI examples across different kingdoms of life. Our comparison shows that the investigated prediction tools do not favor a certain species and equip the user with results varying in degree of information, from an overall RPI score to detailed interacting residues. Furthermore, we provide a guide tree to help users choose which RPI prediction tool is appropriate for their available input data and desired output.
Affiliation(s)
- Sarah Krautwurst
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Kevin Lamkiewicz
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr. 4, 04103 Leipzig, Germany
8
Wang Z, Wojciechowicz M, Rosen J, Elmas A, Song WM, Liu Y, Huang KL. Master regulators governing protein abundance across ten human cancer types. bioRxiv 2024:2024.11.11.619147. [PMID: 39605415 PMCID: PMC11601414 DOI: 10.1101/2024.11.11.619147] [Indexed: 11/29/2024]
Abstract
Protein abundance correlates only moderately with mRNA levels, and is modulated post-transcriptionally by a network of regulators including ribosomes, RNA-binding proteins (RBPs), and the proteasome. Here, we identified Master Protein abundance Regulators (MaPRs) across ten cancer types by devising a new computational pipeline that jointly analyzed transcriptomes and proteomes from 1,305 tumor samples. We identified 232 to 1,394 MaPRs per cancer type, mediating up to 79% of post-transcriptional regulatory networks. MaPRs exhibit high network connectivity, strong genetic dependency in cancer cells, and significant enrichment for RBPs. Combining tumor up-regulation, druggability, and target network analyses identified cancer-specific vulnerabilities. MaPRs predict tumor proteomic subtypes more accurately than other proteins. Finally, significant portions of RBP MaPR-target relationships were validated by experimental evidence from eCLIP binding and knockdown assays. Our findings uncover central MaPRs that govern post-transcriptional networks, highlighting diverse processes underlying human proteome regulation and identifying key regulators in cancer biology.
Affiliation(s)
- Zishan Wang
- Department of Genetics and Genomic Sciences, Department of Artificial Intelligence and Human Health, Center for Transformative Disease Modeling, Tisch Cancer Institute, Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Megan Wojciechowicz
- Department of Genetics and Genomic Sciences, Department of Artificial Intelligence and Human Health, Center for Transformative Disease Modeling, Tisch Cancer Institute, Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Jordan Rosen
- Department of Genetics and Genomic Sciences, Department of Artificial Intelligence and Human Health, Center for Transformative Disease Modeling, Tisch Cancer Institute, Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Abdulkadir Elmas
- Department of Genetics and Genomic Sciences, Department of Artificial Intelligence and Human Health, Center for Transformative Disease Modeling, Tisch Cancer Institute, Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Won-Min Song
- Department of Genetics and Genomic Sciences, Department of Artificial Intelligence and Human Health, Center for Transformative Disease Modeling, Tisch Cancer Institute, Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Yansheng Liu
- Yale Cancer Biology Institute, Yale University, West Haven, CT 06516, USA
- Department of Pharmacology, Yale University School of Medicine, New Haven, CT 06510, USA
- Kuan-lin Huang
- Department of Genetics and Genomic Sciences, Department of Artificial Intelligence and Human Health, Center for Transformative Disease Modeling, Tisch Cancer Institute, Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
9
Llinares-Burguet I, Sanoguera-Miralles L, Valenzuela-Palomo A, García-Álvarez A, Bueno-Martínez E, Velasco-Sampedro EA. Splicing Dysregulation of Non-Canonical GC-5' Splice Sites of Breast Cancer Susceptibility Genes ATM and PALB2. Cancers (Basel) 2024; 16:3562. [PMID: 39518003 PMCID: PMC11545216 DOI: 10.3390/cancers16213562] [Received: 09/16/2024] [Revised: 10/07/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
Background/Objectives: The non-canonical GC-5' splice sites (5'ss) are the most common exception (~1%) to the classical GT/AG splicing rule. They constitute weak 5'ss and can be regulated by splicing factors, so they are especially sensitive to genetic variations inducing the misrecognition of their respective exons. We aimed to investigate the GC-5'ss of the breast/ovarian cancer susceptibility genes, ATM (exon 50), BRIP1 (exon 1), and PALB2 (exon 12), and their dysregulation induced by DNA variants. Methods: Splicing assays of the minigenes, mgATM_49-52, mgBRIP1_1-2, and mgPALB2_5-12, were conducted to study the regulation of the indicated GC-5'ss. Results: A functional map of the splicing regulatory elements (SRE) formed by overlapping exonic microdeletions revealed three essential intervals, ATM c.7335_7344del, PALB2 c.3229_3258del, and c.3293_3322del, which are likely targets for spliceogenic SRE-variants. We then selected 14 ATM and 9 PALB2 variants (Hexplorer score < -40) located at these intervals that were assayed in MCF-7 cells. Nine ATM and three PALB2 variants affected splicing, impairing the recognition of exons 50 and 12, respectively. Therefore, these variants likely disrupt the active SREs involved in the inclusion of both exons in the mature mRNA. DeepCLIP predictions suggested the participation of several splicing factors in exon recognition, including SRSF1, SRSF2, and SRSF7, involved in the recognition of other GC sites. The ATM spliceogenic variants c.7336G>T (p.(Glu2446Ter)) and c.7340T>A (p.(Leu2447Ter)) produced significant amounts of full-length transcripts (55-59%), which include premature termination stop codons, so they would inactivate ATM through both splicing disruption and protein truncation mechanisms. Conclusions: ATM exon 50 and PALB2 exon 12 require specific sequences for efficient recognition by the splicing machinery. The mapping of SRE-rich intervals in minigenes is a valuable approach for the identification of spliceogenic variants that outperforms any prediction software. Indeed, 12 spliceogenic SRE-variants were identified in the critical intervals.
Affiliation(s)
- Eladio A. Velasco-Sampedro
- Splicing and Genetic Susceptibility to Cancer, Unidad de Excelencia Instituto de Biomedicina y Genética Molecular (IBGM) de Valladolid, Consejo Superior de Investigaciones Científicas-Universidad de Valladolid (CSIC-UVa), 47003 Valladolid, Spain; (I.L.-B.); (L.S.-M.); (A.V.-P.); (A.G.-Á.); (E.B.-M.)
10
Li M, Fang Q, Xiao P, Yin Z, Mei G, Wang C, Xiang Y, Zhao X, Qu L, Xu T, Zhang J, Liu K, Li X, Dong H, Xiao R, Zhou R. KHSRP ameliorates acute liver failure by regulating pre-mRNA splicing through its interaction with SF3B1. Cell Death Dis 2024; 15:618. [PMID: 39187547 PMCID: PMC11347664 DOI: 10.1038/s41419-024-06886-1] [Received: 02/10/2024] [Revised: 06/27/2024] [Accepted: 07/02/2024] [Indexed: 08/28/2024]
Abstract
Acute liver failure (ALF) is characterized by the rapidly progressive deterioration of hepatic function, which, without effective medical intervention, results in high mortality and morbidity. Here, using proteomic and transcriptomic analyses in murine ALF models, we found that the expression of multiple splicing factors was downregulated in ALF. Notably, we found that KH-type splicing regulatory protein (KHSRP) has a protective effect in ALF. Knockdown of KHSRP resulted in dramatic splicing defects, such as intron retention, and led to the exacerbation of liver injury in ALF. Moreover, we demonstrated that KHSRP directly interacts with splicing factor 3b subunit 1 (SF3B1) and enhances the binding of SF3B1 to the intronic branch sites, thereby promoting pre-mRNA splicing. Using splicing inhibitors, we found that Khsrp protects against ALF by regulating pre-mRNA splicing in vivo. Overall, our findings demonstrate that KHSRP is an important splicing activator and promotes the expression of genes associated with ALF progression by interacting with SF3B1; thus, KHSRP could be a possible target for therapeutic intervention in ALF.
Affiliation(s)
- Mingxuan Li
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- Department of Medical Parasitology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Qian Fang
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- Department of Medical Parasitology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Pingping Xiao
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- Department of Medical Parasitology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- School of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, 442000, China
| | - Zhinang Yin
- Department of Pathophysiology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Guangbo Mei
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- Department of Medical Parasitology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Cheng Wang
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Ying Xiang
- Department of Pathophysiology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Xuejun Zhao
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- Department of Medical Parasitology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Lihua Qu
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- Department of Medical Parasitology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- School of Basic Medical Sciences, Xianning Medical College, Hubei University of Science and Technology, Xianning, Hubei, 437000, China
| | - Tian Xu
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Jiaxi Zhang
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- Department of Medical Parasitology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Kejun Liu
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- Department of Medical Parasitology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Xiaoqing Li
- Center for Stem Cell Research and Application, Union Hospital, Tongji Medical School, Huazhong University of Science and Technology, Wuhan, Hubei, 430022, China
| | - Huifen Dong
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
- Department of Medical Parasitology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China
| | - Ruijing Xiao
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China.
- Department of Pathophysiology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China.
| | - Rui Zhou
- Hubei Province Key Laboratory of Allergy and Immunology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China.
- Department of Medical Parasitology, School of Basic Medical Sciences, Wuhan University, Wuhan, Hubei, 430071, China.
|
11
|
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
| | - Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
- Genome4me Inc., Seoul, Republic of Korea.
|
12
|
Wilson B, Esmaeili F, Parsons M, Salah W, Su Z, Dutta A. sRNA-Effector: A tool to expedite discovery of small RNA regulators. iScience 2024; 27:109300. [PMID: 38469560 PMCID: PMC10926228 DOI: 10.1016/j.isci.2024.109300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 11/08/2023] [Accepted: 02/16/2024] [Indexed: 03/13/2024] Open
Abstract
microRNAs (miRNAs) are small regulatory RNAs that repress target mRNA transcripts through base pairing. Although the mechanisms of miRNA production and function are clearly established, new insights into miRNA regulation or miRNA-mediated gene silencing are still emerging. In order to facilitate the discovery of miRNA regulators or effectors, we have developed sRNA-Effector, a machine learning algorithm trained on enhanced crosslinking and immunoprecipitation sequencing and RNA sequencing data following knockdown of specific genes. sRNA-Effector can accurately identify known miRNA biogenesis and effector proteins and identifies 9 putative regulators of miRNA function, including serine/threonine kinase STK33, splicing factor SFPQ, and proto-oncogene BMI1. We validated the role of STK33, SFPQ, and BMI1 in miRNA regulation, showing that sRNA-Effector is useful for identifying new players in small RNA biology. sRNA-Effector will be a web tool available for all researchers to identify potential miRNA regulators in any cell line of interest.
Affiliation(s)
- Briana Wilson
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA 22901, USA
| | - Fatemeh Esmaeili
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35233, USA
| | - Matthew Parsons
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA 22901, USA
| | - Wafa Salah
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA 22901, USA
| | - Zhangli Su
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35233, USA
| | - Anindya Dutta
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35233, USA
|
13
|
Spangsberg Petersen US, Dembic M, Martínez-Pizarro A, Richard E, Holm LL, Havelund JF, Doktor TK, Larsen MR, Færgeman NJ, Desviat LR, Andresen BS. Regulating PCCA gene expression by modulation of pseudoexon splicing patterns to rescue enzyme activity in propionic acidemia. MOLECULAR THERAPY. NUCLEIC ACIDS 2024; 35:102101. [PMID: 38204914 PMCID: PMC10776996 DOI: 10.1016/j.omtn.2023.102101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 12/08/2023] [Indexed: 01/12/2024]
Abstract
Pseudoexons are nonfunctional intronic sequences that can be activated by deep-intronic sequence variation. Activation increases pseudoexon inclusion in mRNA and interferes with normal gene expression. The PCCA c.1285-1416A>G variation activates a pseudoexon and causes the severe metabolic disorder propionic acidemia by deficiency of the propionyl-CoA carboxylase enzyme encoded by PCCA and PCCB. We characterized this pathogenic pseudoexon activation event in detail and identified hnRNP A1 to be important for normal repression. The PCCA c.1285-1416A>G variation disrupts an hnRNP A1-binding splicing silencer and simultaneously creates a splicing enhancer. We demonstrate that blocking this region of regulation with splice-switching antisense oligonucleotides restores normal splicing and rescues enzyme activity in patient fibroblasts and in a cellular model created by CRISPR gene editing. Interestingly, the PCCA pseudoexon offers an unexploited potential to upregulate gene expression because healthy tissues show relatively high inclusion levels. By blocking inclusion of the nonactivated wild-type pseudoexon, we can increase both PCCA and PCCB protein levels, which increases the activity of the heterododecameric enzyme. Surprisingly, we can increase enzyme activity from residual levels in not only patient fibroblasts harboring PCCA missense variants but also those harboring PCCB missense variants. This is a potential treatment strategy for propionic acidemia.
Affiliation(s)
- Ulrika Simone Spangsberg Petersen
- Department of Biochemistry and Molecular Biology and the Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
| | - Maja Dembic
- Department of Biochemistry and Molecular Biology and the Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
- Department of Clinical Genetics, Odense University Hospital, 5000 Odense C, Denmark
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense M, Denmark
| | - Ainhoa Martínez-Pizarro
- Centro de Biología Molecular Severo Ochoa, UAM-CSIC, CEDEM, CIBERER, IdiPaz, Universidad Autónoma de Madrid, 28049 Madrid, Spain
| | - Eva Richard
- Centro de Biología Molecular Severo Ochoa, UAM-CSIC, CEDEM, CIBERER, IdiPaz, Universidad Autónoma de Madrid, 28049 Madrid, Spain
| | - Lise Lolle Holm
- Department of Biochemistry and Molecular Biology and the Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
| | - Jesper Foged Havelund
- Department of Biochemistry and Molecular Biology and the Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
| | - Thomas Koed Doktor
- Department of Biochemistry and Molecular Biology and the Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
| | - Martin Røssel Larsen
- Department of Biochemistry and Molecular Biology and the Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
| | - Nils J. Færgeman
- Department of Biochemistry and Molecular Biology and the Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
| | - Lourdes Ruiz Desviat
- Centro de Biología Molecular Severo Ochoa, UAM-CSIC, CEDEM, CIBERER, IdiPaz, Universidad Autónoma de Madrid, 28049 Madrid, Spain
| | - Brage Storstein Andresen
- Department of Biochemistry and Molecular Biology and the Villum Center for Bioanalytical Sciences, University of Southern Denmark, 5230 Odense M, Denmark
|
14
|
Shirvanizadeh N, Vihinen M. VariBench, new variation benchmark categories and data sets. FRONTIERS IN BIOINFORMATICS 2023; 3:1248732. [PMID: 37795169 PMCID: PMC10546188 DOI: 10.3389/fbinf.2023.1248732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 09/08/2023] [Indexed: 10/06/2023] Open
Affiliation(s)
| | - Mauno Vihinen
- Department of Experimental Medical Science, Lund University, Lund, Sweden
|
15
|
Pan Z, Zhou S, Liu T, Liu C, Zang M, Wang Q. WVDL: Weighted Voting Deep Learning Model for Predicting RNA-Protein Binding Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3322-3328. [PMID: 37028092 DOI: 10.1109/tcbb.2023.3252276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
RNA-binding proteins are important for cellular processes. High-throughput experimental methods for discovering RNA-protein binding sites are time-consuming and expensive. Deep learning is an effective approach for predicting RNA-protein binding sites, and integrating multiple base classifiers by weighted voting can improve model performance. Thus, in our study, we propose a weighted voting deep learning model (WVDL), which uses weighted voting to combine a convolutional neural network (CNN), a long short-term memory network (LSTM), and a residual network (ResNet). First, the final prediction of WVDL outperforms the base classifiers and other ensemble strategies. Second, WVDL can extract more effective features by using weighted voting to find the best weighted combination, and the CNN model can also draw the predicted motif pictures. Third, WVDL achieves competitive results on the public RBP-24 datasets compared with other state-of-the-art methods. The source code of our proposed WVDL can be found at https://github.com/biomg/WVDL.
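The weighted-voting idea in this abstract can be illustrated with a minimal sketch. It assumes each base model (for example a CNN, an LSTM, and a ResNet) has already produced positive-class probabilities on a validation set, and it picks the weights by a coarse grid search on accuracy. The function names, the grid step, and the accuracy criterion are illustrative assumptions, not the procedure used in the WVDL repository.

```python
import itertools
import numpy as np

def weighted_vote(probs, weights):
    """Weighted average of per-model positive-class probabilities.

    probs has shape (n_models, n_samples); weights has shape (n_models,).
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ probs / weights.sum()

def best_weights(probs, labels, step=0.1):
    """Grid-search non-negative weights summing to 1 (hypothetical criterion:
    validation accuracy at a 0.5 threshold)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n_models = probs.shape[0]
    ticks = int(round(1 / step))
    best, best_acc = None, -1.0
    for combo in itertools.product(range(ticks + 1), repeat=n_models):
        if sum(combo) != ticks:  # keep only weight vectors that sum to 1
            continue
        w = np.array(combo) / ticks
        acc = np.mean((weighted_vote(probs, w) >= 0.5) == labels)
        if acc > best_acc:
            best, best_acc = w, acc
    return best, best_acc
```

In practice the search would run on held-out data only, so that the chosen weights are not tuned on the test set.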
|
16
|
Pan Z, Zhou S, Zou H, Liu C, Zang M, Liu T, Wang Q. MCNN: Multiple Convolutional Neural Networks for RNA-Protein Binding Sites Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1180-1187. [PMID: 35471886 DOI: 10.1109/tcbb.2022.3170367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
Computational prediction of RBP binding sites using features learned from existing annotation knowledge is an effective method because high-throughput experiments are complex, expensive and time-consuming. Many methods have been proposed to predict RNA-protein binding sites. However, the partial information of the RNA sequence is not fully used. In this study, we propose the multiple convolutional neural networks (MCNN) method, which predicts RNA-protein binding sites by integrating multiple convolutional neural networks built on RNA sequence information extracted from windows of different lengths. First, MCNN trains multiple CNNs based on RNA sequences extracted with different window lengths. Second, MCNN can extract more binding patterns of RBPs by combining these previously trained CNNs. Third, MCNN uses only RNA base sequence information for RNA-protein binding site prediction, extracting sequence binding features and predicting the result with the same architecture. This avoids the information loss of a separate feature extraction step. Our proposed MCNN demonstrates competitive performance compared with other methods on a large-scale dataset derived from CLIP-seq, showing that it is an effective method for RNA-protein binding site prediction. The source code of our proposed MCNN method can be found at https://github.com/biomg/MCNN.
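The multi-window input described in this abstract can be sketched as follows: one RNA sequence is one-hot encoded and cropped to several window lengths around its midpoint, so that each window could feed a separate CNN. The helper names, the choice of centring on the midpoint, and the zero-padding are assumptions made for illustration; the actual preprocessing is defined in the MCNN repository.

```python
import numpy as np

ALPHABET = "ACGU"

def one_hot(seq):
    """Encode an RNA string as a (len, 4) matrix; unknown bases become all-zero rows."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper().replace("T", "U")):
        j = ALPHABET.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat

def centered_windows(seq, lengths):
    """Return {length: one-hot window} cropped around the sequence midpoint.

    Windows longer than the sequence are zero-padded on both sides
    (an assumed convention, not taken from the paper).
    """
    center = len(seq) // 2
    out = {}
    for n in lengths:
        start = center - n // 2
        end = start + n
        window = seq[max(start, 0):min(end, len(seq))]
        enc = np.zeros((n, 4), dtype=np.float32)
        offset = max(-start, 0)  # rows of left padding when the window overruns
        enc[offset:offset + len(window)] = one_hot(window)
        out[n] = enc
    return out
```

Each entry of the returned dictionary has shape `(length, 4)`, which matches the usual input layout for a 1D convolution over sequence positions.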
|
17
|
Rybarczyk A, Lehmann T, Iwańczyk-Skalska E, Juzwa W, Pławski A, Kopciuch K, Blazewicz J, Jagodziński PP. In silico and in vitro analysis of the impact of single substitutions within EXO-motifs on Hsa-MiR-1246 intercellular transfer in breast cancer cell. J Appl Genet 2023; 64:105-124. [PMID: 36394782 PMCID: PMC9837009 DOI: 10.1007/s13353-022-00730-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/27/2022] [Revised: 09/26/2022] [Accepted: 09/27/2022] [Indexed: 11/19/2022]
Abstract
MiR-1246 has recently gained much attention, and many studies have shown its oncogenic role in colorectal, breast, lung, and ovarian cancers. However, miR-1246 processing, stability, and the mechanisms directing miR-1246 into neighboring cells remain unclear. In this study, we aimed to determine the effect of single-nucleotide substitutions within the short exosome sorting motifs (so-called EXO-motifs) GGAG and GCAG, present in the miR-1246 sequence, on its intracellular stability and extracellular transfer. We applied in silico methods such as 2D and 3D structure analysis and modeling of protein interactions. We also performed in vitro validation through the transfection of fluorescently labeled miRNA into MDA-MB-231 cells, which we analyzed by flow cytometry and fluorescence microscopy. Our results suggest that nucleotide alterations that disturbed miR-1246 EXO-motifs were able to modulate miR-1246 stability and its level of transfer to neighboring cells, suggesting that the molecular mechanisms of RNA stability and intercellular transfer can be closely related.
Affiliation(s)
- Agnieszka Rybarczyk
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
| | - Tomasz Lehmann
- Department of Biochemistry and Molecular Biology, Poznan University of Medical Sciences, Fredry 10, 61-701 Poznan, Poland
| | - Ewa Iwańczyk-Skalska
- Department of Biochemistry and Molecular Biology, Poznan University of Medical Sciences, Fredry 10, 61-701 Poznan, Poland
| | - Wojciech Juzwa
- Biotechnology and Food Microbiology, Poznan University of Life Sciences, Wojska Polskiego 48, 60-627 Poznan, Poland
| | - Andrzej Pławski
- Institute of Human Genetics, Polish Academy of Sciences, Strzeszyńska 32, 60-479 Poznan, Poland
| | - Kamil Kopciuch
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
| | - Jacek Blazewicz
- Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
| | - Paweł P. Jagodziński
- Department of Biochemistry and Molecular Biology, Poznan University of Medical Sciences, Fredry 10, 61-701 Poznan, Poland
|
18
|
Alharbi WS, Rashid M. A review of deep learning applications in human genomics using next-generation sequencing data. Hum Genomics 2022; 16:26. [PMID: 35879805 PMCID: PMC9317091 DOI: 10.1186/s40246-022-00396-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 11/24/2021] [Accepted: 07/12/2022] [Indexed: 12/02/2022] Open
Abstract
Genomics is advancing towards data-driven science. With the advent of high-throughput data-generating technologies in human genomics, we are overwhelmed by the volume of genomic data. Artificial intelligence, especially deep learning methods, has been instrumental in extracting knowledge and patterns from these data. In the current review, we address the development and application of deep learning methods and models in different subareas of human genomics. We assess which areas of genomics are over- and under-charted by deep learning techniques. The deep learning algorithms underlying the genomic tools are discussed briefly in the later part of this review. Finally, we briefly discuss recent applications of deep learning tools in genomics. In conclusion, this review is timely for biotechnology and genomics scientists, guiding them on why, when and how to use deep learning methods to analyse human genomic data.
Affiliation(s)
- Wardah S Alharbi
- Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia
| | - Mamoon Rashid
- Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia.
|
19
|
Lal A, Galvao Ferrarini M, Gruber AJ. Investigating the Human Host-ssRNA Virus Interaction Landscape Using the SMEAGOL Toolbox. Viruses 2022; 14:1436. [PMID: 35891416 PMCID: PMC9317827 DOI: 10.3390/v14071436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 06/19/2022] [Accepted: 06/24/2022] [Indexed: 12/04/2022] Open
Abstract
Viruses have evolved numerous mechanisms to exploit the molecular machinery of their host cells, including the broad spectrum of host RNA-binding proteins (RBPs). However, the RBP interactomes of most viruses are largely unknown. To shed light on the interaction landscape of RNA viruses with human host cell RBPs, we have analysed 197 single-stranded RNA (ssRNA) viral genome sequences and found that the majority of ssRNA virus genomes are significantly enriched or depleted in motifs for specific human RBPs, suggesting selection pressure on these interactions. To facilitate tailored investigations and the analysis of genomes sequenced in future, we have released our methodology as a fast and user-friendly computational toolbox named SMEAGOL. Our resources will contribute to future studies of specific ssRNA virus-host cell interactions and support the identification of antiviral drug targets.
Affiliation(s)
| | - Mariana Galvao Ferrarini
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Laboratoire de Biométrie et Biologie Évolutive, UMR 5558, CNRS, Université de Lyon, Université Lyon 1, 69622 Villeurbanne, France
| | - Andreas J. Gruber
- Department of Biology, University of Konstanz, Universitaetsstrasse 10, D-78464 Konstanz, Germany
|
20
|
Chalupová E, Vaculík O, Poláček J, Jozefov F, Majtner T, Alexiou P. ENNGene: an Easy Neural Network model building tool for Genomics. BMC Genomics 2022; 23:248. [PMID: 35361122 PMCID: PMC8973509 DOI: 10.1186/s12864-022-08414-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/05/2021] [Accepted: 02/23/2022] [Indexed: 11/17/2022] Open
Abstract
Background: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. Results: Here we present ENNGene, an Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. Conclusions: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08414-x.
Affiliation(s)
- Eliška Chalupová
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
| | - Ondřej Vaculík
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
| | - Jakub Poláček
- Faculty of Informatics, Masaryk University, Brno, Czechia
| | - Filip Jozefov
- Faculty of Informatics, Masaryk University, Brno, Czechia
| | - Tomáš Majtner
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
| | - Panagiotis Alexiou
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia.
|
21
|
RBPSpot: Learning on appropriate contextual information for RBP binding sites discovery. iScience 2021; 24:103381. [PMID: 34841226 PMCID: PMC8605353 DOI: 10.1016/j.isci.2021.103381] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/07/2021] [Revised: 09/01/2021] [Accepted: 10/27/2021] [Indexed: 11/29/2022] Open
Abstract
Identifying the factors determining RBP-RNA interactions remains a big challenge. It involves sparse binding motifs and a suitable sequence context for binding. The present work describes an approach to detect RBP binding sites in RNAs using an ultra-fast inexact k-mer search for statistically significant seeds. The seeds work as anchors to evaluate the context and binding potential using flanking region information with a Deep Feed-forward Neural Network. The developed models also received support from MD-simulation studies. The implemented software, RBPSpot, scored consistently high on all the performance metrics, including an average accuracy of ∼90% across a large number of validated datasets. It outperformed the compared tools, including some with much more complex deep-learning models, during a comprehensive benchmarking process. RBPSpot can identify RBP binding sites in the human system and can also be used to develop new models, making it a valuable resource in the area of regulatory system studies. Efficient motif anchoring helps to get good quality contextual information on binding. Realistic and high granularity datasets ensure better performance of the classifiers. DNN models on the contextual features outperform more complex deep learning tools. The RBPSpot algorithm may also be used to develop RBP binding models for other species.
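As a rough illustration of what an inexact k-mer search means, the sketch below scans a sequence for positions where a seed k-mer matches with at most a given number of mismatches (Hamming distance). This is a naive scan written for clarity; it is not the ultra-fast search or the statistical seed selection implemented in RBPSpot, and the function names and example sequence are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def inexact_hits(seq, kmer, max_mismatches=1):
    """Start positions in seq where kmer matches with at most max_mismatches substitutions."""
    k = len(kmer)
    return [
        i
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], kmer) <= max_mismatches
    ]

# Example: the seed GGAG matches exactly at position 0 and with one mismatch at position 6.
hits = inexact_hits("GGAGCCGCAGUU", "GGAG", max_mismatches=1)  # [0, 6]
```

Hits found this way could serve as anchors, with the flanking sequence on either side passed to a downstream classifier.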
|
22
|
Grønning AGB, Kacprowski T, Schéele C. MultiPep: a hierarchical deep learning approach for multi-label classification of peptide bioactivities. Biol Methods Protoc 2021; 6:bpab021. [PMID: 34909478 PMCID: PMC8665375 DOI: 10.1093/biomethods/bpab021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Revised: 10/28/2021] [Accepted: 11/17/2021] [Indexed: 11/14/2022] Open
Abstract
Peptide-based therapeutics are here to stay and will prosper in the future. A key step in identifying novel peptide-drugs is the determination of their bioactivities. Recent advances in peptidomics screening approaches hold promise as a strategy for identifying novel drug targets. However, these screenings typically generate an immense number of peptides, and tools for ranking these peptides prior to planning functional studies are warranted. Whereas a couple of tools in the literature predict multiple classes, these are constructed using multiple binary classifiers. We here aimed to use an innovative deep learning approach to generate an improved peptide bioactivity classifier with the capacity to distinguish between multiple classes. We present MultiPep: a deep learning multi-label classifier that assigns peptides to zero or more of 20 bioactivity classes. We train and test MultiPep on data from several publicly available databases. The same data are used for a hierarchical clustering, whose dendrogram shapes the architecture of MultiPep. We test a new loss function that combines a customized version of the Matthews correlation coefficient with binary cross entropy (BCE), and show that this is better than using class-weighted BCE as the loss function. Further, we show that MultiPep surpasses state-of-the-art peptide bioactivity classifiers and that it predicts known and novel bioactivities of FDA-approved therapeutic peptides. In conclusion, we present innovative machine learning techniques used to produce a peptide prediction tool to aid peptide-based therapy development and hypothesis generation.
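To make the idea of combining a Matthews correlation coefficient term with BCE concrete, here is a minimal NumPy sketch. It uses a generic "soft" MCC computed from predicted probabilities and pooled over all labels, mixed with BCE through a hypothetical weight `alpha`. The abstract describes a customized MCC whose details are not given here, so this should be read as an assumed stand-in rather than the MultiPep loss.

```python
import numpy as np

def bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross entropy over all labels."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def soft_mcc(y_true, y_prob, eps=1e-7):
    """MCC computed from probabilities instead of hard predictions (pooled over labels)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    tp = np.sum(y_true * y_prob)
    tn = np.sum((1 - y_true) * (1 - y_prob))
    fp = np.sum((1 - y_true) * y_prob)
    fn = np.sum(y_true * (1 - y_prob))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    return float((tp * tn - fp * fn) / denom)

def combined_loss(y_true, y_prob, alpha=0.5):
    """alpha * (1 - soft MCC) + (1 - alpha) * BCE; alpha is an assumed mixing weight."""
    return alpha * (1 - soft_mcc(y_true, y_prob)) + (1 - alpha) * bce(y_true, y_prob)
```

Because the soft MCC term depends on the balance of true and false positives and negatives rather than on per-label averages, it penalises a classifier that ignores rare classes more strongly than plain BCE does.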
Affiliation(s)
- Alexander G B Grønning
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, 38106 Braunschweig, Germany
- Braunschweig Integrated Centre for Systems Biology (BRICS), 38106 Braunschweig, Germany
| | - Camilla Schéele
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
|
23
|
Pezoulas VC, Hazapis O, Lagopati N, Exarchos TP, Goules AV, Tzioufas AG, Fotiadis DI, Stratis IG, Yannacopoulos AN, Gorgoulis VG. Machine Learning Approaches on High Throughput NGS Data to Unveil Mechanisms of Function in Biology and Disease. Cancer Genomics Proteomics 2021; 18:605-626. [PMID: 34479914 PMCID: PMC8441762 DOI: 10.21873/cgp.20284] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/25/2021] [Revised: 07/21/2021] [Accepted: 08/03/2021] [Indexed: 12/13/2022] Open
Abstract
In this review, the fundamentals of machine learning (ML) and data mining (DM) are summarized together with the techniques for distilling knowledge from state-of-the-art omics experiments. This includes an introduction to the basic mathematical principles of unsupervised/supervised learning methods, dimensionality reduction techniques, deep neural network architectures and the applications of these in bioinformatics. Several case studies under evaluation mainly involve next generation sequencing (NGS) experiments, like deciphering gene expression from total and single cell (scRNA-seq) analysis; for the latter, all recent artificial intelligence (AI) methods for the investigation of cell sub-types, biomarkers and imputation techniques are described. Other areas of interest where various ML schemes have been investigated are transcription factor (TF) binding sites, chromatin organization patterns and RNA binding proteins (RBPs), while analyses on RNA sequence and structure as well as 3D protein structure predictions with the use of ML are also described. Furthermore, we summarize the recent methods of using ML in clinical oncology, which combine current omics data with pharmacogenomics to determine personalized treatments. With this review we wish to provide the scientific community with a thorough investigation of the main novel ML applications that take into consideration the latest achievements in genomics, thus unraveling the fundamental mechanisms of biology towards the understanding and cure of diseases.
Affiliation(s)
- Vasileios C Pezoulas
- Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, Ioannina, Greece
| | - Orsalia Hazapis
- Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
| | - Nefeli Lagopati
- Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Biomedical Research Foundation of the Academy of Athens, Athens, Greece
| | - Themis P Exarchos
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, Greece
- Department of Informatics, Ionian University, Corfu, Greece
| | - Andreas V Goules
- Department of Pathophysiology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
| | - Athanasios G Tzioufas
- Department of Pathophysiology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
| | - Dimitrios I Fotiadis
- Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, Ioannina, Greece
| | - Ioannis G Stratis
- Department of Mathematics, National and Kapodistrian University of Athens, Athens, Greece
| | - Athanasios N Yannacopoulos
- Department of Statistics, and Stochastic Modelling and Applications Laboratory, Athens University of Economics and Business (AUEB), Athens, Greece
| | - Vassilis G Gorgoulis
- Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Biomedical Research Foundation of the Academy of Athens, Athens, Greece
- Division of Cancer Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester Cancer Research Centre, NIHR Manchester Biomedical Research Centre, University of Manchester, Manchester, U.K
- Center for New Biotechnologies and Precision Medicine, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Faculty of Health and Medical Sciences, University of Surrey, Surrey, U.K
| |
|
24
|
Uhl M, Tran VD, Heyl F, Backofen R. RNAProt: an efficient and feature-rich RNA binding protein binding site predictor. Gigascience 2021; 10:giab054. [PMID: 34406415 PMCID: PMC8372218 DOI: 10.1093/gigascience/giab054] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/02/2021] [Revised: 05/18/2021] [Accepted: 07/27/2021] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Cross-linking and immunoprecipitation followed by next-generation sequencing (CLIP-seq) is the state-of-the-art technique used to experimentally determine transcriptome-wide binding sites of RNA-binding proteins (RBPs). However, it relies on gene expression, which can be highly variable between conditions and thus cannot provide a complete picture of the RBP binding landscape. This creates a demand for computational methods to predict missing binding sites. Although there exist various methods using traditional machine learning and lately also deep learning, we encountered several problems: many of these are not well documented or maintained, making them difficult to install and use, or are not even available. In addition, there can be efficiency issues, as well as little flexibility regarding options or supported features. RESULTS Here, we present RNAProt, an efficient and feature-rich computational RBP binding site prediction framework based on recurrent neural networks. We compare RNAProt with 1 traditional machine learning approach and 2 deep-learning methods, demonstrating its state-of-the-art predictive performance and better run time efficiency. We further show that its implemented visualizations capture known binding preferences and thus can help to understand what is learned. Since RNAProt supports various additional features (including user-defined features, which no other tool offers), we also present their influence on benchmark set performance. Finally, we show the benefits of incorporating additional features, specifically structure information, when learning the binding sites of a hairpin-loop-binding RBP. CONCLUSIONS RNAProt provides a complete framework for RBP binding site predictions, from data set generation through model training to the evaluation of binding preferences and prediction. It offers state-of-the-art predictive performance, as well as superior run time efficiency, while at the same time supporting more features and input types than any other tool available so far. RNAProt is easy to install and use, comes with comprehensive documentation, and is accompanied by informative statistics and visualizations. All this makes RNAProt a valuable tool to apply in future RBP binding site research.
Affiliation(s)
- Michael Uhl
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Van Dinh Tran
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Florian Heyl
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
| |
|
25
|
Gutman T, Goren G, Efroni O, Tuller T. Estimating the predictive power of silent mutations on cancer classification and prognosis. NPJ Genom Med 2021; 6:67. [PMID: 34385450 PMCID: PMC8361094 DOI: 10.1038/s41525-021-00229-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/13/2021] [Accepted: 06/24/2021] [Indexed: 02/07/2023] Open
Abstract
In recent years it has been shown that silent mutations, in and out of the coding region, can affect gene expression and may be related to tumorigenesis and cancer cell fitness. However, the predictive ability of these mutations for cancer type diagnosis and prognosis has not been evaluated yet. In the current study, based on the analysis of 9,915 cancer genomes and approximately three million mutations, we provide a comprehensive quantitative evaluation of the predictive power of various types of silent and non-silent mutations over cancer classification and prognosis. The results indicate that silent-mutation models outperform the equivalent null models in classifying all examined cancer types and in estimating the probability of survival 10 years after the initial diagnosis. Additionally, combining both non-silent and silent mutations achieved the best classification results for 68% of the cancer types and the best survival estimation results for up to nine years after the diagnosis. Thus, silent mutations hold considerable predictive power over both cancer classification and prognosis, most likely due to their effect on gene expression. It is highly advised that silent mutations are integrated in cancer research in order to unravel the full genomic landscape of cancer and its ramifications on cancer fitness.
Affiliation(s)
- Tal Gutman
- Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, Israel
| | - Guy Goren
- Department of Electrical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, Israel
| | - Omri Efroni
- Department of Electrical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, Israel
| | - Tamir Tuller
- Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel-Aviv, Israel.
| |
|
26
|
Sohrabi-Jahromi S, Söding J. Thermodynamic modeling reveals widespread multivalent binding by RNA-binding proteins. Bioinformatics 2021; 37:i308-i316. [PMID: 34252974 PMCID: PMC8275352 DOI: 10.1093/bioinformatics/btab300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Understanding how proteins recognize their RNA targets is essential to elucidate regulatory processes in the cell. Many RNA-binding proteins (RBPs) form complexes or have multiple domains that allow them to bind to RNA in a multivalent, cooperative manner. They can thereby achieve higher specificity and affinity than proteins with a single RNA-binding domain. However, current approaches to de novo discovery of RNA binding motifs do not take multivalent binding into account. RESULTS We present Bipartite Motif Finder (BMF), which is based on a thermodynamic model of RBPs with two cooperatively binding RNA-binding domains. We show that bivalent binding is a common strategy among RBPs, yielding higher affinity and sequence specificity. We furthermore illustrate that the spatial geometry between the binding sites can be learned from bound RNA sequences. These discovered bipartite motifs are consistent with previously known motifs and binding behaviors. Our results demonstrate the importance of multivalent binding for RNA-binding proteins and highlight the value of bipartite motif models in representing the multivalency of protein-RNA interactions. AVAILABILITY AND IMPLEMENTATION BMF source code is available at https://github.com/soedinglab/bipartite_motif_finder under a GPL license. The BMF web server is accessible at https://bmf.soedinglab.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Salma Sohrabi-Jahromi
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen 37077, Germany
| | - Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen 37077, Germany; Campus-Institut Data Science (CIDAS), Göttingen 37077, Germany
| |
|
27
|
Vedithi SC, Malhotra S, Acebrón-García-de-Eulate M, Matusevicius M, Torres PHM, Blundell TL. Structure-Guided Computational Approaches to Unravel Druggable Proteomic Landscape of Mycobacterium leprae. Front Mol Biosci 2021; 8:663301. [PMID: 34026836 PMCID: PMC8138464 DOI: 10.3389/fmolb.2021.663301] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/02/2021] [Accepted: 04/12/2021] [Indexed: 02/02/2023] Open
Abstract
Leprosy, caused by Mycobacterium leprae (M. leprae), is treated with a multidrug regimen comprising Dapsone, Rifampicin, and Clofazimine. These drugs exhibit bacteriostatic, bactericidal and anti-inflammatory properties, respectively, and control the dissemination of infection in the host. However, the current treatment is not cost-effective, does not favor patient compliance due to its long duration (12 months) and does not protect against the incumbent nerve damage, which is a severe leprosy complication. The chronic infectious peripheral neuropathy associated with the disease is primarily due to the bacterial components infiltrating the Schwann cells that protect neuronal axons, thereby inducing a demyelinating phenotype. There is a need to discover novel/repurposed drugs that can act as short-duration and effective alternatives to the existing treatment regimens, preventing nerve damage and consequent disability associated with the disease. Mycobacterium leprae is an obligate pathogen that cannot be cultivated in vitro, which limits drug discovery efforts to repositioning screens in mouse footpad models. The dearth of knowledge related to structural proteomics of M. leprae, coupled with emerging antimicrobial resistance to all three drugs in the multidrug therapy, calls for concerted novel drug discovery efforts. A comprehensive understanding of the proteomic landscape of M. leprae is indispensable to unravel druggable targets that are essential for bacterial survival and predilection for human neuronal Schwann cells. Of the 1,614 protein-coding genes in the genome of M. leprae, only 17 protein structures are available in the Protein Data Bank. In this review, we discuss efforts made to model the proteome of M. leprae using a suite of software for protein modeling that has been developed in the Blundell laboratory. Precise template selection by employing sequence-structure homology recognition software, multi-template modeling of the monomeric models and accurate quality assessment are the hallmarks of the modeling process. Tools that map interfaces and enable building of homo-oligomers are discussed in the context of interface stability. Other software is described to determine the druggable proteome by using information related to the chokepoint analysis of the metabolic pathways, gene essentiality, homology to human proteins, functional sites, druggable pockets and fragment hotspot maps.
Affiliation(s)
- Sundeep Chaitanya Vedithi
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom (*Correspondence: Sundeep Chaitanya Vedithi)
| | - Sony Malhotra
- Rutherford Appleton Laboratory, Science and Technology Facilities Council, Oxon, United Kingdom
| | | | | | - Pedro Henrique Monteiro Torres
- Laboratório de Modelagem e Dinâmica Molecular, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Tom L. Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| |
|
28
|
Sun L, Xu K, Huang W, Yang YT, Li P, Tang L, Xiong T, Zhang QC. Predicting dynamic cellular protein-RNA interactions by deep learning using in vivo RNA structures. Cell Res 2021; 31:495-516. [PMID: 33623109 PMCID: PMC7900654 DOI: 10.1038/s41422-021-00476-y] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/23/2020] [Accepted: 01/19/2021] [Indexed: 01/31/2023] Open
Abstract
Interactions with RNA-binding proteins (RBPs) are integral to RNA function and cellular regulation, and dynamically reflect specific cellular conditions. However, presently available tools for predicting RBP-RNA interactions employ RNA sequence and/or predicted RNA structures, and therefore do not capture their condition-dependent nature. Here, after profiling transcriptome-wide in vivo RNA secondary structures in seven cell types, we developed PrismNet, a deep learning tool that integrates experimental in vivo RNA structure data and RBP binding data for matched cells to accurately predict dynamic RBP binding in various cellular conditions. PrismNet results for 168 RBPs support its utility for both understanding CLIP-seq results and largely extending such interaction data to accurately analyze additional cell types. Further, PrismNet employs an "attention" strategy to computationally identify exact RBP-binding nucleotides, and we discovered enrichment among dynamic RBP-binding sites for structure-changing variants (riboSNitches), which can link genetic diseases with dysregulated RBP bindings. Our rich profiling data and deep learning-based prediction tool provide access to a previously inaccessible layer of cell-type-specific RBP-RNA interactions, with clear utility for understanding and treating human diseases.
Affiliation(s)
- Lei Sun
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Kui Xu
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Wenze Huang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Yucheng T Yang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Pan Li
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Lei Tang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Tuanlin Xiong
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Qiangfeng Cliff Zhang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China.
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China.
| |
|
29
|
Koo PK, Majdandzic A, Ploenzke M, Anand P, Paul SB. Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks. PLoS Comput Biol 2021; 17:e1008925. [PMID: 33983921 PMCID: PMC8118286 DOI: 10.1371/journal.pcbi.1008925] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 08/18/2020] [Accepted: 03/30/2021] [Indexed: 12/15/2022] Open
Abstract
Deep neural networks have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. To gain insights into why a DNN makes a given prediction, model interpretability methods, such as attribution methods, can be employed to identify motif-like representations along a given sequence. Because explanations are given on an individual sequence basis and can vary substantially across sequences, deducing generalizable trends across the dataset and quantifying their effect size remains a challenge. Here we introduce global importance analysis (GIA), a model interpretability method that quantifies the population-level effect size that putative patterns have on model predictions. GIA provides an avenue to quantitatively test hypotheses of putative patterns and their interactions with other patterns, as well as map out specific functions the network has learned. As a case study, we demonstrate the utility of GIA on the computational task of predicting RNA-protein interactions from sequence. We first introduce a convolutional network, which we call ResidualBind, and benchmark its performance against previous methods on RNAcompete data. Using GIA, we then demonstrate that in addition to sequence motifs, ResidualBind learns a model that considers the number of motifs, their spacing, and sequence context, such as RNA secondary structure and GC-bias.
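The population-level effect size described above can be sketched as a difference of mean predictions: embed a putative pattern into a set of background sequences and compare the model's average output with and without it. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniform random backgrounds, the toy scoring function standing in for a trained network, and all names (`toy_model`, `embed`, `global_importance`) are hypothetical.

```python
import random

ALPHABET = "ACGU"

def toy_model(seq):
    """Hypothetical stand-in for a trained network: scores 1.0 if the motif is present."""
    return 1.0 if "UGCAUG" in seq else 0.0

def embed(seq, pattern, pos):
    """Overwrite seq with pattern starting at pos, keeping the length unchanged."""
    return seq[:pos] + pattern + seq[pos + len(pattern):]

def global_importance(model, pattern, pos, n_seqs=200, length=40, seed=0):
    """Mean prediction with the pattern embedded minus mean prediction on backgrounds."""
    rng = random.Random(seed)
    # Assumption: backgrounds are drawn uniformly at random over the alphabet.
    backgrounds = [
        "".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(n_seqs)
    ]
    with_pattern = sum(model(embed(s, pattern, pos)) for s in backgrounds) / n_seqs
    without = sum(model(s) for s in backgrounds) / n_seqs
    return with_pattern - without

print(global_importance(toy_model, "UGCAUG", pos=10))
```

Because the pattern is averaged over many backgrounds, the estimate reflects the pattern's effect across sequence contexts rather than in one sequence; varying `pos`, or embedding two patterns at once, is one way such a sketch could probe spacing and interaction effects.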
Affiliation(s)
- Peter K. Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| | - Antonio Majdandzic
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
| | - Matthew Ploenzke
- Department of Biostatistics, Harvard University, Cambridge, Massachusetts, United States of America
| | - Praveen Anand
- Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Steffan B. Paul
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, Massachusetts, United States of America
| |
|
30
|
Chen K, Song B, Tang Y, Wei Z, Xu Q, Su J, de Magalhães JP, Rigden DJ, Meng J. RMDisease: a database of genetic variants that affect RNA modifications, with implications for epitranscriptome pathogenesis. Nucleic Acids Res 2021; 49:D1396-D1404. [PMID: 33010174 PMCID: PMC7778951 DOI: 10.1093/nar/gkaa790] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 07/20/2020] [Revised: 09/08/2020] [Accepted: 09/11/2020] [Indexed: 12/11/2022] Open
Abstract
Deciphering the biological impacts of millions of single nucleotide variants remains a major challenge. Recent studies suggest that RNA modifications play versatile roles in essential biological mechanisms, and are closely related to the progression of various diseases including multiple cancers. To comprehensively unveil the association between disease-associated variants and their epitranscriptome disturbance, we built RMDisease, a database of genetic variants that can affect RNA modifications. By integrating the prediction results of 18 different RNA modification prediction tools and also 303,426 experimentally-validated RNA modification sites, RMDisease identified a total of 202,307 human SNPs that may affect (add or remove) sites of eight types of RNA modifications (m6A, m5C, m1A, m5U, Ψ, m6Am, m7G and Nm). These include 4,289 disease-associated variants that may imply disease pathogenesis functioning at the epitranscriptome layer. These SNPs were further annotated with essential information such as post-transcriptional regulations (sites for miRNA binding, interaction with RNA-binding proteins and alternative splicing) revealing putative regulatory circuits. A convenient graphical user interface was constructed to support the query, exploration and download of the relevant information. RMDisease should make a useful resource for studying the epitranscriptome impact of genetic variants via multiple RNA modifications with emphasis on their potential disease relevance. RMDisease is freely accessible at: www.xjtlu.edu.cn/biologicalsciences/rmd.
Affiliation(s)
- Kunqi Chen
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX Liverpool, UK
| | - Bowen Song
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
- Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| | - Yujiao Tang
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| | - Zhen Wei
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| | - Qingru Xu
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| | - Jionglong Su
- Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| | | | - Daniel J Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
| | - Jia Meng
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
- AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| |
|
31
|
Pan X, Fang Y, Li X, Yang Y, Shen HB. RBPsuite: RNA-protein binding sites prediction suite based on deep learning. BMC Genomics 2020; 21:884. [PMID: 33297946 PMCID: PMC7724624 DOI: 10.1186/s12864-020-07291-6] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 07/12/2020] [Accepted: 11/28/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND RNA-binding proteins (RBPs) play crucial roles in various biological processes. Deep learning-based methods have proven powerful for predicting RBP sites on RNAs. However, training deep learning models is time-consuming and computationally intensive. RESULTS Here we present RBPsuite, an easy-to-use deep learning-based webserver for predicting RBP binding sites on linear and circular RNAs. For linear RNAs, RBPsuite predicts RBP binding scores using our updated iDeepS. For circular RNAs (circRNAs), RBPsuite predicts RBP binding scores using our CRIP method. RBPsuite first breaks the input RNA sequence into segments of 101 nucleotides and scores the interaction between the segments and the RBPs. RBPsuite further detects the verified motifs on the binding segments and gives the distribution of binding scores along the full-length sequence. CONCLUSIONS RBPsuite is an easy-to-use online webserver for predicting RBP binding sites and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/.
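The segmentation step described above can be sketched in a few lines. The abstract states only that the input is broken into segments of 101 nucleotides; whether the windows overlap and how a short final segment is handled are not specified, so the non-overlapping windows and the `N` padding below are assumptions, and the function name is hypothetical.

```python
def split_into_segments(seq, size=101, pad="N"):
    """Split an RNA sequence into fixed-length windows with their start offsets.

    Assumptions (not from the source): windows do not overlap, DNA-style T is
    converted to U, and a short final window is right-padded with `pad`.
    """
    seq = seq.upper().replace("T", "U")
    segments = []
    for start in range(0, len(seq), size):
        chunk = seq[start:start + size]
        segments.append((start, chunk.ljust(size, pad)))
    return segments

# A 250-nt input yields three 101-nt segments starting at 0, 101 and 202.
example = "ACGU" * 62 + "AC"
for start, segment in split_into_segments(example):
    print(start, len(segment))
```

Keeping the start offset with each segment is what would allow per-segment scores to be mapped back onto the full-length sequence as a score distribution.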
Affiliation(s)
- Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
| | - Yi Fang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Xianfeng Li
- Key laboratory of Carcinogenesis and Translational Research, Peking University Cancer Hospital, Beijing, 100142, China
| | - Yang Yang
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Center for Brain-Like Computing and Machine Intelligence, Shanghai, 200240, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
| |
|