1
|
Khan N, Rahaman M, Zhang S. GINClus: RNA structural motif clustering using graph isomorphism network. NAR Genom Bioinform 2025; 7:lqaf050. [PMID: 40290315 PMCID: PMC12034103 DOI: 10.1093/nargab/lqaf050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2024] [Revised: 04/02/2025] [Accepted: 04/15/2025] [Indexed: 04/30/2025] Open
Abstract
Ribonucleic acid (RNA) structural motif identification is a crucial step for understanding RNA structure and functionality. Due to the complexity and variations of RNA 3D structures, identifying RNA structural motifs is challenging and time-consuming. Particularly, discovering new RNA structural motif families is a hard problem and still largely depends on manual analysis. In this paper, we proposed an RNA structural motif clustering tool, named GINClus, which uses a semi-supervised deep learning model to cluster RNA motif candidates (RNA loop regions) based on both base interaction and 3D structure similarities. GINClus converts base interactions and 3D structures of RNA motif candidates into graph representations and using graph isomorphism network (GIN) model in combination with K-means and hierarchical agglomerative clustering, GINClus clusters the RNA motif candidates based on their structural similarities. GINClus has a clustering accuracy of 87.88% for known internal loop motifs and 97.69% for known hairpin loop motifs. Using GINClus, we successfully clustered the motifs of the same families together and were able to find 927 new instances of Sarcin-ricin, Kink-turn, Tandem-shear, Hook-turn, E-loop, C-loop, T-loop, and GNRA loop motif families. We also identified 12 new RNA structural motif families with unique structure and base-pair interactions.
Collapse
Affiliation(s)
- Nabila Shahnaz Khan
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
| | - Md Mahfuzur Rahaman
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
| |
Collapse
|
2
|
Rahaman MM, Zhang S. RNAMotifProfile: a graph-based approach to build RNA structural motif profiles. NAR Genom Bioinform 2024; 6:lqae128. [PMID: 39328267 PMCID: PMC11426329 DOI: 10.1093/nargab/lqae128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2024] [Revised: 07/24/2024] [Accepted: 09/09/2024] [Indexed: 09/28/2024] Open
Abstract
RNA structural motifs are the recurrent segments in RNA three-dimensional structures that play a crucial role in the functional diversity of RNAs. Understanding the similarities and variations within these recurrent motif groups is essential for gaining insights into RNA structure and function. While recurrent structural motifs are generally assumed to be composed of the same isosteric base interactions, this consistent pattern is not observed across all examples of these motifs. Existing methods for analyzing and comparing RNA structural motifs may overlook variations in base interactions and associated nucleotides. RNAMotifProfile is a novel profile-to-profile alignment algorithm that generates a comprehensive profile from a group of structural motifs, incorporating all base interactions and associated nucleotides at each position. By structurally aligning input motif instances using a guide-tree-based approach, RNAMotifProfile captures the similarities and variations within recurrent motif groups. Additionally, RNAMotifProfile can function as a motif search tool, enabling the identification of instances of a specific motif family by searching with the corresponding profile. The ability to generate accurate and comprehensive profiles for RNA structural motif families, and to search for these motifs, facilitates a deeper understanding of RNA structure-function relationships and potential applications in RNA engineering and therapeutic design.
Collapse
Affiliation(s)
- Md Mahfuzur Rahaman
- Department of Computer Science, University of Central Florida, 4328 Scorpius Street, Orlando, FL 32816-2362, USA
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, 4328 Scorpius Street, Orlando, FL 32816-2362, USA
| |
Collapse
|
3
|
Bohdan DR, Voronina VV, Bujnicki JM, Baulin EF. A comprehensive survey of long-range tertiary interactions and motifs in non-coding RNA structures. Nucleic Acids Res 2023; 51:8367-8382. [PMID: 37471030 PMCID: PMC10484739 DOI: 10.1093/nar/gkad605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2022] [Accepted: 07/07/2023] [Indexed: 07/21/2023] Open
Abstract
Understanding the 3D structure of RNA is key to understanding RNA function. RNA 3D structure is modular and can be seen as a composition of building blocks of various sizes called tertiary motifs. Currently, long-range motifs formed between distant loops and helical regions are largely less studied than the local motifs determined by the RNA secondary structure. We surveyed long-range tertiary interactions and motifs in a non-redundant set of non-coding RNA 3D structures. A new dataset of annotated LOng-RAnge RNA 3D modules (LORA) was built using an approach that does not rely on the automatic annotations of non-canonical interactions. An original algorithm, ARTEM, was developed for annotation-, sequence- and topology-independent superposition of two arbitrary RNA 3D modules. The proposed methods allowed us to identify and describe the most common long-range RNA tertiary motifs. Along with the prevalent canonical A-minor interactions, a large number of previously undescribed staple interactions were observed. The most frequent long-range motifs were found to belong to three main motif families: planar staples, tilted staples, and helical packing motifs.
Collapse
Affiliation(s)
- Davyd R Bohdan
- Department of Innovation and High Technology, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
| | - Valeria V Voronina
- Department of Information Systems, Ulyanovsk State Technical University, Ulyanovsk 432027, Russia
| | - Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw 02-109, Poland
| | - Eugene F Baulin
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw 02-109, Poland
| |
Collapse
|
4
|
Rahaman MM, Khan NS, Zhang S. RNAMotifComp: a comprehensive method to analyze and identify structurally similar RNA motif families. Bioinformatics 2023; 39:i337-i346. [PMID: 37387191 DOI: 10.1093/bioinformatics/btad223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION The 3D structures of RNA play a critical role in understanding their functionalities. There exist several computational methods to study RNA 3D structures by identifying structural motifs and categorizing them into several motif families based on their structures. Although the number of such motif families is not limited, a few of them are well-studied. Out of these structural motif families, there exist several families that are visually similar or very close in structure, even with different base interactions. Alternatively, some motif families share a set of base interactions but maintain variation in their 3D formations. These similarities among different motif families, if known, can provide a better insight into the RNA 3D structural motifs as well as their characteristic functions in cell biology. RESULTS In this work, we proposed a method, RNAMotifComp, that analyzes the instances of well-known structural motif families and establishes a relational graph among them. We also have designed a method to visualize the relational graph where the families are shown as nodes and their similarity information is represented as edges. We validated our discovered correlations of the motif families using RNAMotifContrast. Additionally, we used a basic Naïve Bayes classifier to show the importance of RNAMotifComp. The relational analysis explains the functional analogies of divergent motif families and illustrates the situations where the motifs of disparate families are predicted to be of the same family. AVAILABILITY AND IMPLEMENTATION Source code publicly available at https://github.com/ucfcbb/RNAMotifFamilySimilarity.
Collapse
Affiliation(s)
- Md Mahfuzur Rahaman
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
| | - Nabila Shahnaz Khan
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
| |
Collapse
|
5
|
Khan NS, Rahaman MM, Islam S, Zhang S. RNA-NRD: a non-redundant RNA structural dataset for benchmarking and functional analysis. NAR Genom Bioinform 2023; 5:lqad040. [PMID: 37123530 PMCID: PMC10132383 DOI: 10.1093/nargab/lqad040] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2022] [Revised: 03/04/2023] [Accepted: 04/12/2023] [Indexed: 05/02/2023] Open
Abstract
The significance of RNA functions and their role in evolution and disease control have remarkably increased the research scope in the field of RNA science. Though the availability of RNA structure data in PBD has been growing tremendously, maintaining their quality and integrity has become the greater challenge. Since the data available in PDB are results of different independent research, they might contain redundancy. As a result, there remains a possibility of data bias for both protein and RNA chains. Quite a few studies have been conducted to remove the redundancy of protein structures by introducing high-quality representatives. However, the amount of research done to remove the redundancy of RNA structures is still very low. To remove RNA chain redundancy in PDB, we have introduced RNA-NRD, a non-redundant dataset of RNA chains based on sequence and 3D structural similarity. We compared RNA-NRD with the existing non-redundant RNA structure dataset RS-RNA and showed that it has better-formed clusters of redundant RNA chains with lower average RMSD and higher average PSI, thus improving the overall quality of the dataset.
Collapse
Affiliation(s)
- Nabila Shahnaz Khan
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Md Mahfuzur Rahaman
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Shahidul Islam
- School of Computing and Design, California State University, Monterey Bay, Seaside, CA 93955, USA
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| |
Collapse
|
6
|
Gianfrotta C, Reinharz V, Lespinet O, Barth D, Denise A. On the predictibility of A-minor motifs from their local contexts. RNA Biol 2022; 19:1208-1227. [PMID: 36384383 PMCID: PMC9673937 DOI: 10.1080/15476286.2022.2144611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Abstract
This study investigates the importance of the structural context in the formation of a type I/II A-minor motif. This very frequent structural motif has been shown to be important in the spatial folding of RNA molecules. We developed an automated method to classify A-minor motif occurrences according to their 3D context similarities, and we used a graph approach to represent both the structural A-minor motif occurrences and their classes at different scales. This approach leads us to uncover new subclasses of A-minor motif occurrences according to their local 3D similarities. The majority of classes are composed of homologous occurrences, but some of them are composed of non-homologous occurrences. The different classifications we obtain allow us to better understand the importance of the context in the formation of A-minor motifs. In a second step, we investigate how much knowledge of the context around an A-minor motif can help to infer its presence (and position). More specifically, we want to determine what kind of information, contained in the structural context, can be useful to characterize and predict A-minor motifs. We show that, for some A-minor motifs, the topology combined with a sequence signal is sufficient to predict the presence and the position of an A-minor motif occurrence. In most other cases, these signals are not sufficient for predicting the A-minor motif, however we show that they are good signals for this purpose. All the classification and prediction pipelines rely on automated processes, for which we describe the underlying algorithms and parameters.
Collapse
Affiliation(s)
- Coline Gianfrotta
- Données et Algorithmes pour une Ville Intelligente et Durable (DAVID), Université de Versailles Saint-Quentin-en-Yvelines, Université Paris-Saclay, Versailles, France,Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Université Paris-Saclay, CNRS, Orsay, France,CONTACT Coline Gianfrotta Données et Algorithmes pour une Ville Intelligente et Durable (DAVID), Université de Versailles Saint-Quentin-en-Yvelines, Université Paris-Saclay, France
| | - Vladimir Reinharz
- Department of Computer Science, Université du Québec à Montréal, Québec, Canada
| | - Olivier Lespinet
- Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, Gif-sur-Yvette, France
| | - Dominique Barth
- Données et Algorithmes pour une Ville Intelligente et Durable (DAVID), Université de Versailles Saint-Quentin-en-Yvelines, Université Paris-Saclay, Versailles, France
| | - Alain Denise
- Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Université Paris-Saclay, CNRS, Orsay, France,Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, Gif-sur-Yvette, France
| |
Collapse
|