Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Miladi M, Junge A, Costa F, Seemann SE, Havgaard JH, Gorodkin J, Backofen R. RNAscClust: clustering RNA sequences using structure conservation and graph based motifs. Bioinformatics 2017;33:2089-2096. [PMID: 28334186 PMCID: PMC5870858 DOI: 10.1093/bioinformatics/btx114] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2016] [Revised: 12/22/2016] [Accepted: 02/23/2017] [Indexed: 12/22/2022] Open

For:	Miladi M, Junge A, Costa F, Seemann SE, Havgaard JH, Gorodkin J, Backofen R. RNAscClust: clustering RNA sequences using structure conservation and graph based motifs. Bioinformatics 2017;33:2089-2096. [PMID: 28334186 PMCID: PMC5870858 DOI: 10.1093/bioinformatics/btx114] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2016] [Revised: 12/22/2016] [Accepted: 02/23/2017] [Indexed: 12/22/2022] Open

Number

Cited by Other Article(s)

Gadekar V, Munk AW, Miladi M, Junge A, Backofen R, Seemann S, Gorodkin J. Clusters of mammalian conserved RNA structures in UTRs associate with RBP binding sites. NAR Genom Bioinform 2024;6:lqae089. [PMID: 39131818 PMCID: PMC11310781 DOI: 10.1093/nargab/lqae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2023] [Revised: 06/26/2024] [Accepted: 07/16/2024] [Indexed: 08/13/2024] Open

Affiliation(s)

Veerendra P Gadekar Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark Centre for Integrative Biology and Systems Medicine (IBSE), IIT Madras, Chennai, India Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
Alexander Welford Munk Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
Milad Miladi Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
Alexander Junge Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
Rolf Backofen Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
Stefan E Seemann Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
Jan Gorodkin Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark

Collapse

Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024;2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/01/2024]

Genome-Wide RNA Secondary Structure Prediction. Methods Mol Biol 2023;2586:35-48. [PMID: 36705897 DOI: 10.1007/978-1-0716-2768-6_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]

Sim M, Lee J, Kwon D, Lee D, Park N, Wy S, Ko Y, Kim J. Reference-based read clustering improves the de novo genome assembly of microbial strains. Comput Struct Biotechnol J 2022;21:444-451. [PMID: 36618978 PMCID: PMC9804104 DOI: 10.1016/j.csbj.2022.12.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2022] [Revised: 12/17/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022] Open

Ono Y, Katayama K, Onuma T, Kubo K, Tsuyuzaki H, Hamada M, Sato M. Structure-based screening for functional non-coding RNAs in fission yeast identifies a factor repressing untimely initiation of sexual differentiation. Nucleic Acids Res 2022;50:11229-11242. [PMID: 36259651 PMCID: PMC9638895 DOI: 10.1093/nar/gkac825] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2021] [Revised: 09/06/2022] [Accepted: 09/14/2022] [Indexed: 12/04/2022] Open

Affiliation(s)

Yu Ono Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan
Kenta Katayama Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan.,Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
Tomoki Onuma Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan
Kento Kubo Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan.,Bioinformatics Laboratory, Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo Shinjuku-ku, Tokyo 169-8555, Japan
Hayato Tsuyuzaki Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan.,Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
Michiaki Hamada Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan.,Bioinformatics Laboratory, Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo Shinjuku-ku, Tokyo 169-8555, Japan.,Institute for Medical-oriented Structural Biology, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan
Masamitsu Sato Laboratory of Cytoskeletal Logistics, Department of Life Science and Medical Bioscience, School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan.,Institute for Medical-oriented Structural Biology, Waseda University, 2-2 Wakamatsucho, Shinjuku-ku, Tokyo 162-8480, Japan.,Institute for Advanced Research of Biosystem Dynamics, Waseda Research Institute for Science and Engineering, Graduate School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan

Collapse

Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs. Int J Mol Sci 2021;22:8719. [PMID: 34445436 PMCID: PMC8395733 DOI: 10.3390/ijms22168719] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 02/06/2023] Open

Miladi M, Raden M, Will S, Backofen R. Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs with Markov chains. Algorithms Mol Biol 2020;15:19. [PMID: 33292340 PMCID: PMC7666477 DOI: 10.1186/s13015-020-00179-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2019] [Accepted: 10/16/2020] [Indexed: 12/14/2022] Open

Chatterjee A, Mukherjee S, Basu B, Roy D, Basu R, Ghosh H, Mishra L, Bhattacharya M, Chakraborty N. Insight into the distinctive paradigm of Human Cytomegalovirus associated intrahepatic and extrahepatic cholestasis in neonates. Sci Rep 2020;10:15861. [PMID: 32985571 PMCID: PMC7522230 DOI: 10.1038/s41598-020-73009-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2020] [Accepted: 08/18/2020] [Indexed: 01/15/2023] Open

Müller T, Miladi M, Hutter F, Hofacker I, Will S, Backofen R. The locality dilemma of Sankoff-like RNA alignments. Bioinformatics 2020;36:i242-i250. [PMID: 32657398 PMCID: PMC7355259 DOI: 10.1093/bioinformatics/btaa431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open

Abstract

Motivation

Elucidating the functions of non-coding RNAs by homology has been strongly limited due to fundamental computational and modeling issues. While existing simultaneous alignment and folding (SA&F) algorithms successfully align homologous RNAs with precisely known boundaries (global SA&F), the more pressing problem of identifying new classes of homologous RNAs in the genome (local SA&F) is intrinsically more difficult and much less understood. Typically, the length of local alignments is strongly overestimated and alignment boundaries are dramatically mispredicted. We hypothesize that local SA&F approaches are compromised this way due to a score bias, which is caused by the contribution of RNA structure similarity to their overall alignment score.

Results

In the light of this hypothesis, we study pairwise local SA&F for the first time systematically—based on a novel local RNA alignment benchmark set and quality measure. First, we vary the relative influence of structure similarity compared to sequence similarity. Putting more emphasis on the structure component leads to overestimating the length of local alignments. This clearly shows the bias of current scores and strongly hints at the structure component as its origin. Second, we study the interplay of several important scoring parameters by learning parameters for local and global SA&F. The divergence of these optimized parameter sets underlines the fundamental obstacles for local SA&F. Third, by introducing a position-wise correction term in local SA&F, we constructively solve its principal issues.

Availability and implementation

The benchmark data, detailed results and scripts are available at https://github.com/BackofenLab/local_alignment. The RNA alignment tool LocARNA, including the modifications proposed in this work, is available at https://github.com/s-will/LocARNA/releases/tag/v2.0.0RC6.

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

Mautner S, Montaseri S, Miladi M, Raden M, Costa F, Backofen R. ShaKer: RNA SHAPE prediction using graph kernel. Bioinformatics 2020;35:i354-i359. [PMID: 31510707 PMCID: PMC6612843 DOI: 10.1093/bioinformatics/btz395] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023] Open

Berkemer SJ, Maier LK, Amman F, Bernhart SH, Wörtz J, Märkle P, Pfeiffer F, Stadler PF, Marchfelder A. Identification of RNA 3´ ends and termination sites in Haloferax volcanii. RNA Biol 2020;17:663-676. [PMID: 32041469 PMCID: PMC7237163 DOI: 10.1080/15476286.2020.1723328] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open

Miladi M, Sokhoyan E, Houwaart T, Heyne S, Costa F, Grüning B, Backofen R. GraphClust2: Annotation and discovery of structured RNAs with scalable and accessible integrative clustering. Gigascience 2019;8:giz150. [PMID: 31808801 PMCID: PMC6897289 DOI: 10.1093/gigascience/giz150] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2019] [Revised: 08/23/2019] [Accepted: 11/20/2019] [Indexed: 11/18/2022] Open

Crum M, Ram-Mohan N, Meyer MM. Regulatory context drives conservation of glycine riboswitch aptamers. PLoS Comput Biol 2019;15:e1007564. [PMID: 31860665 PMCID: PMC6944388 DOI: 10.1371/journal.pcbi.1007564] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2019] [Revised: 01/06/2020] [Accepted: 11/25/2019] [Indexed: 12/13/2022] Open

Abstract

In comparison to protein coding sequences, the impact of mutation and natural selection on the sequence and function of non-coding (ncRNA) genes is not well understood. Many ncRNA genes are narrowly distributed to only a few organisms, and appear to be rapidly evolving. Compared to protein coding sequences, there are many challenges associated with assessment of ncRNAs that are not well addressed by conventional phylogenetic approaches, including: short sequence length, lack of primary sequence conservation, and the importance of secondary structure for biological function. Riboswitches are structured ncRNAs that directly interact with small molecules to regulate gene expression in bacteria. They typically consist of a ligand-binding domain (aptamer) whose folding changes drive changes in gene expression. The glycine riboswitch is among the most well-studied due to the widespread occurrence of a tandem aptamer arrangement (tandem), wherein two homologous aptamers interact with glycine and each other to regulate gene expression. However, a significant proportion of glycine riboswitches are comprised of single aptamers (singleton). Here we use graph clustering to circumvent the limitations of traditional phylogenetic analysis when studying the relationship between the tandem and singleton glycine aptamers. Graph clustering enables a broader range of pairwise comparison measures to be used to assess aptamer similarity. Using this approach, we show that one aptamer of the tandem glycine riboswitch pair is typically much more highly conserved, and that which aptamer is conserved depends on the regulated gene. Furthermore, our analysis also reveals that singleton aptamers are more similar to either the first or second tandem aptamer, again based on the regulated gene. Taken together, our findings suggest that tandem glycine riboswitches degrade into functional singletons, with the regulated gene(s) dictating which glycine-binding aptamer is conserved.

Collapse

Eggenhofer F, Hofacker IL, Backofen R, Höner Zu Siederdissen C. CMV: visualization for RNA and protein family models and their comparisons. Bioinformatics 2019;34:2676-2678. [PMID: 29554223 PMCID: PMC6061798 DOI: 10.1093/bioinformatics/bty158] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2017] [Accepted: 03/13/2018] [Indexed: 11/14/2022] Open

Aoki G, Sakakibara Y. Convolutional neural networks for classification of alignments of non-coding RNA sequences. Bioinformatics 2019;34:i237-i244. [PMID: 29949978 PMCID: PMC6022636 DOI: 10.1093/bioinformatics/bty228] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Kirsch R, Seemann SE, Ruzzo WL, Cohen SM, Stadler PF, Gorodkin J. Identification and characterization of novel conserved RNA structures in Drosophila. BMC Genomics 2018;19:899. [PMID: 30537930 PMCID: PMC6288889 DOI: 10.1186/s12864-018-5234-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2018] [Accepted: 11/08/2018] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

Comparative genomics approaches have facilitated the discovery of many novel non-coding and structured RNAs (ncRNAs). The increasing availability of related genomes now makes it possible to systematically search for compensatory base changes - and thus for conserved secondary structures - even in genomic regions that are poorly alignable in the primary sequence. The wealth of available transcriptome data can add valuable insight into expression and possible function for new ncRNA candidates. Earlier work identifying ncRNAs in Drosophila melanogaster made use of sequence-based alignments and employed a sliding window approach, inevitably biasing identification toward RNAs encoded in the more conserved parts of the genome.

RESULTS

To search for conserved RNA structures (CRSs) that may not be highly conserved in sequence and to assess the expression of CRSs, we conducted a genome-wide structural alignment screen of 27 insect genomes including D. melanogaster and integrated this with an extensive set of tiling array data. The structural alignment screen revealed ∼30,000 novel candidate CRSs at an estimated false discovery rate of less than 10%. With more than one quarter of all individual CRS motifs showing sequence identities below 60%, the predicted CRSs largely complement the findings of sliding window approaches applied previously. While a sixth of the CRSs were ubiquitously expressed, we found that most were expressed in specific developmental stages or cell lines. Notably, most statistically significant enrichment of CRSs were observed in pupae, mainly in exons of untranslated regions, promotors, enhancers, and long ncRNAs. Interestingly, cell lines were found to express a different set of CRSs than were found in vivo. Only a small fraction of intergenic CRSs were co-expressed with the adjacent protein coding genes, which suggests that most intergenic CRSs are independent genetic units.

CONCLUSIONS

This study provides a more comprehensive view of the ncRNA transcriptome in fly as well as evidence for differential expression of CRSs during development and in cell lines.

Collapse

Affiliation(s)

Rebecca Kirsch Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark Department of Veterinary and Animal Science, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, D-04107 Germany
Stefan E. Seemann Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark Department of Veterinary and Animal Science, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
Walter L. Ruzzo Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark School of Computer Science and Engineering, University of Washington, Box 352350, Seattle, 98195-2350 WA USA Department of Genome Sciences, University of Washington, Box 355065, Seattle, 98195-5065 WA USA Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Seattle, 98109-1024 WA USA
Stephen M. Cohen Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, Copenhagen N, DK-2200 Denmark
Peter F. Stadler Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, D-04107 Germany Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103 Germany Faculdad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, Bogotá, COL-111321 D.C. Colombia Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090 Austria Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM87501 USA
Jan Gorodkin Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark Department of Veterinary and Animal Science, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark

Collapse

Multiple Sequence Alignments Enhance Boundary Definition of RNA Structures. Genes (Basel) 2018;9:genes9120604. [PMID: 30518121 PMCID: PMC6315940 DOI: 10.3390/genes9120604] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2018] [Revised: 11/28/2018] [Accepted: 11/29/2018] [Indexed: 02/03/2023] Open

Abstract

Self-contained structured domains of RNA sequences have often distinct molecular functions. Determining the boundaries of structured domains of a non-coding RNA (ncRNA) is needed for many ncRNA gene finder programs that predict RNA secondary structures in aligned genomes because these methods do not necessarily provide precise information about the boundaries or the location of the RNA structure inside the predicted ncRNA. Even without having a structure prediction, it is of interest to search for structured domains, such as for finding common RNA motifs in RNA-protein binding assays. The precise definition of the boundaries are essential for downstream analyses such as RNA structure modelling, e.g., through covariance models, and RNA structure clustering for the search of common motifs. Such efforts have so far been focused on single sequences, thus here we present a comparison for boundary definition between single sequence and multiple sequence alignments. We also present a novel approach, named RNAbound, for finding the boundaries that are based on probabilities of evolutionarily conserved base pairings. We tested the performance of two different methods on a limited number of Rfam families using the annotated structured RNA regions in the human genome and their multiple sequence alignments created from 14 species. The results show that multiple sequence alignments improve the boundary prediction for branched structures compared to single sequences independent of the chosen method. The actual performance of the two methods differs on single hairpin structures and branched structures. For the RNA families with branched structures, including transfer RNA (tRNA) and small nucleolar RNAs (snoRNAs), RNAbound improves the boundary predictions using multiple sequence alignments to median differences of −6 and −11.5 nucleotides (nts) for left and right boundary, respectively (window size of 200 nts).

Collapse

Phylogenomic and comparative analysis of the distribution and regulatory patterns of TPP riboswitches in fungi. Sci Rep 2018;8:5563. [PMID: 29615754 PMCID: PMC5882874 DOI: 10.1038/s41598-018-23900-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2017] [Accepted: 03/20/2018] [Indexed: 01/01/2023] Open

Dotu I, Adamson SI, Coleman B, Fournier C, Ricart-Altimiras E, Eyras E, Chuang JH. SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data. PLoS Comput Biol 2018;14:e1006078. [PMID: 29596423 PMCID: PMC5892938 DOI: 10.1371/journal.pcbi.1006078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2017] [Revised: 04/10/2018] [Accepted: 03/05/2018] [Indexed: 12/02/2022] Open

Abstract

RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC.

RNA-protein binding is critical to gene regulation, and aberrant RNA-protein interactions play a role in a wide variety of diseases. However, molecular understanding of these interactions remains limited because of the difficulty of ascertaining the motifs that bind each protein. To address this challenge, we have developed a novel algorithm, SARNAclust, to computationally identify combined structure/sequence motifs from immunoprecipitation data. SARNAclust can deconvolve multiple motifs simultaneously and determine the importance of specific features through a graph kernel and bulge graph formalism. We have verified SARNAclust to be effective on synthetic motif data and also tested it on ENCODE eCLIP datasets, identifying known motifs and novel predictions. We have experimentally validated SARNAclust for two proteins, SLBP and ILF3, using RNA Bind-n-Seq measurements. Applying SARNAclust to ENCODE data provides new evidence for previously unknown regulatory interactions, notably splicing co-regulation by ILF3 and the splicing factor hnRNPC.

Collapse

Smith MA, Seemann SE, Quek XC, Mattick JS. DotAligner: identification and clustering of RNA structure motifs. Genome Biol 2017;18:244. [PMID: 29284541 PMCID: PMC5747123 DOI: 10.1186/s13059-017-1371-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2017] [Accepted: 12/05/2017] [Indexed: 01/01/2023] Open

Kato Y, Gorodkin J, Havgaard JH. Alignment-free comparative genomic screen for structured RNAs using coarse-grained secondary structure dot plots. BMC Genomics 2017;18:935. [PMID: 29197323 PMCID: PMC5712110 DOI: 10.1186/s12864-017-4309-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2017] [Accepted: 11/15/2017] [Indexed: 01/01/2023] Open

Li X, Wong KC. Multiobjective Patient Stratification Using Evolutionary Multiobjective Optimization. IEEE J Biomed Health Inform 2017;22:1619-1629. [PMID: 29990162 DOI: 10.1109/jbhi.2017.2769711] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Seemann SE, Mirza AH, Hansen C, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Torarinsson E, Yao Z, Workman CT, Pociot F, Nielsen H, Tommerup N, Ruzzo WL, Gorodkin J. The identification and functional annotation of RNA structures conserved in vertebrates. Genome Res 2017;27:1371-1383. [PMID: 28487280 PMCID: PMC5538553 DOI: 10.1101/gr.208652.116] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2016] [Accepted: 05/04/2017] [Indexed: 01/15/2023]

Affiliation(s)

Stefan E Seemann Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, DK-1870 Frederiksberg, Denmark
Aashiq H Mirza Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Copenhagen Diabetes Research Center (CPH-DIRECT), Herlev University Hospital, DK-2730 Herlev, Denmark
Claus Hansen Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Department of Cellular and Molecular Medicine (ICMM), Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
Claus H Bang-Berthelsen Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Department of Obesity Biology and Department of Molecular Genetics, Novo Nordisk A/S, DK-2880 Bagsværd, Denmark
Christian Garde Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Department of Biotechnology and Biomedicine, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
Mikkel Christensen-Dalsgaard Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Department of Cellular and Molecular Medicine (ICMM), Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
Elfar Torarinsson Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark
Zizhen Yao Allen Institute for Brain Science, Seattle, Washington 98109, USA
Christopher T Workman Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Department of Biotechnology and Biomedicine, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
Flemming Pociot Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Copenhagen Diabetes Research Center (CPH-DIRECT), Herlev University Hospital, DK-2730 Herlev, Denmark
Henrik Nielsen Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Department of Cellular and Molecular Medicine (ICMM), Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
Niels Tommerup Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Department of Cellular and Molecular Medicine (ICMM), Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
Walter L Ruzzo Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,School of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.,Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
Jan Gorodkin Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark.,Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, DK-1870 Frederiksberg, Denmark

Collapse