1
|
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance of conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecule functions. In this chapter we provide a-necessarily incomplete-overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Collapse
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
| | - Jan Gorodkin
- Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Ivo L Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
- Universidad National de Colombia, Bogotá, Colombia.
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria.
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
- Santa Fe Institute, Santa Fe, NM, USA.
| |
Collapse
|
2
|
Cao T, Li Q, Huang Y, Li A. plotnineSeqSuite: a Python package for visualizing sequence data using ggplot2 style. BMC Genomics 2023; 24:585. [PMID: 37789265 PMCID: PMC10546746 DOI: 10.1186/s12864-023-09677-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Accepted: 09/14/2023] [Indexed: 10/05/2023] Open
Abstract
BACKGROUND The visual sequence logo has been a hot area in the development of bioinformatics tools. ggseqlogo written in R language has been the most popular API since it was published. With the popularity of artificial intelligence and deep learning, Python is currently the most popular programming language. The programming language used by bioinformaticians began to shift to Python. Providing APIs in Python that are similar to those in R can reduce the learning cost of relearning a programming language. And compared to ggplot2 in R, drawing framework is not as easy to use in Python. The appearance of plotnine (ggplot2 in Python version) makes it possible to unify the programming methods of bioinformatics visualization tools between R and Python. RESULTS Here, we introduce plotnineSeqSuite, a new plotnine-based Python package provides a ggseqlogo-like API for programmatic drawing of sequence logos, sequence alignment diagrams and sequence histograms. To be more precise, it supports custom letters, color themes, and fonts. Moreover, the class for drawing layers is based on object-oriented design so that users can easily encapsulate and extend it. CONCLUSIONS plotnineSeqSuite is the first ggplot2-style package to implement visualization of sequence -related graphs in Python. It enhances the uniformity of programmatic plotting between R and Python. Compared with tools appeared already, the categories supported by plotnineSeqSuite are much more complete. The source code of plotnineSeqSuite can be obtained on GitHub ( https://github.com/caotianze/plotnineseqsuite ) and PyPI ( https://pypi.org/project/plotnineseqsuite ), and the documentation homepage is freely available on GitHub at ( https://caotianze.github.io/plotnineseqsuite/ ).
Collapse
Affiliation(s)
- Tianze Cao
- School of Mathematics, Hangzhou Normal University, Hangzhou, Zhejiang Province, China
| | - Qian Li
- Department of Rehabilitation, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, China
| | - Yuexia Huang
- School of Mathematics, Hangzhou Normal University, Hangzhou, Zhejiang Province, China.
| | - Anshui Li
- Department of Statistics, Shaoxing University, Shaoxing, Zhejiang Province, China.
| |
Collapse
|
3
|
Sass AM, Coenye T. The Small RNA NcS25 Regulates Biological Amine-Transporting Outer Membrane Porin BCAL3473 in Burkholderia cenocepacia. mSphere 2023; 8:e0008323. [PMID: 36971554 PMCID: PMC10117139 DOI: 10.1128/msphere.00083-23] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Accepted: 03/03/2023] [Indexed: 03/29/2023] Open
Abstract
Regulation of porin expression in bacteria is complex and often involves small-RNA regulators. Several small-RNA regulators have been described for Burkholderia cenocepacia, and this study aimed to characterize the biological role of the conserved small RNA NcS25 and its cognate target, outer membrane protein BCAL3473. The B. cenocepacia genome carries a large number of genes encoding porins with yet-uncharacterized functions. Expression of the porin BCAL3473 is strongly repressed by NcS25 and activated by other factors, such as a LysR-type regulator and nitrogen-depleted growth conditions. The porin is involved in transport of arginine, tyrosine, tyramine, and putrescine across the outer membrane. Porin BCAL3473, with NcS25 as a major regulator, plays an important role in the nitrogen metabolism of B. cenocepacia. IMPORTANCE Burkholderia cenocepacia is a Gram-negative bacterium which causes infections in immunocompromised individuals and in people with cystic fibrosis. A low outer membrane permeability is one of the factors giving it a high level of innate resistance to antibiotics. Porins provide selective permeability for nutrients, and antibiotics can also traverse the outer membrane by this means. Knowing the properties and specificities of porin channels is therefore important for understanding resistance mechanisms and for developing new antibiotics and could help in overcoming permeability issues in antibiotic treatment.
Collapse
Affiliation(s)
- Andrea M. Sass
- Laboratory of Pharmaceutical Microbiology, Ghent University, Ghent, Belgium
| | - Tom Coenye
- Laboratory of Pharmaceutical Microbiology, Ghent University, Ghent, Belgium
| |
Collapse
|
4
|
Web Services for RNA-RNA Interaction Prediction. Methods Mol Biol 2023; 2586:175-195. [PMID: 36705905 DOI: 10.1007/978-1-0716-2768-6_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Non-coding RNAs have various biological functions such as translational regulation, and RNA-RNA interactions play essential roles in the mechanisms of action of these RNAs. Therefore, RNA-RNA interaction prediction is an important problem in bioinformatics, and many tools have been developed for the computational prediction of RNA-RNA interactions. In addition to the development of novel algorithms with high accuracy, the development and maintenance of web services is essential for enhancing usability by experimental biologists. In this review, we survey web services for RNA-RNA interaction predictions and introduce how to use primary web services. We present various prediction tools, including general interaction prediction tools, prediction tools for specific RNA classes, and RNA-RNA interaction-based RNA design tools. Additionally, we discuss the future perspectives of the development of RNA-RNA interaction prediction tools and the sustainability of web services.
Collapse
|
5
|
Tsybulskyi V, Mounir M, Meyer IM. R-chie: a web server and R package for visualizing cis and trans RNA-RNA, RNA-DNA and DNA-DNA interactions. Nucleic Acids Res 2020; 48:e105. [PMID: 32976561 PMCID: PMC7544209 DOI: 10.1093/nar/gkaa708] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2020] [Revised: 06/25/2020] [Accepted: 09/22/2020] [Indexed: 12/15/2022] Open
Abstract
Interactions between biological entities are key to understanding their potential functional roles. Three fields of research have recently made particular progress: the investigation of transRNA-RNA and RNA-DNA transcriptome interactions and of trans DNA-DNA genome interactions. We now have both experimental and computational methods for examining these interactions in vivo and on a transcriptome- and genome-wide scale, respectively. Often, key insights can be gained by visually inspecting figures that manage to combine different sources of evidence and quantitative information. We here present R-chie, a web server and R package for visualizing cis and transRNA-RNA, RNA-DNA and DNA-DNA interactions. For this, we have completely revised and significantly extended an earlier version of R-chie (1) which was initially introduced for visualizing RNA secondary structure features. The new R-chie offers a range of unique features for visualizing cis and transRNA-RNA, RNA-DNA and DNA-DNA interactions. Particularly note-worthy features include the ability to incorporate evolutionary information, e.g. multiple-sequence alignments, to compare two alternative sets of information and to incorporate detailed, quantitative information. R-chie is readily available via a web server as well as a corresponding R package called R4RNA which can be used to run the software locally.
Collapse
Affiliation(s)
- Volodymyr Tsybulskyi
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Hannoversche Str. 28, 10115 Berlin, Germany.,Freie Universität Berlin, Department of Mathematics and Computer Science, Bioinformatics Division, Takustr. 9, 14195 Berlin, Germany
| | - Mohamed Mounir
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Hannoversche Str. 28, 10115 Berlin, Germany
| | - Irmtraud M Meyer
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Hannoversche Str. 28, 10115 Berlin, Germany.,Freie Universität Berlin, Department of Biology, Chemistry and Pharmacy, Institute of Chemistry and Biochemistry, Thielallee 63, 14195 Berlin, Germany
| |
Collapse
|
6
|
Tareen A, Kinney JB. Logomaker: beautiful sequence logos in Python. Bioinformatics 2020; 36:2272-2274. [PMID: 31821414 PMCID: PMC7141850 DOI: 10.1093/bioinformatics/btz921] [Citation(s) in RCA: 257] [Impact Index Per Article: 51.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2019] [Revised: 11/14/2019] [Accepted: 12/06/2019] [Indexed: 01/09/2023] Open
Abstract
Summary Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA and protein sequences, yet it is currently difficult to generate and customize such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from either a matrix-like array of numbers or a multiple-sequence alignment. Logos are rendered as native matplotlib objects that are easy to stylize and incorporate into multi-panel figures. Availability and implementation Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Documentation is provided at http://logomaker.readthedocs.io; source code is available at http://github.com/jbkinney/logomaker.
Collapse
Affiliation(s)
- Ammar Tareen
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| |
Collapse
|
7
|
Lui LM, Uzilov AV, Bernick DL, Corredor A, Lowe TM, Dennis PP. Methylation guide RNA evolution in archaea: structure, function and genomic organization of 110 C/D box sRNA families across six Pyrobaculum species. Nucleic Acids Res 2019; 46:5678-5691. [PMID: 29771354 PMCID: PMC6009581 DOI: 10.1093/nar/gky284] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2017] [Accepted: 04/23/2018] [Indexed: 12/21/2022] Open
Abstract
Archaeal homologs of eukaryotic C/D box small nucleolar RNAs (C/D box sRNAs) guide precise 2′-O-methyl modification of ribosomal and transfer RNAs. Although C/D box sRNA genes constitute one of the largest RNA gene families in archaeal thermophiles, most genomes have incomplete sRNA gene annotation because reliable, fully automated detection methods are not available. We expanded and curated a comprehensive gene set across six species of the crenarchaeal genus Pyrobaculum, particularly rich in C/D box sRNA genes. Using high-throughput small RNA sequencing, specialized computational searches and comparative genomics, we analyzed 526 Pyrobaculum C/D box sRNAs, organizing them into 110 families based on synteny and conservation of guide sequences which determine methylation targets. We examined gene duplications and rearrangements, including one family that has expanded in a pattern similar to retrotransposed repetitive elements in eukaryotes. New training data and inclusion of kink-turn secondary structural features enabled creation of an improved search model. Our analyses provide the most comprehensive, dynamic view of C/D box sRNA evolutionary history within a genus, in terms of modification function, feature plasticity, and gene mobility.
Collapse
Affiliation(s)
- Lauren M Lui
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Andrew V Uzilov
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - David L Bernick
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Andrea Corredor
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Todd M Lowe
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Patrick P Dennis
- Department of Biology, Whitman College, Walla Walla, WA 99362, USA
| |
Collapse
|
8
|
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance of conserved secondary structures. Large scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecule functions. In this chapter we provide a-necessarily incomplete-overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Collapse
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany.,Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
| | - Jan Gorodkin
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
| | - Ivo L Hofacker
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark.,Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.,Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
| | - Peter F Stadler
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark. .,Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria. .,Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany. .,Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany. .,Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany. .,Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA.
| |
Collapse
|
9
|
Shabash B, Wiese KC. RNA Visualization: Relevance and the Current State-of-the-Art Focusing on Pseudoknots. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:696-712. [PMID: 26915129 DOI: 10.1109/tcbb.2016.2522421] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
RNA visualization is crucial in order to understand the relationship that exists between RNA structure and its function, as well as the development of better RNA structure prediction algorithms. However, in the context of RNA visualization, one key structure remains difficult to visualize: Pseudoknots. Pseudoknots occur in RNA folding when two secondary structural components form base-pairs between them. The three-dimensional nature of these components makes them challenging to visualize in two-dimensional media, such as print media or screens. In this review, we focus on the advancements that have been made in the field of RNA visualization in two-dimensional media in the past two decades. The review aims at presenting all relevant aspects of pseudoknot visualization. We start with an overview of several pseudoknotted structures and their relevance in RNA function. Next, we discuss the theoretical basis for RNA structural topology classification and present RNA classification systems for both pseudoknotted and non-pseudoknotted RNAs. Each description of RNA classification system is followed by a discussion of the software tools and algorithms developed to date to visualize RNA, comparing the different tools' strengths and shortcomings.
Collapse
|
10
|
Abstract
Secondary structure diagrams are essential, in RNA biology, to communicate functional hypotheses and summarize structural data, and communicate them visually as drafts or finalized publication-ready figures. While many tools are currently available to automate the production of such diagrams, their capacities are usually partial, making it hard for a user to decide which to use in a given context. In this chapter, we guide the reader through the steps involved in the production of expressive publication-quality illustrations featuring the RNA secondary structure. We present major existing representations and layouts, and give precise instructions to produce them using available free software, including jViz.RNA, the PseudoViewer, RILogo, R-chie, RNAplot, R2R, and VARNA. We describe the file formats and structural descriptions accepted by popular RNA visualization tools. We also provide command lines and Python scripts to ease the user's access to advanced features. Finally, we discuss and illustrate alternative approaches to visualize the secondary structure in the presence of probing data, pseudoknots, RNA-RNA interactions, and comparative data.
Collapse
|
11
|
Roca AI. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos. BMC Proc 2014; 8:S6. [PMID: 25237393 PMCID: PMC4155610 DOI: 10.1186/1753-6561-8-s2-s6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
Background The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
Collapse
Affiliation(s)
- Alberto I Roca
- ProfileGrid.org, P.O. Box 6414, Irvine, California 92616, USA
| |
Collapse
|
12
|
Olsen LR, Kudahl UJ, Simon C, Sun J, Schönbach C, Reinherz EL, Zhang GL, Brusic V. BlockLogo: visualization of peptide and sequence motif conservation. J Immunol Methods 2013; 400-401:37-44. [PMID: 24001880 DOI: 10.1016/j.jim.2013.08.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2013] [Revised: 08/20/2013] [Accepted: 08/25/2013] [Indexed: 12/21/2022]
Abstract
BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://met-hilab.bu.edu/blocklogo/.
Collapse
Affiliation(s)
- Lars Rønn Olsen
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | | | | | | | | | | | | | | |
Collapse
|