1
Gadekar V, Munk AW, Miladi M, Junge A, Backofen R, Seemann S, Gorodkin J. Clusters of mammalian conserved RNA structures in UTRs associate with RBP binding sites. NAR Genom Bioinform 2024; 6:lqae089. [PMID: 39131818 PMCID: PMC11310781 DOI: 10.1093/nargab/lqae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 06/26/2024] [Accepted: 07/16/2024] [Indexed: 08/13/2024]
Abstract
RNA secondary structures play essential roles in the formation of the tertiary structure and function of a transcript. Recent genome-wide studies highlight significant potential for RNA structures in the mammalian genome. However, a major challenge is assigning functional roles to these structured RNAs. In this study, we conduct a guilt-by-association analysis of clusters of computationally predicted conserved RNA structures (CRSs) in human untranslated regions (UTRs) to associate them with gene functions. We filtered a broad pool of ∼500 000 human CRSs for UTR overlap, resulting in 4734 and 24 754 CRSs from the 5' and 3' UTRs of protein-coding genes, respectively. We clustered the CRSs of each set separately using RNAscClust, obtaining 793 and 2403 clusters, with an average of five CRSs per cluster. We identified overrepresented binding sites for 60 and 43 RNA-binding proteins co-localizing with the clustered CRSs. Furthermore, 104 and 441 clusters from the 5' and 3' UTRs, respectively, showed enrichment for various Gene Ontology terms, including biological processes such as 'signal transduction' and 'nervous system development', molecular functions such as 'transferase activity' and cellular components such as 'synapse'. Our study shows that significant functional insights can be gained by clustering RNA structures based on their structural characteristics.
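The abstract does not say which statistic was used to call RBP binding sites or GO terms "overrepresented" in a cluster. As a rough illustration only, a one-sided hypergeometric test is a common choice for this kind of guilt-by-association question. The function name and the toy numbers below are assumptions made for the sketch and are not taken from the paper.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    M: size of the background (e.g. all clustered CRSs)
    n: background items carrying the annotation (e.g. overlapping an RBP site)
    N: size of the cluster being tested
    k: annotated items observed in the cluster
    """
    upper = min(n, N)
    total = comb(M, N)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical toy case: 20 CRSs in the background, 5 of them overlap a
# binding site, and a cluster of 5 CRSs contains 3 of those.
p_value = hypergeom_sf(3, 20, 5, 5)
print(f"{p_value:.4f}")
```

In a real analysis one such test would be run per cluster and per annotation, so the resulting p-values would need a multiple-testing correction before being interpreted.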
Affiliation(s)
- Veerendra P Gadekar
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
  - Centre for Integrative Biology and Systems Medicine (IBSE), IIT Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
- Alexander Welford Munk
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Milad Miladi
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Alexander Junge
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Stefan E Seemann
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
2
Lee S, Lee T, Noh YK, Kim S. Ranked k-Spectrum Kernel for Comparative and Evolutionary Comparison of Exons, Introns, and CpG Islands. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1174-1183. [PMID: 31494555 DOI: 10.1109/tcbb.2019.2938949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
MOTIVATION Existing k-mer based string kernel methods have been successfully used for sequence comparison. However, existing kernel methods have limitations for comparative and evolutionary comparisons of genomes due to their sensitivity to over-represented k-mers and variable sequence lengths. RESULTS In this study, we propose a novel ranked k-spectrum string (RKSS) kernel. 1) The RKSS kernel utilizes common k-mer sets across species, named landmarks, that can be used for comparing multiple genomes. 2) Based on the landmarks, we can use ranks of k-mers, rather than frequencies, which produce more robust distances between genomes. To show the power of the RKSS kernel, we conducted two experiments using 10 mammalian species with exon, intron, and CpG island sequences. The RKSS kernel reconstructed more consistent evolutionary trees than the k-spectrum string kernel. In the subsequent experiment, for each sequence, kernel distance was calculated from 30 landmarks representing exon, intron, and CpG island sequences of 10 genomes. Based on kernel distances, concordance tests were performed and the result suggested that more information is conserved in CpG islands across species than in introns. In conclusion, our analysis suggests the relational order exon > CpG island > intron in terms of evolutionary information content.
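The two ideas named in the abstract, landmarks (k-mers shared by all sequences) and ranks instead of raw frequencies, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published RKSS kernel: the Euclidean distance between rank vectors, the alphabetical tie-breaking, and all function names are choices made here for the example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def landmarks(seqs, k):
    """k-mers present in every sequence (the shared 'landmark' set)."""
    kmer_sets = [set(kmer_counts(s, k)) for s in seqs]
    return sorted(set.intersection(*kmer_sets))


def rank_vector(seq, k, marks):
    """Rank of each landmark in seq: 1 = most frequent.

    Ties are broken alphabetically (an assumption of this sketch).
    """
    counts = kmer_counts(seq, k)
    ordered = sorted(marks, key=lambda m: (-counts[m], m))
    rank = {m: r for r, m in enumerate(ordered, start=1)}
    return [rank[m] for m in marks]


def rank_distance(a, b, k, marks):
    """Euclidean distance between the landmark rank vectors of a and b."""
    ra, rb = rank_vector(a, k, marks), rank_vector(b, k, marks)
    return sum((x - y) ** 2 for x, y in zip(ra, rb)) ** 0.5


marks = landmarks(["CGCGGT", "GTGTCG"], 2)
print(marks, rank_distance("CGCGGT", "GTGTCG", 2, marks))
```

Because only the ordering of landmark k-mers enters the distance, a single heavily over-represented k-mer or a much longer sequence shifts the result far less than it would with raw counts, which is the robustness argument the abstract makes.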
3
Miladi M, Sokhoyan E, Houwaart T, Heyne S, Costa F, Grüning B, Backofen R. GraphClust2: Annotation and discovery of structured RNAs with scalable and accessible integrative clustering. Gigascience 2019; 8:giz150. [PMID: 31808801 PMCID: PMC6897289 DOI: 10.1093/gigascience/giz150] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/15/2019] [Revised: 08/23/2019] [Accepted: 11/20/2019] [Indexed: 11/18/2022]
Abstract
BACKGROUND RNA plays essential roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available. RESULTS Hundreds of thousands of non-coding RNAs have been detected; however, their annotation is lagging behind. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 bridges the gap between high-throughput sequencing and structural RNA analysis and provides an integrative solution by incorporating diverse experimental and genomic data in an accessible manner via the Galaxy framework. GraphClust2 can efficiently cluster and annotate large datasets of RNAs and supports structure-probing data. We demonstrate that the annotation performance of clustering functional RNAs can be considerably improved. Furthermore, an off-the-shelf procedure is introduced for identifying locally conserved structure candidates in long RNAs. We suggest the presence and the sparseness of phylogenetically conserved local structures for a collection of long non-coding RNAs. CONCLUSIONS By clustering data from 2 cross-linking immunoprecipitation experiments, we demonstrate the benefits of GraphClust2 for motif discovery in the presence of biological and methodological biases. Finally, we uncover prominent targets of the double-stranded RNA binding protein Roquin-1, such as BCOR's 3' untranslated region, which contains multiple binding stem-loops that are evolutionarily conserved.
Affiliation(s)
- Milad Miladi
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Eteri Sokhoyan
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Torsten Houwaart
  - Institute of Medical Microbiology and Hospital Hygiene, University of Dusseldorf, Universitaetsstr. 1, 40225 Dusseldorf, Germany
- Steffen Heyne
  - Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Stuebeweg 51, 79108 Freiburg, Germany
- Fabrizio Costa
  - Department of Computer Science, University of Exeter, North Park Road, EX4 4QF Exeter, UK
- Björn Grüning
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
  - ZBSA Centre for Biological Systems Analysis, University of Freiburg, Hauptstr. 1, 79104 Freiburg, Germany
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
  - ZBSA Centre for Biological Systems Analysis, University of Freiburg, Hauptstr. 1, 79104 Freiburg, Germany
  - Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
4
Middleton SA, Eberwine J, Kim J. Comprehensive catalog of dendritically localized mRNA isoforms from sub-cellular sequencing of single mouse neurons. BMC Biol 2019; 17:5. [PMID: 30678683 PMCID: PMC6344992 DOI: 10.1186/s12915-019-0630-z] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 10/18/2018] [Accepted: 01/16/2019] [Indexed: 02/06/2023]
Abstract
Background RNA localization involves cis-motifs that are recognized by RNA-binding proteins (RBP), which then mediate localization to specific sub-cellular compartments. RNA localization is critical for many different cell functions, e.g., in neuronal dendrites, localization is a critical step for long-lasting synaptic potentiation. However, there is little consensus regarding which RNAs are localized and the role of alternative isoforms in localization. A comprehensive catalog of localized RNA can help dissect RBP/RNA interactions and localization motifs. Here, we utilize a single cell sub-cellular RNA sequencing approach to profile differentially localized RNAs from individual cells across multiple single cells to help identify a consistent set of localized RNA in mouse neurons. Results Using independent RNA sequencing from soma and dendrites of the same neuron, we deeply profiled the sub-cellular transcriptomes to assess the extent and variability of dendritic RNA localization in individual hippocampal neurons, including an assessment of differential localization of alternative 3′UTR isoforms. We identified 2225 dendritic RNAs, including 298 cases of 3′UTR isoform-specific localization. We extensively analyzed the localized RNAs for potential localization motifs, finding that B1 and B2 SINE elements are up to 5.7 times more abundant in localized RNA 3′UTRs than non-localized, and also functionally characterized the localized RNAs using protein structure analysis. Conclusion We integrate our list of localized RNAs with the literature to provide a comprehensive list of known dendritically localized RNAs as a resource. This catalog of transcripts, including differentially localized isoforms and computationally hypothesized localization motifs, will help investigators further dissect the genome-scale mechanism of RNA localization. 
Electronic supplementary material The online version of this article (10.1186/s12915-019-0630-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sarah A Middleton
  - Graduate Program in Genomics and Computational Biology, Biomedical Graduate Studies, University of Pennsylvania, 160 BRB II/III - 421 Curie Blvd, Philadelphia, PA, 19104-6064, USA
  - Present Address: Computational Biology, Target Sciences, GlaxoSmithKline R&D, 1250 S. Collegeville Road, Collegeville, PA, 19426, USA
- James Eberwine
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, 829 BRB II/III, 421 Curie Blvd, Philadelphia, PA, 19104, USA
- Junhyong Kim
  - Graduate Program in Genomics and Computational Biology, Biomedical Graduate Studies, University of Pennsylvania, 160 BRB II/III - 421 Curie Blvd, Philadelphia, PA, 19104-6064, USA
  - Department of Biology, University of Pennsylvania, 415 S. University Ave, Philadelphia, PA, 19104, USA
5
Glouzon JPS, Perreault JP, Wang S. Structurexplor: a platform for the exploration of structural features of RNA secondary structures. Bioinformatics 2018; 33:3117-3120. [PMID: 28575203 DOI: 10.1093/bioinformatics/btx323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/30/2016] [Accepted: 05/26/2017] [Indexed: 11/14/2022]
Abstract
Summary Discovering function-related structural features, such as the cloverleaf shape of transfer RNA secondary structures, is essential to understand RNA function. With this aim, we have developed a platform, named Structurexplor, to facilitate the exploration of structural features in populations of RNA secondary structures. It has been designed and developed to help biologists interactively search for, evaluate and select interesting structural features that can potentially explain RNA functions. Availability and implementation Structurexplor is a web application available at http://structurexplor.dinf.usherbrooke.ca. The source code can be found at http://jpsglouzon.github.io/structurexplor/. Contact shengrui.wang@usherbrooke.ca. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jean-Pierre Séhi Glouzon
  - Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
  - RNA Group, Department of Biochemistry, Faculty of Medicine and Health Sciences, Applied Cancer Research Pavilion, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
- Jean-Pierre Perreault
  - RNA Group, Department of Biochemistry, Faculty of Medicine and Health Sciences, Applied Cancer Research Pavilion, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
- Shengrui Wang
  - Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
6
Wang C, Schmich F, Srivatsa S, Weidner J, Beerenwinkel N, Spang A. Context-dependent deposition and regulation of mRNAs in P-bodies. eLife 2018; 7:29815. [PMID: 29297464 PMCID: PMC5752201 DOI: 10.7554/elife.29815] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 06/21/2017] [Accepted: 12/13/2017] [Indexed: 12/21/2022]
Abstract
Cells respond to stress by remodeling their transcriptome through transcription and degradation. Xrn1p-dependent degradation in P-bodies is the most prevalent decay pathway, yet P-bodies may not only facilitate decay but also act as a storage compartment. However, which mRNAs are selected into different degradation pathways, how they are selected, and what determines the fate of any given mRNA in P-bodies remain largely unknown. We devised a new method to identify both common and stress-specific mRNA subsets associated with P-bodies. mRNAs targeted to P-bodies for degradation decayed with different kinetics. Moreover, the localization of a specific set of mRNAs to P-bodies under glucose deprivation was obligatory to prevent decay. Depending on its client mRNA, the RNA-binding protein Puf5p either promoted or inhibited decay. Furthermore, the Puf5p-dependent storage of a subset of mRNAs in P-bodies under glucose starvation may be beneficial with respect to chronological lifespan.
Affiliation(s)
- Congwei Wang
  - Growth and Development, Biozentrum, University of Basel, Basel, Switzerland
- Fabian Schmich
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Sumana Srivatsa
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Julie Weidner
  - Growth and Development, Biozentrum, University of Basel, Basel, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Anne Spang
  - Growth and Development, Biozentrum, University of Basel, Basel, Switzerland
7
Smith MA, Seemann SE, Quek XC, Mattick JS. DotAligner: identification and clustering of RNA structure motifs. Genome Biol 2017; 18:244. [PMID: 29284541 PMCID: PMC5747123 DOI: 10.1186/s13059-017-1371-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 06/01/2017] [Accepted: 12/05/2017] [Indexed: 01/01/2023]
Abstract
The diversity of processed transcripts in eukaryotic genomes poses a challenge for the classification of their biological functions. Sparse sequence conservation in non-coding sequences and the unreliable nature of RNA structure predictions further exacerbate this conundrum. Here, we describe a computational method, DotAligner, for the unsupervised discovery and classification of homologous RNA structure motifs from a set of sequences of interest. Our approach outperforms comparable algorithms at clustering known RNA structure families, both in speed and accuracy. It identifies clusters of known and novel structure motifs from ENCODE immunoprecipitation data for 44 RNA-binding proteins.
Affiliation(s)
- Martin A Smith
  - RNA Biology and Plasticity Group, Garvan Institute of Medical Research, 384 Victoria Street, Sydney, NSW 2010, Australia
  - St Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW 2010, Australia
- Stefan E Seemann
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Groennegaardsvej 3, Frederiksberg, 1870, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, DK-1870, Frederiksberg, Denmark
- Xiu Cheng Quek
  - RNA Biology and Plasticity Group, Garvan Institute of Medical Research, 384 Victoria Street, Sydney, NSW 2010, Australia
  - St Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW 2010, Australia
- John S Mattick
  - RNA Biology and Plasticity Group, Garvan Institute of Medical Research, 384 Victoria Street, Sydney, NSW 2010, Australia
  - St Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW 2010, Australia
8
Ren C, Liu F, Ouyang Z, An G, Zhao C, Shuai J, Cai S, Bo X, Shu W. Functional annotation of structural ncRNAs within enhancer RNAs in the human genome: implications for human disease. Sci Rep 2017; 7:15518. [PMID: 29138457 PMCID: PMC5686184 DOI: 10.1038/s41598-017-15822-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 06/06/2017] [Accepted: 11/03/2017] [Indexed: 12/28/2022]
Abstract
Enhancer RNAs (eRNAs) are a novel class of non-coding RNA (ncRNA) molecules transcribed from the DNA sequences of enhancer regions. Despite extensive efforts devoted to revealing the potential functions and underlying mechanisms of eRNAs, it remains an open question whether eRNAs are mere transcriptional noise or relevant biologically functional species. Here, we identified a catalogue of eRNAs in a broad range of human cell/tissue types and extended our understanding of eRNAs by demonstrating their multi-omic signatures. Gene Ontology (GO) analysis revealed that eRNAs play key roles in human cell identity. Furthermore, we detected numerous known and novel functional RNA structures within eRNA regions. To better characterize the cis-regulatory effects of non-coding variation in these structural ncRNAs, we performed a comprehensive analysis of the genetic variants of structural ncRNAs in eRNA regions that are associated with inflammatory autoimmune diseases. Disease-associated variants of the structural ncRNAs were disproportionately enriched in immune-specific cell types. We also identified riboSNitches in lymphoid eRNAs and investigated the potential pathogenic mechanisms by which eRNAs might function in autoimmune diseases. Collectively, our findings offer valuable insights into the function of eRNAs and suggest that eRNAs might be effective diagnostic and therapeutic targets for human diseases.
Affiliation(s)
- Chao Ren
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China
- Feng Liu
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China
  - Department of Information, The 188th Hospital of ChaoZhou, ChaoZhou, China
- Zhangyi Ouyang
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China
- Gaole An
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China
- Chenghui Zhao
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China
- Jun Shuai
  - Department of Information, The 188th Hospital of ChaoZhou, ChaoZhou, China
- Shuhong Cai
  - Department of Information, The 188th Hospital of ChaoZhou, ChaoZhou, China
- Xiaochen Bo
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China
- Wenjie Shu
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China
9
Fallmann J, Will S, Engelhardt J, Grüning B, Backofen R, Stadler PF. Recent advances in RNA folding. J Biotechnol 2017; 261:97-104. [DOI: 10.1016/j.jbiotec.2017.07.007] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 02/16/2017] [Revised: 07/02/2017] [Accepted: 07/04/2017] [Indexed: 12/23/2022]
10
Miladi M, Junge A, Costa F, Seemann SE, Havgaard JH, Gorodkin J, Backofen R. RNAscClust: clustering RNA sequences using structure conservation and graph based motifs. Bioinformatics 2017; 33:2089-2096. [PMID: 28334186 PMCID: PMC5870858 DOI: 10.1093/bioinformatics/btx114] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 06/06/2016] [Revised: 12/22/2016] [Accepted: 02/23/2017] [Indexed: 12/22/2022]
Abstract
MOTIVATION Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take the compensatory base pair changes obtained from structure conservation in orthologous sequences into account. RESULTS Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments. AVAILABILITY AND IMPLEMENTATION RNAscClust is available at http://www.bioinf.uni-freiburg.de/Software/RNAscClust. CONTACT gorodkin@rth.dk or backofen@informatik.uni-freiburg.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
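To make the term "compensatory base pair change" concrete, the sketch below counts, for two aligned RNA sequences and a consensus structure in dot-bracket notation, the paired columns where both nucleotides differ yet both sequences still form a canonical pair. It is only an illustration of the signal the abstract refers to. It is not the RNAscClust algorithm, which the abstract describes as constrained minimum free-energy folding followed by graph kernel-based clustering. The function names, the gap-free toy alignment, and the treatment of G-U as canonical are assumptions of this example.

```python
# Watson-Crick pairs plus G-U wobble, treated here as "canonical".
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def base_pairs(dot_bracket):
    """Return sorted (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


def compensatory_changes(seq_a, seq_b, consensus):
    """Count consensus pairs that are canonical in both aligned sequences
    while differing at both paired positions."""
    count = 0
    for i, j in base_pairs(consensus):
        pair_a = (seq_a[i], seq_a[j])
        pair_b = (seq_b[i], seq_b[j])
        both_pair = pair_a in CANONICAL and pair_b in CANONICAL
        both_differ = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
        if both_pair and both_differ:
            count += 1
    return count


# Toy alignment: the inner pair flips from C-G to G-C, the outer pair is unchanged.
print(compensatory_changes("GCAAGC", "GGAACC", "((..))"))
```

A double substitution that preserves pairing is much stronger evidence for a conserved structure than sequence identity alone, which is why alignments rich in such changes would be expected to help a structure-aware clustering.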
Affiliation(s)
- Milad Miladi
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Alexander Junge
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Fabrizio Costa
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Stefan E Seemann
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Jakob Hull Havgaard
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Center for Biological Signalling Studies (BIOSS), Cluster of Excellence, University of Freiburg, Freiburg im Breisgau, Germany
11
Abstract
The efficiency of codon translation in vivo is controlled by many factors, including codon context. At a site early in the Salmonella flgM gene, the effects on translation of replacing codons Thr6 and Pro8 of flgM with synonymous alternates produced a 600-fold range in FlgM activity. Synonymous changes at Thr6 and Leu9 resulted in a twofold range in FlgM activity. The level of FlgM activity produced by any codon arrangement was directly proportional to the degree of in vivo ribosome stalling at synonymous codons. Synonymous codon suppressors that corrected the effect of a translation-defective synonymous flgM allele were restricted to two codons flanking the translation-defective codon. The various codon arrangements had no apparent effects on flgM mRNA stability or predicted mRNA secondary structures. Our data suggest that efficient mRNA translation is determined by a triplet-of-triplet genetic code. That is, the efficiency of translating a particular codon is influenced by the nature of the immediately adjacent flanking codons. A model explains these codon-context effects by suggesting that codon recognition by elongation factor-bound aminoacyl-tRNA is initiated by hydrogen bond interactions between the first two nucleotides of the codon and anticodon and then is stabilized by base-stacking energy over three successive codons.
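The "triplet-of-triplet" idea, that each codon is read in the context of its two immediate neighbours, can be pictured as a sliding window of three codons over a coding sequence. The sketch below only enumerates those windows. The function name and the example sequence are hypothetical and are not taken from the flgM study.

```python
def codon_contexts(cds):
    """Return (upstream, codon, downstream) triples for every internal codon.

    cds must be an in-frame coding sequence whose length is a multiple of 3.
    """
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # The first and last codons lack one neighbour and are skipped.
    return [
        (codons[i - 1], codons[i], codons[i + 1])
        for i in range(1, len(codons) - 1)
    ]


# Made-up four-codon sequence: two internal codons, hence two contexts.
for upstream, codon, downstream in codon_contexts("ATGACCCCGCTG"):
    print(upstream, codon, downstream)
```

Grouping measurements such as ribosome stalling by these triples, rather than by single codons, is one way a codon-context effect of this kind could be tabulated.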
12
Middleton SA, Illuminati J, Kim J. Complete fold annotation of the human proteome using a novel structural feature space. Sci Rep 2017; 7:46321. [PMID: 28406174 PMCID: PMC5390313 DOI: 10.1038/srep46321] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/04/2017] [Accepted: 03/14/2017] [Indexed: 11/11/2022]
Abstract
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.
Affiliation(s)
- Sarah A Middleton
  - Genomics and Computational Biology Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Joseph Illuminati
  - Department of Computer Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Junhyong Kim
  - Genomics and Computational Biology Program, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
13
Abstract
Protein-coding RNAs represent only a small fraction of the transcriptional output in higher eukaryotes. The remaining RNA species encompass a broad range of molecular functions and regulatory roles, a consequence of the structural polyvalence of RNA polymers. Although several classes of small noncoding RNAs are relatively well characterized, the accessibility of affordable high-throughput sequencing is generating a wealth of novel, unannotated transcripts, especially long noncoding RNAs (lncRNAs) that are derived from genomic regions that are antisense, intronic, intergenic, and overlapping protein-coding loci. Parsing and characterizing the functions of noncoding RNAs, lncRNAs in particular, is one of the great challenges of modern genome biology. Here we discuss concepts and computational methods for the identification of structural domains in lncRNAs from genomic and transcriptomic data. In the first part, we briefly review how to identify RNA structural motifs in individual lncRNAs. In the second part, we describe how to leverage the evolutionary dynamics of structured RNAs in a computationally efficient screen to detect putative functional lncRNA motifs using comparative genomics.
Affiliation(s)
- Martin A Smith
  - RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, NSW, 2010, Australia
  - St-Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW, 2052, Australia
- John S Mattick
  - RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, NSW, 2010, Australia
  - St-Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW, 2052, Australia