1
Michelini F, Jalihal AP, Francia S, Meers C, Neeb ZT, Rossiello F, Gioia U, Aguado J, Jones-Weinert C, Luke B, Biamonti G, Nowacki M, Storici F, Carninci P, Walter NG, d'Adda di Fagagna F. From "Cellular" RNA to "Smart" RNA: Multiple Roles of RNA in Genome Stability and Beyond. Chem Rev 2018; 118:4365-4403. [PMID: 29600857 DOI: 10.1021/acs.chemrev.7b00487] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Indexed: 12/18/2022]
Abstract
Coding for proteins has been considered the main function of RNA since the "central dogma" of biology was proposed. The discovery of noncoding transcripts shed light on additional roles of RNA, ranging from the support of polypeptide synthesis, to the assembly of subnuclear structures, to gene expression modulation. Cellular RNA has therefore been recognized as a central player in often unanticipated biological processes, including genomic stability. This ever-expanding list of functions inspired us to think of RNA as a "smart" phone, which has replaced the older obsolete "cellular" phone. In this review, we summarize the last two decades of advances in research on the interface between RNA biology and genome stability. We start with an account of the emergence of noncoding RNA, and then we discuss the involvement of RNA in DNA damage signaling and repair, telomere maintenance, and genomic rearrangements. We continue with the depiction of single-molecule RNA detection techniques, and we conclude by illustrating the possibilities of RNA modulation in hopes of creating or improving new therapies. The widespread biological functions of RNA have made this molecule a reoccurring theme in basic and translational research, warranting it the transcendence from classically studied "cellular" RNA to "smart" RNA.
Affiliation(s)
- Flavia Michelini
  - IFOM - The FIRC Institute of Molecular Oncology, Milan, 20139, Italy
- Ameya P Jalihal
  - Single Molecule Analysis Group and Center for RNA Biomedicine, Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109-1055, United States
- Sofia Francia
  - IFOM - The FIRC Institute of Molecular Oncology, Milan, 20139, Italy; Istituto di Genetica Molecolare, CNR - Consiglio Nazionale delle Ricerche, Pavia, 27100, Italy
- Chance Meers
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Zachary T Neeb
  - Institute of Cell Biology, University of Bern, Baltzerstrasse 4, 3012 Bern, Switzerland
- Ubaldo Gioia
  - IFOM - The FIRC Institute of Molecular Oncology, Milan, 20139, Italy
- Julio Aguado
  - IFOM - The FIRC Institute of Molecular Oncology, Milan, 20139, Italy
- Brian Luke
  - Institute of Developmental Biology and Neurobiology, Johannes Gutenberg University, 55099 Mainz, Germany; Institute of Molecular Biology (IMB), 55128 Mainz, Germany
- Giuseppe Biamonti
  - Istituto di Genetica Molecolare, CNR - Consiglio Nazionale delle Ricerche, Pavia, 27100, Italy
- Mariusz Nowacki
  - Institute of Cell Biology, University of Bern, Baltzerstrasse 4, 3012 Bern, Switzerland
- Francesca Storici
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Piero Carninci
  - RIKEN Center for Life Science Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan
- Nils G Walter
  - Single Molecule Analysis Group and Center for RNA Biomedicine, Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109-1055, United States
- Fabrizio d'Adda di Fagagna
  - IFOM - The FIRC Institute of Molecular Oncology, Milan, 20139, Italy; Istituto di Genetica Molecolare, CNR - Consiglio Nazionale delle Ricerche, Pavia, 27100, Italy
2
Brown T, Howe FS, Murray SC, Wouters M, Lorenz P, Seward E, Rata S, Angel A, Mellor J. Antisense transcription-dependent chromatin signature modulates sense transcript dynamics. Mol Syst Biol 2018; 14:e8007. [PMID: 29440389 PMCID: PMC5810148 DOI: 10.15252/msb.20178007] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 09/26/2017] [Revised: 01/13/2018] [Accepted: 01/16/2018] [Indexed: 12/22/2022]
Abstract
Antisense transcription is widespread in genomes. Despite large differences in gene size and architecture, we find that yeast and human genes share a unique, antisense transcription-associated chromatin signature. We asked whether this signature is related to a biological function for antisense transcription. Using quantitative RNA-FISH, we observed changes in sense transcript distributions in nuclei and cytoplasm as antisense transcript levels were altered. To determine the mechanistic differences underlying these distributions, we developed a mathematical framework describing transcription from initiation to transcript degradation. At GAL1, high levels of antisense transcription alter sense transcription dynamics, reducing rates of transcript production and processing, while increasing transcript stability. This relationship with transcript stability is also observed as a genome-wide association. Establishing the antisense transcription-associated chromatin signature through disruption of the Set3C histone deacetylase activity is sufficient to similarly change these rates even in the absence of antisense transcription. Thus, antisense transcription alters sense transcription dynamics in a chromatin-dependent manner.
Affiliation(s)
- Thomas Brown
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Struan C Murray
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Philipp Lorenz
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Emily Seward
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Scott Rata
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Andrew Angel
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Jane Mellor
  - Department of Biochemistry, University of Oxford, Oxford, UK
3
Dinka H, Le MT. Analysis of Pig Vomeronasal Receptor Type 1 (V1R) Promoter Region Reveals a Common Promoter Motif but Poor CpG Islands. Anim Biotechnol 2017; 29:293-300. [PMID: 29120694 DOI: 10.1080/10495398.2017.1383915] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/18/2023]
Abstract
Promoters are generally located immediately upstream of a transcription start site (TSS) and contain a variety of regulatory features, such as transcription factor (TF) binding sites and CpG islands (CGIs), that participate in the regulation of gene expression. Here, an analysis of the promoter region of pig vomeronasal receptor type 1 (V1R) genes is described. TSSs for pig V1R genes were first identified, and five motifs (MV1, MV2, MV3, MV4, and MV5) were found that are shared by at least 50% of the pig V1R promoter input sequences from both strands. Among the five motifs, MV2 was identified as a common promoter motif shared by all (100%) pig V1R promoters. To better characterize MV2 and gain deeper biological insight into it, the TOMTOM web application was used to compare MV2 against known motif databases (such as JASPAR) and determine whether it is similar to a known regulatory motif (transcription factor binding site). This revealed that MV2 serves as the binding site mainly for the BetaBetaAlpha-zinc finger (BTB-ZF) transcription factor gene family to regulate expression of pig V1R genes. Moreover, pig V1R promoters were shown to be CpG poor, suggesting that their gene expression is regulated in a tissue-specific manner.
Affiliation(s)
- Hunduma Dinka
  - (a) Department of Applied Biology, School of Applied Natural Sciences, Adama Science and Technology University, Adama, Ethiopia; (b) Department of Animal Biotechnology, Konkuk University, Seoul, South Korea
- Minh Thong Le
  - (b) Department of Animal Biotechnology, Konkuk University, Seoul, South Korea
4
Jalali S, Singh A, Maiti S, Scaria V. Genome-wide computational analysis of potential long noncoding RNA mediated DNA:DNA:RNA triplexes in the human genome. J Transl Med 2017; 15:186. [PMID: 28865451 PMCID: PMC7670996 DOI: 10.1186/s12967-017-1282-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 12/31/2016] [Accepted: 08/18/2017] [Indexed: 01/23/2023]
Abstract
BACKGROUND: Only a handful of long noncoding RNAs have been functionally characterized. They are known to modulate regulation through interacting with other biomolecules in the cell: DNA, RNA and protein. Though there have been detailed investigations on lncRNA-miRNA and lncRNA-protein interactions, the interaction of lncRNAs with DNA has not been studied extensively. In the present study, we explore whether lncRNAs could modulate genomic regulation by interacting with DNA through the formation of highly stable DNA:DNA:RNA triplexes. METHODS: We computationally screened 23,898 lncRNA transcripts, as annotated by GENCODE, across the human genome for potential triplex forming sequence stretches (PTS). The PTS frequencies were compared across 5'UTR, CDS, 3'UTR, introns, promoter and 1000 bases downstream of the transcription termination sites. These regions were annotated by mapping to experimental regulatory regions, classes of repeat regions and transcription factors. We validated a few putative triplex mediated interactions, where the lncRNA-gene pair interaction is via a pyrimidine triplex motif, using biophysical methods. RESULTS: We identified 2,004,034 PTS sites to be enriched in promoter and intronic regions across the human genome. Additional analysis of the association of PTS with core promoter elements revealed a systematic paucity of PTS in all regulatory regions, except TF binding sites. A total of 25 transcription factors were found to be associated with PTS. Using an interaction network, we showed that a subset of the triplex forming lncRNAs have a positive association with gene promoters. We also demonstrated an in vitro interaction of one lncRNA candidate with its predicted gene target promoter regions. CONCLUSIONS: Our analysis shows that PTS are enriched in gene promoters and largely associated with simple repeats. The current study suggests a major role of a subset of lncRNAs in mediating chromatin organization modulation through CTCF and NSRF proteins.
Affiliation(s)
- Saakshi Jalali
  - CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi, 110020 India
  - Academy of Scientific and Innovative Research (AcSIR), CSIR IGIB South Campus, Mathura Road, Delhi, 110020 India
- Amrita Singh
  - CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi, 110020 India
  - Academy of Scientific and Innovative Research (AcSIR), CSIR IGIB South Campus, Mathura Road, Delhi, 110020 India
- Souvik Maiti
  - CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi, 110020 India
- Vinod Scaria
  - CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi, 110020 India
  - Academy of Scientific and Innovative Research (AcSIR), CSIR IGIB South Campus, Mathura Road, Delhi, 110020 India
5
Abstract
This paper presents a history of the changing meanings of the term "gene," over more than a century, and a discussion of why this word, so crucial to genetics, needs redefinition today. In this account, the first two phases of 20th century genetics are designated the "classical" and the "neoclassical" periods, and the current molecular-genetic era the "modern period." While the first two stages generated increasing clarity about the nature of the gene, the present period features complexity and confusion. Initially, the term "gene" was coined to denote an abstract "unit of inheritance," to which no specific material attributes were assigned. As the classical and neoclassical periods unfolded, the term became more concrete, first as a dimensionless point on a chromosome, then as a linear segment within a chromosome, and finally as a linear segment in the DNA molecule that encodes a polypeptide chain. This last definition, from the early 1960s, remains the one employed today, but developments since the 1970s have undermined its generality. Indeed, they raise questions about both the utility of the concept of a basic "unit of inheritance" and the long implicit belief that genes are autonomous agents. Here, we review findings that have made the classic molecular definition obsolete and propose a new one based on contemporary knowledge.
Affiliation(s)
- Petter Portin
  - Laboratory of Genetics, Department of Biology, University of Turku, 20014, Finland
- Adam Wilkins
  - Institute of Theoretical Biology, Humboldt Universität zu Berlin, 10115, Germany
6
Identification of Novel Short C-Terminal Transcripts of Human SERPINA1 Gene. PLoS One 2017; 12:e0170533. [PMID: 28107454 PMCID: PMC5249162 DOI: 10.1371/journal.pone.0170533] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/28/2016] [Accepted: 01/05/2017] [Indexed: 12/22/2022]
Abstract
Human SERPINA1 gene is located on chromosome 14q31-32.3 and is organized into three (IA, IB, and IC) non-coding and four (II, III, IV, V) coding exons. This gene produces α1-antitrypsin (A1AT), a prototypical member of the serpin superfamily of proteins. We demonstrate that human peripheral blood leukocytes express not only a product corresponding to the transcript coding for the full-length A1AT protein but also two short transcripts (ST1C4 and ST1C5) of A1AT. In silico sequence analysis revealed that the last exon of the short transcripts contains an Open Reading Frame (ORF) and thus putatively can produce peptides. We found ST1C4 expression across different human tissues whereas ST1C5 was mainly restricted to leukocytes, specifically neutrophils. A high up-regulation (10-fold) of short transcripts was observed in isolated human blood neutrophils after activation with lipopolysaccharide. Parallel analyses by liquid chromatography-mass spectrometry identified peptides corresponding to C-terminal region of A1AT in supernatants of activated but not naïve neutrophils. Herein we report for the first time a tissue specific expression and regulation of short transcripts of SERPINA1 gene, and the presence of C-terminal peptides in supernatants from activated neutrophils, in vitro. This gives a novel insight into the studies on the transcription of SERPINA1 gene.
7
Mwangi S, Attardo G, Suzuki Y, Aksoy S, Christoffels A. TSS seq based core promoter architecture in blood feeding Tsetse fly (Glossina morsitans morsitans) vector of Trypanosomiasis. BMC Genomics 2015; 16:722. [PMID: 26394619 PMCID: PMC4578606 DOI: 10.1186/s12864-015-1921-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/05/2014] [Accepted: 09/11/2015] [Indexed: 02/02/2023]
Abstract
Background: Transcription initiation regulation is mediated by sequence-specific interactions between DNA-binding proteins (transcription factors) and cis-elements, where BRE, TATA, INR, DPE and MTE motifs constitute canonical core motifs for basal transcription initiation of genes. Accurate identification of transcription start sites (TSSs) and their corresponding promoter regions is critical for delineation of these motifs. To date, the genome-scale analysis of core promoter architecture in insects has been confined to Drosophila. The recently sequenced Tsetse fly genome provides a unique opportunity to analyze transcription initiation regulation machinery in blood-feeding insects. Results: A computational method for identification of TSSs in the newly sequenced Tsetse fly genome was evaluated, using TSS seq tags sampled from two developmental stages, namely larvae and pupae. There were 3134 tag clusters, of which 45.4 % (1424) mapped to first coding exons or their proximal predicted 5′UTR regions and 1.0 % (31) mapped to transposons, within a threshold of 100 tags per cluster. These 1393 non transposon-derived core promoters had propensity for AT nucleotides. The −1/+1 and 1/+1 positions in D. melanogaster, and G. m. morsitans had propensity for CA and AA dinucleotides respectively. The 1393 tag clusters comprised narrow promoters (5 %), broad with peak promoters (23 %) and broad without peak promoters (72 %). Two-way motif co-occurrence analysis showed that the MTE-DPE pair is over-represented in broad core promoters. The frequently occurring triplet motifs in all promoter classes are the INR-MTE-DPE, TATA-MTE-DPE and TATA-INR-DPE. Promoters without the TATA motif had higher frequency of the MTE and INR motifs than those observed in Drosophila, where the DPE motif occurs more frequently in promoters without the TATA motif. Gene ontology terms associated with developmental processes were overrepresented in the narrow and broad with peak promoters. Conclusions: The study has identified different motif combinations associated with broad promoters in a blood-feeding insect. In the case of TATA-less core promoters, G. m. morsitans uses the MTE to compensate for the lack of a TATA motif. The increasing availability of TSS seq data allows for revision of existing gene annotation datasets with the potential of identifying new transcriptional units.
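The tag-cluster figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are assumptions, and reading 1393 as the 1424 exon/5′UTR clusters minus the 31 transposon clusters is an inference from the reported numbers, not something the abstract states outright.

```python
# Figures quoted in the abstract; all variable names are illustrative assumptions.
total_clusters = 3134
coding_exon_or_utr = 1424   # reported as 45.4 % of all tag clusters
transposon = 31             # reported as 1.0 % of all tag clusters


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


pct_coding = percent(coding_exon_or_utr, total_clusters)
pct_transposon = percent(transposon, total_clusters)

# Assumed reading: the 1393 "non transposon-derived" promoters are the
# exon/5'UTR clusters with the transposon clusters subtracted.
non_transposon = coding_exon_or_utr - transposon

# Promoter shape classes, in percent of the 1393 clusters, as reported.
promoter_classes = {"narrow": 5, "broad with peak": 23, "broad without peak": 72}
class_total = sum(promoter_classes.values())

print(pct_coding, pct_transposon, non_transposon, class_total)
```

Under these assumptions the percentages, the 1393 count and the three promoter classes are mutually consistent.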
Affiliation(s)
- Sarah Mwangi
  - South African MRC Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Geoffrey Attardo
  - Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, 06510, USA
- Yutaka Suzuki
  - Department of Medical Genome Sciences, University of Tokyo, Tokyo, Japan
- Serap Aksoy
  - Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, 06510, USA
- Alan Christoffels
  - South African MRC Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
8
Kruse H, Mladek A, Gkionis K, Hansen A, Grimme S, Sponer J. Quantum chemical benchmark study on 46 RNA backbone families using a dinucleotide unit. J Chem Theory Comput 2015; 11:4972-91. [PMID: 26574283 DOI: 10.1021/acs.jctc.5b00515] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Indexed: 01/21/2023]
Abstract
We have created a benchmark set of quantum chemical structure-energy data denoted as UpU46, which consists of 46 uracil dinucleotides (UpU), representing all known 46 RNA backbone conformational families. Penalty-function-based restrained optimizations with COSMO TPSS-D3/def2-TZVP ensure a balance between keeping the target conformation and geometry relaxation. The backbone geometries are close to the clustering-means of their respective RNA bioinformatics family classification. High-level wave function methods (DLPNO-CCSD(T) as reference) and a wide-range of dispersion-corrected or inclusive DFT methods (DFT-D3, VV10, LC-BOP-LRD, M06-2X, M11, and more) are used to evaluate the conformational energies. The results are compared to the Amber RNA bsc0χOL3 force field. Most dispersion-corrected DFT methods surpass the Amber force field significantly in accuracy and yield mean absolute deviations (MADs) for relative conformational energies of ∼0.4-0.6 kcal/mol. Double-hybrid density functionals represent the most accurate class of density functionals. Low-cost quantum chemical methods such as PM6-D3H+, HF-3c, DFTB3-D3, as well as small basis set calculations corrected for basis set superposition errors (BSSEs) by the gCP procedure are also tested. Unfortunately, the presently available low-cost methods are struggling to describe the UpU conformational energies with satisfactory accuracy. The UpU46 benchmark is an ideal test for benchmarking and development of fast methods to describe nucleic acids, including force fields.
Affiliation(s)
- Holger Kruse
  - Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, 612 65 Brno, Czech Republic; CEITEC-Central European Institute of Technology, Campus Bohunice, Kamenice 5, 625 00 Brno, Czech Republic
- Arnost Mladek
  - Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, 612 65 Brno, Czech Republic
- Konstantinos Gkionis
  - Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, 612 65 Brno, Czech Republic; CEITEC-Central European Institute of Technology, Campus Bohunice, Kamenice 5, 625 00 Brno, Czech Republic
- Andreas Hansen
  - Mulliken Center for Theoretical Chemistry, Institut für Physikalische und Theoretische Chemie der Universität Bonn, Beringstr. 4, D-53115 Bonn, Germany
- Stefan Grimme
  - Mulliken Center for Theoretical Chemistry, Institut für Physikalische und Theoretische Chemie der Universität Bonn, Beringstr. 4, D-53115 Bonn, Germany
- Jiri Sponer
  - Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, 612 65 Brno, Czech Republic; CEITEC-Central European Institute of Technology, Campus Bohunice, Kamenice 5, 625 00 Brno, Czech Republic
9
Khamis AM, Hamilton AR, Medvedeva YA, Alam T, Alam I, Essack M, Umylny B, Jankovic BR, Naeger NL, Suzuki M, Harbers M, Robinson GE, Bajic VB. Insights into the Transcriptional Architecture of Behavioral Plasticity in the Honey Bee Apis mellifera. Sci Rep 2015; 5:11136. [PMID: 26073445 PMCID: PMC4466890 DOI: 10.1038/srep11136] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 10/23/2014] [Accepted: 05/01/2015] [Indexed: 12/30/2022]
Abstract
Honey bee colonies exhibit an age-related division of labor, with worker bees performing discrete sets of behaviors throughout their lifespan. These behavioral states are associated with distinct brain transcriptomic states, yet little is known about the regulatory mechanisms governing them. We used CAGEscan (a variant of the Cap Analysis of Gene Expression technique) for the first time to characterize the promoter regions of differentially expressed brain genes during two behavioral states (brood care (aka “nursing”) and foraging) and identified transcription factors (TFs) that may govern their expression. More than half of the differentially expressed TFs were associated with motifs enriched in the promoter regions of differentially expressed genes (DEGs), suggesting they are regulators of behavioral state. Strikingly, five TFs (nf-kb, egr, pax6, hairy, and clockwork orange) were predicted to co-regulate nearly half of the genes that were upregulated in foragers. Finally, differences in alternative TSS usage between nurses and foragers were detected upstream of 646 genes, whose functional analysis revealed enrichment for Gene Ontology terms associated with neural function and plasticity. This demonstrates for the first time that alternative TSSs are associated with stable differences in behavior, suggesting they may play a role in organizing behavioral state.
Affiliation(s)
- Abdullah M Khamis
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Adam R Hamilton
  - Departments of Entomology and Institute for Genomic Biology, Urbana, IL 61801; and Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL 61801
- Yulia A Medvedeva
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Tanvir Alam
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Intikhab Alam
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Magbubah Essack
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Boris Umylny
  - Lumenogix Inc., 2935 Rodeo Park Drive East, Santa Fe NM, 87505, USA
- Boris R Jankovic
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Nicholas L Naeger
  - Departments of Entomology and Institute for Genomic Biology, Urbana, IL 61801; and Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL 61801
- Makoto Suzuki
  - DNAFORM Inc., Leading Venture Plaza-2, 75-1, Ono-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0046, Japan
- Matthias Harbers
  - [1] DNAFORM Inc., Leading Venture Plaza-2, 75-1, Ono-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0046, Japan; [2] RIKEN Center for Life Science Technologies, Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
- Gene E Robinson
  - Departments of Entomology and Institute for Genomic Biology, Urbana, IL 61801; and Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL 61801
- Vladimir B Bajic
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
10
Roos-Araujo D, Stuart S, Lea RA, Haupt LM, Griffiths LR. Epigenetics and migraine; complex mitochondrial interactions contributing to disease susceptibility. Gene 2014; 543:1-7. [PMID: 24704026 DOI: 10.1016/j.gene.2014.04.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 03/23/2014] [Accepted: 04/01/2014] [Indexed: 02/08/2023]
Abstract
Migraine is a common neurological disorder classified by the World Health Organisation (WHO) as one of the top twenty most debilitating diseases in the developed world. Current therapies are only effective for a proportion of sufferers and new therapeutic targets are desperately needed to alleviate this burden. Recently the role of epigenetics in the development of many complex diseases including migraine has become an emerging topic. By understanding the importance of acetylation, methylation and other epigenetic modifications, it then follows that this modification process is a potential target to manipulate epigenetic status with the goal of treating disease. Bisulphite sequencing and methylated DNA immunoprecipitation have been used to demonstrate the presence of methylated cytosines in the human D-loop of mitochondrial DNA (mtDNA), proving that the mitochondrial genome is methylated. For the first time, it has been shown that there is a difference in mtDNA epigenetic status between healthy controls and those with disease, especially for neurodegenerative and age related conditions. Given co-morbidities with migraine and the suggestive link between mitochondrial dysfunction and the lowered threshold for triggering a migraine attack, mitochondrial methylation may be a new avenue to pursue. Creative thinking and new approaches are needed to solve complex problems and a systems biology approach, where multiple layers of information are integrated is becoming more important in complex disease modelling.
Affiliation(s)
- Deidré Roos-Araujo
  - Genomics Research Centre, Institute for Biomedical Health and Innovation, Queensland University of Technology, Brisbane, Queensland 4059, Australia
- Shani Stuart
  - Genomics Research Centre, Institute for Biomedical Health and Innovation, Queensland University of Technology, Brisbane, Queensland 4059, Australia
- Rod A Lea
  - Genomics Research Centre, Institute for Biomedical Health and Innovation, Queensland University of Technology, Brisbane, Queensland 4059, Australia
- Larisa M Haupt
  - Genomics Research Centre, Institute for Biomedical Health and Innovation, Queensland University of Technology, Brisbane, Queensland 4059, Australia
- Lyn R Griffiths
  - Genomics Research Centre, Institute for Biomedical Health and Innovation, Queensland University of Technology, Brisbane, Queensland 4059, Australia
11
Zhang Q, Li H, Jin H, Tan H, Zhang J, Sheng S. The global landscape of intron retentions in lung adenocarcinoma. BMC Med Genomics 2014; 7:15. [PMID: 24646369 PMCID: PMC3999986 DOI: 10.1186/1755-8794-7-15] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 12/09/2013] [Accepted: 03/14/2014] [Indexed: 01/09/2023]
Abstract
BACKGROUND: The transcriptome complexity in an organism can be achieved by alternative splicing of precursor messenger RNAs. It has been revealed that alterations in mRNA splicing play an important role in a number of diseases including human cancers. METHODS: In this study, we exploited whole transcriptome sequencing data from five lung adenocarcinoma tissues and their matched normal tissues to interrogate intron retention, a less studied alternative splicing form which has profound structural and functional consequences by modifying the open reading frame or inserting premature stop codons. RESULTS: Abundant intron retention events were found in both tumor and normal tissues, and 2,340 and 1,422 genes contain only tumor-specific retentions and only normal-specific retentions, respectively. Combined with gene expression analysis, we showed that genes with tumor-specific retentions tend to be over-expressed in tumors, and the abundance of intron retention within genes is negatively related to gene expression, indicating the action of nonsense mediated decay. Further functional analysis demonstrated that genes with tumor-specific retentions include known lung cancer driver genes and are found enriched in pathways important in carcinogenesis. CONCLUSIONS: We hypothesize that intron retentions and consequent nonsense mediated decay may collectively counteract the over-expression of genes promoting cancer development. Identification of genes with tumor-specific retentions may also help develop targeted therapies.
Affiliation(s)
- Qu Zhang
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Hua Li
  - HYK High-throughput Biotechnology Institute, 4/F, Building #11, Software Park, 2nd Central Keji Rd, Hi-Tech Industrial Park, Shenzhen 518060, China
- Hong Jin
  - HYK High-throughput Biotechnology Institute, 4/F, Building #11, Software Park, 2nd Central Keji Rd, Hi-Tech Industrial Park, Shenzhen 518060, China
- Huibiao Tan
  - HYK High-throughput Biotechnology Institute, 4/F, Building #11, Software Park, 2nd Central Keji Rd, Hi-Tech Industrial Park, Shenzhen 518060, China
- Jun Zhang
  - Department of Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, No.197 Ruijin 2nd Road, Shanghai 200025, China
- Sitong Sheng
  - HYK High-throughput Biotechnology Institute, 4/F, Building #11, Software Park, 2nd Central Keji Rd, Hi-Tech Industrial Park, Shenzhen 518060, China
  - School of Bioscience and Bioengineering, South China University of Technology, Guangzhou Higher Education Mega Center, Guangzhou 510006, China
  - College of Life Sciences, Shenzhen University, Shenzhen 518060, China

12
Kumari S, Ware D. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots. PLoS One 2013; 8:e79011. [PMID: 24205361 PMCID: PMC3812177 DOI: 10.1371/journal.pone.0079011] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 02/11/2013] [Accepted: 09/18/2013] [Indexed: 01/22/2023]
Abstract
Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large-scale genome-wide report on the computational prediction of CPEs across eight plant genomes to help better understand the transcription initiation complex assembly. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to the transcription start site (TSS) exhibited positional conservation within monocots and dicots, with slight differences across all eight genomes. Further, a more refined subset of annotated genes based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinctly different from those of non-regulatory genome sequence. They also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture irrespective of the species-specific nucleotide bias. This study will be useful for future work related to genome annotation projects and can inspire research efforts aimed at better understanding the regulatory mechanisms of transcription.
Affiliation(s)
- Sunita Kumari
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Doreen Ware
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
  - United States Department of Agriculture-Agriculture Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, New York, United States of America

13
Eswaran J, Horvath A, Godbole S, Reddy SD, Mudvari P, Ohshiro K, Cyanam D, Nair S, Fuqua SAW, Polyak K, Florea LD, Kumar R. RNA sequencing of cancer reveals novel splicing alterations. Sci Rep 2013; 3:1689. [PMID: 23604310 PMCID: PMC3631769 DOI: 10.1038/srep01689] [Citation(s) in RCA: 134] [Impact Index Per Article: 11.2] [Received: 01/02/2013] [Accepted: 03/01/2013] [Indexed: 12/30/2022]
Abstract
The breast cancer transcriptome acquires a myriad of regulatory changes, and splicing is critical for the cell to “tailor-make” specific functional transcripts. Using RNA sequencing, we systematically revealed splicing signatures of the three most common types of breast tumors: TNBC, non-TNBC and HER2-positive breast cancer. We discovered subtype-specific differentially spliced genes and splice isoforms not previously recognized in the human transcriptome. Further, we showed that exon skipping and intron retention are the predominant splice events in breast cancer. In addition, we found that differential expression of primary transcripts and promoter switching are significantly deregulated in breast cancer compared to normal breast. We validated the presence of novel hybrid isoforms of critical molecules like CDK4, LARP1, ADD3, and PHLPP2. Our study provides the first comprehensive portrait of transcriptional and splicing signatures specific to breast cancer subtypes, as well as previously unknown transcripts that prompt the need for complete annotation of tissue- and disease-specific transcriptomes.
Affiliation(s)
- Jeyanthy Eswaran
  - McCormick Genomic and Proteomics Center, The George Washington University, Washington, District of Columbia 20037, USA

14
O’Hara SP, Tabibian JH, Splinter PL, LaRusso NF. The dynamic biliary epithelia: molecules, pathways, and disease. J Hepatol 2013; 58:575-582. [PMID: 23085249 PMCID: PMC3831345 DOI: 10.1016/j.jhep.2012.10.011] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.9] [Received: 05/24/2012] [Revised: 10/01/2012] [Accepted: 10/10/2012] [Indexed: 02/08/2023]
Abstract
Cholangiocytes, the cells lining bile ducts, are a heterogeneous, highly dynamic population of epithelial cells. While these cells comprise a small fraction of the total cellular component of the liver, they perform the essential role of bile modification and transport of biliary and blood constituents. From a pathophysiological standpoint, cholangiocytes are the target of a diverse group of biliary disorders, collectively referred to as the cholangiopathies. To date, the cause of most cholangiopathies remains obscure. It is known, however, that cholangiocytes exist in an environment rich in potential mediators of cellular injury, express receptors that recognize potentially injurious insults, and participate in portal tract repair processes following hepatic injury. As such, cholangiocytes may be not only a passive target but are likely also directly and actively involved in the pathogenesis of cholangiopathies. Here, we briefly summarize the characteristics of the reactive cholangiocyte and cholangiocyte responses to potentially injurious endogenous and exogenous molecules, and in addition, present emerging concepts in our understanding of the etiopathogenesis of several cholangiopathies.
Affiliation(s)
- Steven P. O’Hara
  - Department of Gastroenterology and Hepatology and the Mayo Clinic Center for Cell Signaling in Gastroenterology, Mayo Clinic, Rochester, MN, United States
- James H. Tabibian
  - Department of Gastroenterology and Hepatology and the Mayo Clinic Center for Cell Signaling in Gastroenterology, Mayo Clinic, Rochester, MN, United States
- Patrick L. Splinter
  - Department of Gastroenterology and Hepatology and the Mayo Clinic Center for Cell Signaling in Gastroenterology, Mayo Clinic, Rochester, MN, United States
- Nicholas F. LaRusso
  - Department of Gastroenterology and Hepatology and the Mayo Clinic Center for Cell Signaling in Gastroenterology, Mayo Clinic, Rochester, MN, United States

15
Freeman LA. Cloning full-length transcripts and transcript variants using 5' and 3' RACE. Methods Mol Biol 2013; 1027:3-17. [PMID: 23912980 DOI: 10.1007/978-1-60327-369-5_1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/24/2022]
Abstract
Gene transcripts and transcript variants must be cloned to characterize gene function and regulation. However, obtaining full-length cDNAs with accurate sequences from the 5' end through to the 3' end can be challenging. Here we describe a reverse-transcriptase-based method for obtaining full-length cDNAs using the SMARTer ("Switching Mechanism At RNA Termini") RACE technology developed by Clontech. RNA is isolated from the tissue of interest and annealed to a primer (a modified oligo(dT) primer for polyA+ transcripts; random hexamers or a gene-specific primer for polyA- transcripts). A modified MMLV reverse transcriptase uses the primer to initiate cDNA synthesis from the annealed RNA transcript(s) and continues reverse transcription towards the 5' end of the transcript(s). Importantly, this reverse transcriptase possesses terminal transferase activity, so when it reaches the 5' end of a transcript it adds a 3-5 residue "tail" to the newly synthesized cDNA strand. Included in the reverse transcriptase reaction mix is an oligonucleotide containing a sequence tag as well as a terminal series of modified bases that anneal to the 3-5 residue tail on the newly synthesized cDNA. The reverse transcriptase proceeds from the end of the transcript onwards into the modified bases and the rest of the sequence-tagged oligo. The newly synthesized cDNA now has a sequence tag attached to it and can be used as a template for PCR, with one primer complementary to the sequence tag and the second primer specific to the gene of interest. The fragment can be cloned and sequenced or just sequenced directly. If high-quality, undegraded RNA is used, the likelihood of obtaining the true 5' end of a transcript is greatly enhanced. In combination with 3' RACE, full-length transcripts are easily cloned. This method provides sequence information on important regulatory regions, such as 5' and 3' UTRs and flanking regions, and is ideal for detecting transcript variants, including those with alternative transcriptional start sites, alternative splicing, and/or alternative polyadenylation.
Affiliation(s)
- Lita A Freeman
  - Cardiovascular & Pulmonary Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA

16
Identification and comparative analysis of ncRNAs in human, mouse and zebrafish indicate a conserved role in regulation of genes expressed in brain. PLoS One 2012; 7:e52275. [PMID: 23284966 PMCID: PMC3527520 DOI: 10.1371/journal.pone.0052275] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 10/16/2012] [Accepted: 11/12/2012] [Indexed: 12/20/2022]
Abstract
ncRNAs (non-coding RNAs), in particular long ncRNAs, represent a significant proportion of the vertebrate transcriptome and probably regulate many biological processes. We used publicly available ESTs (Expressed Sequence Tags) from human, mouse and zebrafish and a previously published analysis pipeline to annotate and analyze the vertebrate non-protein-coding transcriptome. Comparative analysis confirmed some previously described features of intergenic ncRNAs, such as a positionally biased distribution with respect to regulatory or development-related protein-coding genes, and weak but clear sequence conservation across species. Significantly, comparative analysis of developmental and regulatory genes proximate to long ncRNAs indicated that the only conserved relationship of these genes to neighboring long ncRNAs was with respect to genes expressed in human brain, suggesting a conserved ncRNA cis-regulatory network in vertebrate nervous system development. Most of the relationships between long ncRNAs and proximate coding genes were not conserved, providing evidence for the rapid evolution of species-specific gene-associated long ncRNAs. We have reconstructed and annotated over 130,000 long ncRNAs in these three species, providing a significantly expanded number of candidates for functional testing by the research community.

17
Formation of triple-helical structures by the 3'-end sequences of MALAT1 and MENβ noncoding RNAs. Proc Natl Acad Sci U S A 2012; 109:19202-7. [PMID: 23129630 DOI: 10.1073/pnas.1217338109] [Citation(s) in RCA: 235] [Impact Index Per Article: 18.1] [Indexed: 12/16/2022]
Abstract
Stability of the long noncoding polyadenylated nuclear (PAN) RNA from Kaposi's sarcoma-associated herpesvirus is conferred by an expression and nuclear retention element (ENE). The ENE protects PAN RNA from a rapid deadenylation-dependent decay pathway via formation of a triple helix between the U-rich internal loop of the ENE and the 3'-poly(A) tail. Because viruses borrow molecular mechanisms from their hosts, we searched highly abundant human long noncoding RNAs and identified putative ENE-like structures in metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) and multiple endocrine neoplasia-β (MENβ) RNAs. Unlike the PAN ENE, the U-rich internal loops of both predicted cellular ENEs are interrupted by G and C nucleotides and reside upstream of genomically encoded A-rich tracts. We confirmed the ability of MALAT1 and MENβ sequences containing the predicted ENE and A-rich tract to increase the levels of an intronless β-globin reporter RNA. UV thermal denaturation profiles at different pH values support formation of a triple-helical structure composed of multiple U•A-U base triples and a single C•G-C base triple. Additional analyses of the MALAT1 ENE revealed that robust stabilization activity requires an intact triple helix, strong stems at the duplex-triplex junctions, a G-C base pair flanking the triplex to mediate potential A-minor interactions, and the 3'-terminal A of the A-rich tract to form a blunt-ended triplex lacking unpaired nucleotides at the duplex-triplex junction. These examples of triple-helical, ENE-like structures in cellular noncoding RNAs are unique.

18
Qu Z, Adelson DL. Evolutionary conservation and functional roles of ncRNA. Front Genet 2012; 3:205. [PMID: 23087702 PMCID: PMC3466565 DOI: 10.3389/fgene.2012.00205] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Received: 08/30/2012] [Accepted: 09/24/2012] [Indexed: 11/24/2022]
Abstract
Non-coding RNAs (ncRNAs) are a class of transcribed RNA molecules without protein-coding potential. For a long time they were regarded as transcriptional noise, or as a byproduct of the flow of genetic information from DNA to protein. However, in recent years, a number of studies have shown that ncRNAs are pervasively transcribed and that most of them show evidence of evolutionary conservation, although they are less conserved than protein-coding genes. More importantly, many ncRNAs have been confirmed to play crucial regulatory roles in diverse biological processes and in tumorigenesis. Here we summarize the functional significance of this class of “dark matter” in terms of its genomic organization, evolutionary conservation, and broad functional classes.
Affiliation(s)
- Zhipeng Qu
  - School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, SA, Australia

19
Wu DY. Big (sequencing) future of non-coding RNA research for the understanding of cocaine. Front Genet 2012; 3:158. [PMID: 22969790 PMCID: PMC3432494 DOI: 10.3389/fgene.2012.00158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/01/2012] [Accepted: 08/05/2012] [Indexed: 01/29/2023]
Affiliation(s)
- Da-Yu Wu
  - Division of Basic Neuroscience and Behavior Research, National Institute on Drug Abuse, National Institutes of Health, Bethesda, MD, USA

20
Relle M, Becker M, Meyer RG, Stassen M, Schwarting A. Intronic promoters and their noncoding transcripts: A new source of cancer-associated genes. Mol Carcinog 2012; 53:117-24. [DOI: 10.1002/mc.21955] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/06/2012] [Accepted: 08/01/2012] [Indexed: 01/19/2023]
Affiliation(s)
- Manfred Relle
  - I. Department of Medicine; University Medical Center of the Johannes-Gutenberg University Mainz; Mainz, Germany
- Marc Becker
  - I. Department of Medicine; University Medical Center of the Johannes-Gutenberg University Mainz; Mainz, Germany
- Ralf G. Meyer
  - Department of Hematology, Oncology, and Pneumology; University Medical Center of the Johannes-Gutenberg University Mainz; Mainz, Germany
- Michael Stassen
  - Institute for Immunology; University Medical Center of the Johannes-Gutenberg University Mainz; Mainz, Germany
- Andreas Schwarting
  - I. Department of Medicine; University Medical Center of the Johannes-Gutenberg University Mainz; Mainz, Germany

21
Qu Z, Adelson DL. Bovine ncRNAs are abundant, primarily intergenic, conserved and associated with regulatory genes. PLoS One 2012; 7:e42638. [PMID: 22880061 PMCID: PMC3412814 DOI: 10.1371/journal.pone.0042638] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 06/09/2012] [Accepted: 07/11/2012] [Indexed: 12/15/2022]
Abstract
It is apparent that non-coding transcripts are a common feature of higher organisms and encode uncharacterized layers of genetic regulation and information. We used public bovine EST data from many developmental stages and tissues, and developed a pipeline for the genome-wide identification and annotation of non-coding RNAs (ncRNAs). We predicted 23,060 bovine ncRNAs, 99% of which are unannotated in known ncRNA databases. Intergenic transcripts accounted for the majority (57%) of the predicted ncRNAs, and the occurrences of ncRNAs and genes were only moderately correlated (r = 0.55, p-value < 2.2e-16). Many of these intergenic non-coding RNAs mapped close to the 3′ or 5′ end of thousands of genes, and many of these were transcribed from the opposite strand with respect to the closest gene, particularly regulatory-related genes. Conservation analyses showed that these ncRNAs were evolutionarily conserved, and many intergenic ncRNAs proximate to genes contained sequence-specific motifs. Correlation analysis of expression between these intergenic ncRNAs and protein-coding genes using RNA-seq data from a variety of tissues showed significant correlations with many transcripts. These results support the hypothesis that ncRNAs are common, transcribed in a regulated fashion and have regulatory functions.
Affiliation(s)
- Zhipeng Qu
  - School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, South Australia, Australia

22
Seemann SE, Sunkin SM, Hawrylycz MJ, Ruzzo WL, Gorodkin J. Transcripts with in silico predicted RNA structure are enriched everywhere in the mouse brain. BMC Genomics 2012; 13:214. [PMID: 22651826 PMCID: PMC3464589 DOI: 10.1186/1471-2164-13-214] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 09/09/2011] [Accepted: 05/31/2012] [Indexed: 01/24/2023]
Abstract
Background: Post-transcriptional control of gene expression is mostly conducted by specific elements in untranslated regions (UTRs) of mRNAs, in collaboration with specific binding proteins and RNAs. In several well characterized cases, these RNA elements are known to form stable secondary structures. RNA secondary structures also may have major functional implications for long noncoding RNAs (lncRNAs). Recent transcriptional data has indicated the importance of lncRNAs in brain development and function. However, no methodical efforts to investigate this have been undertaken. Here, we aim to systematically analyze the potential for RNA structure in brain-expressed transcripts. Results: By comprehensive spatial expression analysis of the adult mouse in situ hybridization data of the Allen Mouse Brain Atlas, we show that transcripts (coding as well as non-coding) associated with in silico predicted structured probes are highly and significantly enriched in almost all analyzed brain regions. Functional implications of these RNA structures and their role in the brain are discussed in detail along with specific examples. We observe that mRNAs with a structure prediction in their UTRs are enriched for binding, transport and localization gene ontology categories. In addition, after manual examination we observe agreement between RNA binding protein interaction sites near the 3’ UTR structures and correlated expression patterns. Conclusions: Our results show a potential use for RNA structures in expressed coding as well as noncoding transcripts in the adult mouse brain, and describe the role of structured RNAs in the context of intracellular signaling pathways and regulatory networks. Based on this data we hypothesize that RNA structure is widely involved in transcriptional and translational regulatory mechanisms in the brain and ultimately plays a role in brain function.
Affiliation(s)
- Stefan E Seemann
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark

23
Conley AB, Jordan IK. Epigenetic regulation of human cis-natural antisense transcripts. Nucleic Acids Res 2012; 40:1438-45. [PMID: 22371288 PMCID: PMC3287164 DOI: 10.1093/nar/gkr1010] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 12/11/2022]
Abstract
Mammalian genomes encode numerous cis-natural antisense transcripts (cis-NATs). The extent to which these cis-NATs are actively regulated and ultimately functionally relevant, as opposed to transcriptional noise, remains a matter of debate. To address this issue, we analyzed the chromatin environment and RNA Pol II binding properties of human cis-NAT promoters genome-wide. Cap analysis of gene expression data were used to identify thousands of cis-NAT promoters, and profiles of nine histone modifications and RNA Pol II binding for these promoters in ENCODE cell types were analyzed using chromatin immunoprecipitation followed by sequencing (ChIP-seq) data. Active cis-NAT promoters are enriched with activating histone modifications and occupied by RNA Pol II, whereas weak cis-NAT promoters are depleted for both activating modifications and RNA Pol II. The enrichment levels of activating histone modifications and RNA Pol II binding show peaks centered around cis-NAT transcriptional start sites, and the levels of activating histone modifications at cis-NAT promoters are positively correlated with cis-NAT expression levels. Cis-NAT promoters also show highly tissue-specific patterns of expression. These results suggest that human cis-NATs are actively transcribed by RNA Pol II and that their expression is epigenetically regulated, both prerequisites for a functional potential for many of these non-coding RNAs.
Affiliation(s)
- Andrew B Conley
  - School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA

24
Nitz I, Kruse ML, Klapper M, Döring F. Specific regulation of low-abundance transcript variants encoding human Acyl-CoA binding protein (ACBP) isoforms. J Cell Mol Med 2011; 15:909-27. [PMID: 20345851 PMCID: PMC3922676 DOI: 10.1111/j.1582-4934.2010.01055.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/26/2022]
Abstract
Despite intensive efforts on annotation of eukaryotic transcriptomes, little is known about the regulation of low-abundance transcripts. To address this question, we analysed the regulation of novel low-abundance transcript variants of human acyl-CoA binding protein (ACBP), an important multifunctional housekeeping protein, which we identified by screening human expressed sequence tags in combination with ab initio gene prediction. Using RT-PCR, real-time RT-PCR and rapid amplification of cDNA ends (RACE)-PCR in five human tissues, we find that these transcripts, which are generated by the consistent use of alternative promoters and alternate first or first two exons, are authentic. They show a tissue-specific distribution and intrinsic responsiveness to glucose and insulin. Promoter analyses of the corresponding transcripts revealed differential regulation mediated by sterol regulatory element-binding protein-2, hepatocyte nuclear factor-4α and nuclear factor κB (NF-κB), central transcription factors of fat and glucose metabolism and inflammation. Subcellular localization studies of the deduced isoforms in liver HepG2 cells showed that they are distributed in different compartments. By demonstrating that ACBP is a target of NF-κB, our findings link fatty acid metabolism with inflammation. Furthermore, our findings show that low-abundance transcripts are regulated in a manner similar to their high-abundance counterparts.
Affiliation(s)
- Inke Nitz
  - Institute of Human Nutrition and Food Science, Department of Molecular Prevention, Christian-Albrechts University, Kiel, Germany

25
Likić VA, McConville MJ, Lithgow T, Bacic A. Systems biology: the next frontier for bioinformatics. Adv Bioinformatics 2011; 2010:268925. [PMID: 21331364 PMCID: PMC3038413 DOI: 10.1155/2010/268925] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 06/04/2010] [Accepted: 11/01/2010] [Indexed: 01/01/2023]
Abstract
Biochemical systems biology augments more traditional disciplines, such as genomics, biochemistry and molecular biology, by championing (i) mathematical and computational modeling; (ii) the application of traditional engineering practices in the analysis of biochemical systems; and in the past decade increasingly (iii) the use of near-comprehensive data sets derived from 'omics platform technologies, in particular "downstream" technologies relative to genome sequencing, including transcriptomics, proteomics and metabolomics. The future progress in understanding biological principles will increasingly depend on the development of temporal and spatial analytical techniques that will provide high-resolution data for systems analyses. To date, particularly successful were strategies involving (a) quantitative measurements of cellular components at the mRNA, protein and metabolite levels, as well as in vivo metabolic reaction rates, (b) development of mathematical models that integrate biochemical knowledge with the information generated by high-throughput experiments, and (c) applications to microbial organisms. The inevitable role bioinformatics plays in modern systems biology puts mathematical and computational sciences as an equal partner to analytical and experimental biology. Furthermore, mathematical and computational models are expected to become increasingly prevalent representations of our knowledge about specific biochemical systems.
Affiliation(s)
- Vladimir A. Likić
  - Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia
- Malcolm J. McConville
  - Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia
  - Department of Biochemistry and Molecular Biology, The University of Melbourne, Parkville, VIC, 3010, Australia
- Trevor Lithgow
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC, 3800, Australia
- Antony Bacic
  - Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia
  - Australian Centre for Plant Functional Genomics, School of Botany, The University of Melbourne, Parkville, VIC, 3010, Australia

26
Šponer J, Šponer JE, Petrov AI, Leontis NB. Quantum chemical studies of nucleic acids: can we construct a bridge to the RNA structural biology and bioinformatics communities? J Phys Chem B 2010; 114:15723-41. [PMID: 21049899 PMCID: PMC4868365 DOI: 10.1021/jp104361m] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Indexed: 12/22/2022]
Abstract
In this feature article, we provide a side-by-side introduction for two research fields: quantum chemical calculations of molecular interaction in nucleic acids and RNA structural bioinformatics. Our main aim is to demonstrate that these research areas, while largely separated in contemporary literature, have substantial potential to complement each other that could significantly contribute to our understanding of the exciting world of nucleic acids. We identify research questions amenable to the combined application of modern ab initio methods and bioinformatics analysis of experimental structures while also assessing the limitations of these approaches. The ultimate aim is to attain valuable physicochemical insights regarding the nature of the fundamental molecular interactions and how they shape RNA structures, dynamics, function, and evolution.
Affiliation(s)
- Jiří Šponer
  - Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, 61265 Brno, Czech Republic
- Judit E. Šponer
  - Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, 61265 Brno, Czech Republic
- Anton I. Petrov
  - Department of Biological Sciences, Bowling Green State University, Bowling Green, OH 43403, USA
- Neocles B. Leontis
  - Department of Chemistry, Bowling Green State University, Bowling Green, OH 43403, USA

27
Mercer TR, Dinger ME, Bracken CP, Kolle G, Szubert JM, Korbie DJ, Askarian-Amiri ME, Gardiner BB, Goodall GJ, Grimmond SM, Mattick JS. Regulated post-transcriptional RNA cleavage diversifies the eukaryotic transcriptome. Genome Res 2010; 20:1639-50. [PMID: 21045082 DOI: 10.1101/gr.112128.110] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Indexed: 01/13/2023]
Abstract
The complexity of the eukaryotic transcriptome is generated by the interplay of transcription initiation, termination, alternative splicing, and other forms of post-transcriptional modification. It was recently shown that RNA transcripts may also undergo cleavage and secondary 5' capping. Here, we show that post-transcriptional cleavage of RNA contributes to the diversification of the transcriptome by generating a range of small RNAs and long coding and noncoding RNAs. Using genome-wide histone modification and RNA polymerase II occupancy data, we confirm that the vast majority of intraexonic CAGE tags are derived from post-transcriptional processing. By comparing exonic CAGE tags to tissue-matched PARE data, we show that the cleavage and subsequent secondary capping is regulated in a developmental-stage- and tissue-specific manner. Furthermore, we find evidence of prevalent RNA cleavage in numerous transcriptomic data sets, including SAGE, cDNA, small RNA libraries, and deep-sequenced size-fractionated pools of RNA. These cleavage products include mRNA variants that retain the potential to be translated into shortened functional protein isoforms. We conclude that post-transcriptional RNA cleavage is a key mechanism that expands the functional repertoire and scope for regulatory control of the eukaryotic transcriptome.
Affiliation(s)
- Tim R Mercer
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
28
Prakash T, Sharma VK, Adati N, Ozawa R, Kumar N, Nishida Y, Fujikake T, Takeda T, Taylor TD. Expression of conjoined genes: another mechanism for gene regulation in eukaryotes. PLoS One 2010; 5:e13284. [PMID: 20967262 PMCID: PMC2953495 DOI: 10.1371/journal.pone.0013284] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.6] [Received: 04/30/2010] [Accepted: 09/14/2010] [Indexed: 11/19/2022]
Abstract
From the ENCODE project, it is realized that almost every base of the entire human genome is transcribed. One class of transcripts resulting from this arises from the conjoined gene, which is formed by combining the exons of two or more distinct (parent) genes lying on the same strand of a chromosome. Only a very limited number of such genes are known, and the definition and terminologies used for them are highly variable in the public databases. In this work, we have computationally identified and manually curated 751 conjoined genes (CGs) in the human genome that are supported by at least one mRNA or EST sequence available in the NCBI database. 353 representative CGs, of which 291 (82%) could be confirmed, were subjected to experimental validation using RT-PCR and sequencing methods. We speculate that these genes are arising out of novel functional requirements and are not merely artifacts of transcription, since more than 70% of them are conserved in other vertebrate genomes. The unique splicing patterns exhibited by CGs reveal their possible roles in protein evolution or gene regulation. Novel CGs, for which no transcript is available, could be identified in 80% of randomly selected potential CG forming regions, indicating that their formation is a routine process. Formation of CGs is not only limited to human, as we have also identified 270 CGs in mouse and 227 in drosophila using our approach. Additionally, we propose a novel mechanism for the formation of CGs. Finally, we developed a database, ConjoinG, which contains detailed information about all the CGs (800 in total) identified in the human genome. In summary, our findings reveal new insights about the functionality of CGs in terms of another possible mechanism for gene regulation and genomic evolution and the mechanism leading to their formation.
Affiliation(s)
- Tulika Prakash
- MetaSystems Research Team, Computational Systems Biology Research Group, Advanced Computational Sciences Department, RIKEN Advanced Science Institute (ASI), Yokohama, Japan
- Vineet K. Sharma
- MetaSystems Research Team, Computational Systems Biology Research Group, Advanced Computational Sciences Department, RIKEN Advanced Science Institute (ASI), Yokohama, Japan
- Naoki Adati
- MetaSystems Research Team, Computational Systems Biology Research Group, Advanced Computational Sciences Department, RIKEN Advanced Science Institute (ASI), Yokohama, Japan
- Ritsuko Ozawa
- MetaSystems Research Team, Computational Systems Biology Research Group, Advanced Computational Sciences Department, RIKEN Advanced Science Institute (ASI), Yokohama, Japan
- Naveen Kumar
- MetaSystems Research Team, Computational Systems Biology Research Group, Advanced Computational Sciences Department, RIKEN Advanced Science Institute (ASI), Yokohama, Japan
- Yuichiro Nishida
- MetaSystems Research Team, Computational Systems Biology Research Group, Advanced Computational Sciences Department, RIKEN Advanced Science Institute (ASI), Yokohama, Japan
- Takayoshi Fujikake
- MetaSystems Research Team, Computational Systems Biology Research Group, Advanced Computational Sciences Department, RIKEN Advanced Science Institute (ASI), Yokohama, Japan
- Tadayuki Takeda
- MetaSystems Research Team, Computational Systems Biology Research Group, Advanced Computational Sciences Department, RIKEN Advanced Science Institute (ASI), Yokohama, Japan
- Todd D. Taylor
- MetaSystems Research Team, Computational Systems Biology Research Group, Advanced Computational Sciences Department, RIKEN Advanced Science Institute (ASI), Yokohama, Japan
29
Maunakea AK, Nagarajan RP, Bilenky M, Ballinger TJ, D'Souza C, Fouse SD, Johnson BE, Hong C, Nielsen C, Zhao Y, Turecki G, Delaney A, Varhol R, Thiessen N, Shchors K, Heine VM, Rowitch DH, Xing X, Fiore C, Schillebeeckx M, Jones SJM, Haussler D, Marra MA, Hirst M, Wang T, Costello JF. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 2010; 466:253-7. [PMID: 20613842 PMCID: PMC3998662 DOI: 10.1038/nature09165] [Citation(s) in RCA: 1267] [Impact Index Per Article: 84.5] [Received: 09/22/2009] [Accepted: 05/06/2010] [Indexed: 12/18/2022]
Abstract
While the methylation of DNA in 5′ promoters suppresses gene expression, the role of DNA methylation in gene bodies is unclear [1–5]. In mammals, tissue- and cell type-specific methylation is present in a small percentage of 5′ CpG island (CGI) promoters, while a far greater proportion occurs across gene bodies, coinciding with highly conserved sequences [5–10]. Tissue-specific intragenic methylation might reduce [3] or, paradoxically, enhance transcription elongation efficiency [1,2,4,5]. Capped analysis of gene expression (CAGE) experiments also indicate that transcription commonly initiates within and between genes [11–15]. To investigate the role of intragenic methylation, we generated a map of DNA methylation from human brain encompassing 24.7 million of the 28 million CpG sites. From the dense, high-resolution coverage of CpG islands, the majority of methylated CpG islands were revealed to be in intragenic and intergenic regions, while less than 3% of CpG islands in 5′ promoters were methylated. The CpG islands in all three locations overlapped with RNA markers of transcription initiation, and unmethylated CpG islands also overlapped significantly with trimethylation of H3K4, a histone modification enriched at promoters [16]. The general and CpG-island-specific patterns of methylation are conserved in mouse tissues. An in-depth investigation of the human SHANK3 locus [17,18] and its mouse homologue demonstrated that this tissue-specific DNA methylation regulates intragenic promoter activity in vitro and in vivo. These methylation-regulated, alternative transcripts are expressed in a tissue and cell type-specific manner, and are expressed differentially within a single cell type from distinct brain regions. These results support a major role for intragenic methylation in regulating cell context-specific alternative promoters in gene bodies.
Affiliation(s)
- Alika K Maunakea
- Brain Tumor Research Center, Department of Neurosurgery, Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California 94158, USA
30
Bernard V, Brunaud V, Lecharny A. TC-motifs at the TATA-box expected position in plant genes: a novel class of motifs involved in the transcription regulation. BMC Genomics 2010; 11:166. [PMID: 20222994 PMCID: PMC2842252 DOI: 10.1186/1471-2164-11-166] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Received: 07/27/2009] [Accepted: 03/12/2010] [Indexed: 01/14/2023]
Abstract
BACKGROUND The TATA-box and TATA-variants are regulatory elements involved in the formation of a transcription initiation complex. Both have been conserved throughout evolution in a restricted region close to the Transcription Start Site (TSS). However, less than half of the genes in model organisms studied so far have been found to contain either one of these elements. Indeed different core-promoter elements are involved in the recruitment of the TATA-box-binding protein. Here we assessed the possibility of identifying novel functional motifs in plant genes, sharing the TATA-box topological constraints. RESULTS We developed an ab-initio approach considering the preferential location of motifs relative to the TSS. We identified motifs observed at the TATA-box expected location and conserved in both Arabidopsis thaliana and Oryza sativa promoters. We identified TC-elements within non-TA-rich promoters 30 bases upstream of the TSS. As with the TATA-box and TATA-variant sequences, it was possible to construct a unique distance graph with the TC-element sequences. The structural and functional features of TC-element-containing genes were distinct from those of TATA-box- or TATA-variant-containing genes. Arabidopsis thaliana transcriptome analysis revealed that TATA-box-containing genes were generally those showing relatively high levels of expression and that TC-element-containing genes were generally those expressed in specific conditions. CONCLUSIONS Our observations suggest that the TC-elements might constitute a class of novel regulatory elements participating towards the complex modulation of gene expression in plants.
Affiliation(s)
- Virginie Bernard
- Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165-CNRS 8114-UEVE, 2 Rue Gaston Crémieux, 91057 Evry Cedex, France
31
Abstract
Initial gene discovery efforts through analysis of genome sequences and identification and characterization of expressed RNAs have revealed that only a relatively small portion of the genome is transcribed into protein coding mRNAs in vertebrates. However, in contrast with this paucity of protein coding ‘genes’, there is an enormous complexity in transcription and the protein coding mRNAs contribute to a very small fraction of transcripts in comparison with the different varieties of non-coding RNAs (ncRNAs). This transcriptome complexity may be hypothesized to have a regulatory role that is required for the development and function of organisms as complex as vertebrates. At the same time, it raises the fundamental question of the unequivocal definition of a gene. It is intriguing to postulate that many ncRNAs might finely modulate gene activity by acting as regulatory elements. The emerging hypotheses suggest that the gene regulatory machinery may be deeply interconnected with the world of short RNAs. These RNAs may generally act for fine-tuning of the protein-coding transcriptome.
Affiliation(s)
- Piero Carninci
- Omics Science Center, RIKEN Yokohama Institute, Kanagawa, Japan.
32
Abstract
In recent years geneticists have witnessed many significant observations which have seriously shaken the traditional concept of the gene. These specifically include the facts that (1) the boundaries of transcriptional units are far from clear; in fact, whole chromosomes if not the whole genome seem to be continuums of genetic transcription, (2) many examples of gene fusion are known, (3) likewise many examples of so-called encrypted genes are known in the organelle genomes of microbial eukaryotes and in prokaryotes, and (4) in addition to the structure of the gene, its functional status can also be inheritable, and, further, (5) epigenetic extra-genomic modes of inheritance, called genetic restoration, seem to be a rather common phenomenon, meaning that organisms can sometimes rewrite their DNA on the basis of RNA messages inherited from generations past. I will briefly review these observations and discuss the difficulties of defining the gene, and then formulate a new view, which is called the relational or systemic concept of the gene. It has to be noted that genes assume their information content characteristics in the Shannonian sense as nucleotide sequences of DNA (or RNA). However, on the basis of this we cannot say anything about their information content in the semantic sense. The semantic information content of genes is context-dependent. Genes namely assume their biochemical characteristics usually only within living cells, their developmental characteristics only within living organisms, and their evolutionary characteristics only within populations of living organisms.
Affiliation(s)
- Petter Portin
- Laboratory of Genetics, Department of Biology, University of Turku, Turku, Finland.
33
Uwanogho DA, Yasin SA, Starling B, Price J. The intergenic region between the Mouse Recql4 and Lrrc14 genes functions as an evolutionary conserved bidirectional promoter. Gene 2009; 449:103-17. [PMID: 19720120 DOI: 10.1016/j.gene.2009.08.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 02/18/2009] [Revised: 08/13/2009] [Accepted: 08/17/2009] [Indexed: 11/25/2022]
Abstract
Mammalian genomes are highly complex, with neighbouring genes arranged in divergent, convergent, tandem, antisense, and interleaving fashions. Despite the vast genomic space, a substantial portion of human genes (approximately 10%) are arranged in a divergent, head-to-head fashion and controlled by bidirectional promoters. Here we define a small core bidirectional promoter that drives expression of the mouse genes Recql4, on one strand, and Lrrc14, a novel member of the LRR gene family, on the opposite strand. Regulation of Lrrc14 expression is highly complex, involving multiple promoters and alternative splicing. Expression of this gene is predominately restricted to neural tissue during embryogenesis and is expressed in a wide range of tissues in the adult.
Affiliation(s)
- D A Uwanogho
- Department of Neuroscience, Centre for the Cellular Basis of Behaviour & MRC Centre for Neurodegeneration Research, Institute of Psychiatry, King's College London, Denmark Hill, London SE5 9NU, UK.
34
Balwierz PJ, Carninci P, Daub CO, Kawai J, Hayashizaki Y, Van Belle W, Beisel C, van Nimwegen E. Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data. Genome Biol 2009; 10:R79. [PMID: 19624849 PMCID: PMC2728533 DOI: 10.1186/gb-2009-10-7-r79] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.6] [Received: 10/23/2008] [Revised: 03/02/2009] [Accepted: 07/22/2009] [Indexed: 11/10/2022]
Abstract
A set of methods is presented for normalization, quantification of noise and co-expression analysis for gene expression studies using deep sequencing. With the advent of ultra high-throughput sequencing technologies, increasingly researchers are turning to deep sequencing for gene expression studies. Here we present a set of rigorous methods for normalization, quantification of noise, and co-expression analysis of deep sequencing data. Using these methods on 122 cap analysis of gene expression (CAGE) samples of transcription start sites, we construct genome-wide 'promoteromes' in human and mouse consisting of a three-tiered hierarchy of transcription start sites, transcription start clusters, and transcription start regions.
Affiliation(s)
- Piotr J Balwierz
- Biozentrum, University of Basel, and Swiss Institute of Bioinformatics, Klingelbergstrasse 50/70, 4056-CH, Basel, Switzerland
35
Akopov SB, Chernov IP, Wahlström T, Kostina MB, Klein G, Henriksson M, Nikolaev LG. Identification of recognition sites for myc/max/mxd network proteins by a whole human chromosome 19 selection strategy. Biochemistry (Moscow) 2009; 73:1260-8. [PMID: 19120031 DOI: 10.1134/s0006297908110138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
Abstract
In this study, we have identified 20 human sequences containing Myc network binding sites in a library from the whole human chromosome 19. We demonstrated binding of the Max protein to these sequences both in vitro and in vivo. The majority of the identified sequences contained one or several CACGTG or CATGTG E-boxes. Several of these sites were located within introns or in their vicinity and the corresponding genes were found to be up- or down-regulated in differentiating HL-60 cells. Our data show the proof of principle for using this strategy in identification of Max target genes, and this method can also be applied for other transcription factors.
Affiliation(s)
- S B Akopov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia
36
Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques 2008; 45:81-94. [PMID: 18611170 DOI: 10.2144/000112900] [Citation(s) in RCA: 260] [Impact Index Per Article: 15.3] [Indexed: 12/17/2022]
Abstract
Sequence-based methods for transcriptome characterization have typically relied on generation of either serial analysis of gene expression tags or expressed sequence tags. Although such approaches have the potential to enumerate transcripts by counting sequence tags derived from them, they typically do not robustly survey the majority of transcripts along their entire length. Here we show that massively parallel sequencing of randomly primed cDNAs, using a next-generation sequencing-by-synthesis technology, offers the potential to generate relative measures of mRNA and individual exon abundance while simultaneously profiling the prevalence of both annotated and novel exons and exon-splicing events. This technique identifies known single nucleotide polymorphisms (SNPs) as well as novel single-base variants. Analysis of these variants, and previously unannotated splicing events in the HeLa S3 cell line, reveals an overrepresentation of gene categories including those previously implicated in cancer.
37
Plessy C, Fagiolini M, Wagatsuma A, Harasawa N, Kuji T, Asaka-Oba A, Kanzaki Y, Fujishima S, Waki K, Nakahara H, Hensch TK, Carninci P. A resource for transcriptomic analysis in the mouse brain. PLoS One 2008; 3:e3012. [PMID: 18714383 PMCID: PMC2507754 DOI: 10.1371/journal.pone.0003012] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/23/2008] [Accepted: 07/23/2008] [Indexed: 11/18/2022]
Abstract
Background The transcriptome of the cerebral cortex is remarkably homogeneous, with variations being stronger between individuals than between areas. It is thought that due to the presence of many distinct cell types, differences within one cell population will be averaged with the noise from others. Studies of sorted cells expressing the same transgene have shown that cell populations can be distinguished according to their transcriptional profile. Methodology We have prepared a low-redundancy set of 16,209 full-length cDNA clones which represents the transcriptome of the mouse visual cortex in its coding and non-coding aspects. Using an independent tag-based approach, CAGE, we confirmed the cortical expression of 72% of the clones. Clones were amplified by PCR and spotted on glass slides, and we interrogated the microarrays with RNA from flow-sorted fluorescent cells from the cerebral cortex of parvalbumin-egfp transgenic mice. Conclusions We provide an annotated cDNA clone collection which is particularly suitable for transcriptomic analysis in the mouse brain. Spotting it on microarrays, we compared the transcriptome of EGFP positive and negative cells in a parvalbumin-egfp transgenic background and showed that more than 30% of clones are differentially expressed. Our clone collection will be a useful resource for the study of the transcriptome of single cell types in the cerebral cortex.
Affiliation(s)
- Charles Plessy
- Functional Genomics Technology Team, Omics Science Center, RIKEN Yokohama Institute, Yokohama, Kanagawa, Japan
- Michela Fagiolini
- Laboratory for Neuronal Circuit Development, RIKEN Brain Science Institute, Wakô, Saitama, Japan
- Akiko Wagatsuma
- Laboratory for Neuronal Circuit Development, RIKEN Brain Science Institute, Wakô, Saitama, Japan
- Norihiro Harasawa
- Laboratory for Integrated Theoretical Neuroscience, RIKEN Brain Science Institute, Wakô, Saitama, Japan
- Takenobu Kuji
- Laboratory for Neuronal Circuit Development, RIKEN Brain Science Institute, Wakô, Saitama, Japan
- Atsuko Asaka-Oba
- Laboratory for Neuronal Circuit Development, RIKEN Brain Science Institute, Wakô, Saitama, Japan
- Yukari Kanzaki
- Laboratory for Neuronal Circuit Development, RIKEN Brain Science Institute, Wakô, Saitama, Japan
- Sayaka Fujishima
- Laboratory for Neuronal Circuit Development, RIKEN Brain Science Institute, Wakô, Saitama, Japan
- Kazunori Waki
- Genome Science Laboratory, Discovery and Research Institute, RIKEN Wakô Institute, Wakô, Saitama, Japan
- Hiroyuki Nakahara
- Laboratory for Integrated Theoretical Neuroscience, RIKEN Brain Science Institute, Wakô, Saitama, Japan
- Takao K. Hensch
- Laboratory for Neuronal Circuit Development, RIKEN Brain Science Institute, Wakô, Saitama, Japan
- * E-mail: (TKH); (PC)
- Piero Carninci
- Functional Genomics Technology Team, Omics Science Center, RIKEN Yokohama Institute, Yokohama, Kanagawa, Japan
- * E-mail: (TKH); (PC)
38
Nikolaev LG, Akopov SB, Chernov IP, Sverdlov ED. Maps of cis-Regulatory Nodes in Megabase Long Genome Segments are an Inevitable Intermediate Step Toward Whole Genome Functional Mapping. Curr Genomics 2008; 8:137-49. [PMID: 18660850 DOI: 10.2174/138920207780368178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/17/2007] [Revised: 02/22/2007] [Accepted: 02/27/2007] [Indexed: 11/22/2022]
Abstract
The availability of complete human and other metazoan genome sequences has greatly facilitated positioning and analysis of various genomic functional elements, with initial emphasis on coding sequences. However, complete functional maps of sequenced eukaryotic genomes should include also positions of all non-coding regulatory elements. Unfortunately, experimental data on genomic positions of a multitude of regulatory sequences, such as enhancers, silencers, insulators, transcription terminators, and replication origins are very limited, especially at the whole genome level. Since most genomic regulatory elements (e.g. enhancers) are generally gene-, tissue-, or cell-specific, the prediction of these elements by computational methods is difficult and often ambiguous. Therefore, the development of high-throughput experimental approaches for identifying and mapping genomic functional elements is highly desirable. At the same time, the creation of whole-genome map of hundreds of thousands of regulatory elements in several hundreds of tissue/cell types is presently far beyond our capabilities. A possible alternative for the whole genome approach is to concentrate efforts on individual genomic segments and then to integrate the data obtained into a whole genome functional map. Moreover, the maps of polygenic fragments with functional cis-regulatory elements would provide valuable data on complex regulatory systems, including their variability and evolution. Here, we reviewed experimental approaches to the realization of these ideas, including our own developments of experimental techniques for selection of cis-acting functionally active DNA fragments from large (megabase-sized) segments of mammalian genomes.
Affiliation(s)
- Lev G Nikolaev
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10 Miklukho-Maklaya,117997, Moscow, Russia
39
Carninci P, Yasuda J, Hayashizaki Y. Multifaceted mammalian transcriptome. Curr Opin Cell Biol 2008; 20:274-80. [PMID: 18468878 DOI: 10.1016/j.ceb.2008.03.008] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 01/15/2008] [Accepted: 03/20/2008] [Indexed: 02/03/2023]
Abstract
Despite a surprisingly small number of protein-coding genes in mammalian genomes, a large variety of different RNAs is being produced. These RNAs are amazingly different in their number, size, cell localization, and mechanisms of action. Although new classes of short RNAs (sRNAs) are being continuously discovered, it is not yet obvious how many of the sRNAs originate. Altogether, the research of the past few years has identified an unexpectedly rich variety of mechanisms by which noncoding RNAs act, suggesting that we have probably identified only a few of the many potential functional mechanisms and that more investigation will be needed to comprehensively understand the complex nature and biology of the mammalian RNAome. Here, we focus on various aspects of the diversity of the biological roles of these nonprotein-coding RNAs (ncRNAs), with emphasis on functional mechanisms recently elucidated.
Affiliation(s)
- Piero Carninci
- Genome Science Laboratory, Discovery and Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
40
Elango N, Yi SV. DNA methylation and structural and functional bimodality of vertebrate promoters. Mol Biol Evol 2008; 25:1602-8. [PMID: 18469331 DOI: 10.1093/molbev/msn110] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.8] [Indexed: 01/01/2023]
Abstract
Human promoters divide into 2 classes, the low CpG (LCG) and the high CpG (HCG), based on their CpG dinucleotide content. The LCG class of promoters is hypermethylated and is associated with tissue-specific genes, whereas the HCG class is hypomethylated and associated with broadly expressed genes. By analyzing several chordate genomes separated for hundreds of millions of years, here we show that the divide between low CpG and high CpG promoters is conserved in several distantly related vertebrate taxa (including human, chicken, frog, lizard, and fish) but not in close invertebrate outgroups (sea squirts). Furthermore, LCG and HCG promoters are distinctively associated with tissue-specific and broadly expressed genes in these distantly related vertebrate taxa. Our results indicate that the function of DNA methylation on gene expression is conserved across these vertebrate taxa and suggest that the 2 classes of promoters have evolved early in vertebrate evolution, as a consequence of the advent of global DNA methylation.
Affiliation(s)
- Navin Elango
- School of Biology, Georgia Institute of Technology, USA
41
Zhu J, He F, Song S, Wang J, Yu J. How many human genes can be defined as housekeeping with current expression data? BMC Genomics 2008; 9:172. [PMID: 18416810 PMCID: PMC2396180 DOI: 10.1186/1471-2164-9-172] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.1] [Received: 12/20/2007] [Accepted: 04/16/2008] [Indexed: 12/16/2022]
Abstract
Background Housekeeping (HK) genes are ubiquitously expressed in all tissue/cell types and constitute a basal transcriptome for the maintenance of basic cellular functions. Partitioning transcriptomes into HK and tissue-specific (TS) genes relatively is fundamental for studying gene expression and cellular differentiation. Although many studies have aimed at large-scale and thorough categorization of human HK genes, a meaningful consensus has yet to be reached. Results We collected the two latest gene expression datasets (both EST and microarray data) from public databases and analyzed the gene expression profiles in 18 human tissues that have been well-documented by both data types. Benchmarked by a manually-curated HK gene collection (HK408), we demonstrated that present data from EST sampling was far from saturated, and the inadequacy has limited the gene detectability and our understanding of TS expressions. Due to a likely over-stringent threshold, microarray data showed a higher false negative rate compared with EST data, leading to a significant underestimation of HK genes. Based on EST data, we found that 40.0% of the currently annotated human genes were universally expressed in at least 16 of 18 tissues, as compared to only 5.1% specifically expressed in a single tissue. Our current EST-based estimate on human HK genes ranged from 3,140 to 6,909 in number, a ten-fold increase in comparison with previous microarray-based estimates. Conclusion We concluded that a significant fraction of human genes, at least in the currently annotated data depositories, was broadly expressed. Our understanding of tissue-specific expression was still preliminary and required much more large-scale and high-quality transcriptomic data in future studies. The new HK gene list categorized in this study will be useful for genome-wide analyses on structural and functional features of HK genes.
Affiliation(s)
- Jiang Zhu
- Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
42
Maenz B, Hekerman P, Vela EM, Galceran J, Becker W. Characterization of the human DYRK1A promoter and its regulation by the transcription factor E2F1. BMC Mol Biol 2008; 9:30. [PMID: 18366763 PMCID: PMC2292204 DOI: 10.1186/1471-2199-9-30] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 09/28/2007] [Accepted: 03/26/2008] [Indexed: 12/17/2022]
Abstract
BACKGROUND Overexpression of the human DYRK1A gene due to the presence of a third gene copy in trisomy 21 is thought to play a role in the pathogenesis of Down syndrome. The observation of gene dosage effects in transgenic mouse models implies that subtle changes in expression levels can affect the correct function of the DYRK1A gene product. We have therefore characterized the promoter of the human DYRK1A gene in order to study its transcriptional regulation. RESULTS Transcription start sites of the human DYRK1A gene are distributed over 800 bp within a region previously identified as an unmethylated CpG island. We have identified a new alternative noncoding 5'-exon of the DYRK1A gene which is located 772 bp upstream of the previously described transcription start site. Transcription of the two splicing variants is controlled by non-overlapping promoter regions that can independently drive reporter gene expression. We found no evidence of cell- or tissue-specific promoter usage, but the two promoter regions differed in their activity and their regulation. The sequence upstream of exon 1A (promoter region A) induced about 10-fold higher reporter gene activity than the sequence upstream of exon 1B (promoter region B). Overexpression of the transcription factor E2F1 increased DYRK1A mRNA levels in Saos2 and Phoenix cells and enhanced the activity of promoter region B three- to fourfold. CONCLUSION The identification of two alternatively spliced transcripts whose transcription is initiated from differentially regulated promoter regions indicates that the expression of the DYRK1A gene is subject to complex control mechanisms. The regulatory effect of E2F1 suggests that DYRK1A may play a role in cell cycle regulation or apoptosis.
Affiliation(s)
- Barbara Maenz
  - Institute of Pharmacology and Toxicology, Medical Faculty of the RWTH Aachen University, Wendlingweg 2, 52074 Aachen, Germany
- Paul Hekerman
  - Institute of Pharmacology and Toxicology, Medical Faculty of the RWTH Aachen University, Wendlingweg 2, 52074 Aachen, Germany
- Eva M Vela
  - Instituto de Neurociencias, CSIC – Universidad Miguel Hernandez, Campus de San Juan, 03550 San Juan (Alicante), Spain
- Juan Galceran
  - Instituto de Neurociencias, CSIC – Universidad Miguel Hernandez, Campus de San Juan, 03550 San Juan (Alicante), Spain
- Walter Becker
  - Institute of Pharmacology and Toxicology, Medical Faculty of the RWTH Aachen University, Wendlingweg 2, 52074 Aachen, Germany
|
43
|
Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AWC, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC. The diploid genome sequence of an individual human. PLoS Biol 2007; 5:e254. [PMID: 17803354 PMCID: PMC1964779 DOI: 10.1371/journal.pbio.0050254] [Citation(s) in RCA: 1129] [Impact Index Per Article: 66.4] [Received: 05/09/2007] [Accepted: 07/30/2007] [Indexed: 01/20/2023] [Open Access]
Abstract
Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), and 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
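The 22% figure can be checked against the counts that the abstract itself enumerates. The following is a minimal sketch, assuming that only the five numbered categories are counted as events; segmental duplications and copy number variation regions are left out because the abstract gives no numbers for them, and the dictionary keys are illustrative names, not terms from the paper.

```python
# Variant counts as enumerated in the abstract (assumption: the categories
# without explicit counts, such as segmental duplications, are excluded).
counts = {
    "snp": 3_213_401,
    "block_substitution": 53_823,
    "heterozygous_indel": 292_102,
    "homozygous_indel": 559_473,
    "inversion": 90,
}

total_events = sum(counts.values())            # all enumerated variant events
non_snp_events = total_events - counts["snp"]  # everything that is not a SNP
non_snp_share = non_snp_events / total_events  # fraction of events that are non-SNP

print(total_events, non_snp_events, f"{non_snp_share:.1%}")
```

Under these assumptions the total is consistent with the "more than 4.1 million" variants quoted in the abstract, and the non-SNP share rounds to 22%.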
Affiliation(s)
- Samuel Levy
  - J. Craig Venter Institute, Rockville, Maryland, USA.
|
44
|
Abstract
In order to describe a cell at the molecular level, a notion of a “gene” is neither necessary nor helpful. It is sufficient to consider the molecules (i.e., chromosomes, transcripts, proteins) and their interactions to describe cellular processes. The downside of the resulting high resolution is that it becomes very tedious to address features at the organismal and phenotypic levels with a language based on molecular terms. Looking for the missing link between biological disciplines dealing with different levels of biological organization, we suggest returning to the original intent behind the term “gene”. To this end, we propose to investigate whether a useful notion of “gene” can be constructed based on an underlying notion of function, and whether this can serve as the necessary link and embed the various distinct gene concepts of biological (sub)disciplines in a coherent theoretical framework. In reply to the Genon Theory recently put forward by Klaus Scherrer and Jürgen Jost in this journal, we shall discuss a general approach to assess a gene definition that should then be tested for its expressiveness and potential cross-disciplinary relevance.
Affiliation(s)
- Sonja J Prohaska
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA.
|
45
|
Abstract
Recent progress in the analysis of the mouse transcriptome has led to unexpected discoveries. The mouse genomic sequences read by RNA polymerase II may be six times more than previously expected for human chromosomes. The transcript-abundant regions (named "transcription forests") occupy more than half of the genomic sequence and are divided by transcript-scarce regions ("transcription deserts"). Many of the coding mRNAs may have partially overlapping antisense RNAs. There are transcripts bridging several adjacent genes that were previously regarded as distinct. The transcription start sites appearing as cap analysis of gene expression (CAGE) tags are mapped on the mouse genomic sequences. Distributions of CAGE tags show that the shapes of mammalian gene promoters can be classified into four major categories. These shapes were conserved between mouse and human. Most genes have exonic transcription start sites, especially in the 3' untranslated region (3' UTR) sequences. The term "RNA continent" has been coined to express this unexpectedly complex and prodigious mouse transcriptome. More than half of the RNA polymerase II transcripts are regarded as noncoding RNAs (ncRNAs). The great variety of ncRNAs in the mammalian transcriptome implies that there are many functional ncRNAs in cells. In particular, the evolutionarily conserved microRNAs play critical roles in mammalian development and other biological functions. Moreover, many other ncRNAs have also been shown to have biologically significant functions, mainly in the regulation of gene expression. The functional survey of the RNA continent has just started. We describe the state of the art of the RNA continent and its impact on modern molecular biology, especially on cancer research.
Affiliation(s)
- Jun Yasuda
  - Functional RNA Research Program, Frontier Research System, RIKEN Yokohama Institute, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
|
46
|
Abeel T, Saeys Y, Bonnet E, Rouzé P, Van de Peer Y. Generic eukaryotic core promoter prediction using structural features of DNA. Genome Res 2008; 18:310-23. [PMID: 18096745 PMCID: PMC2203629 DOI: 10.1101/gr.6991408] [Citation(s) in RCA: 133] [Impact Index Per Article: 7.8] [Received: 08/03/2007] [Accepted: 11/14/2007] [Indexed: 11/24/2022]
Abstract
Despite many recent efforts, in silico identification of promoter regions is still in its infancy. However, the accurate identification and delineation of promoter regions is important for several reasons, such as improving genome annotation and devising experiments to study and understand transcriptional regulation. Current methods to identify the core region of promoters require large amounts of high-quality training data and often behave like black box models that output predictions that are difficult to interpret. Here, we present a novel approach for predicting promoters in whole-genome sequences by using large-scale structural properties of DNA. Our technique requires no training, is applicable to many eukaryotic genomes, and performs extremely well in comparison with the best available promoter prediction programs. Moreover, it is fast, simple in design, and has no size constraints, and the results are easily interpretable. We compared our approach with 14 current state-of-the-art implementations using human gene and transcription start site data and analyzed the ENCODE region in more detail. We also validated our method on 12 additional eukaryotic genomes, including vertebrates, invertebrates, plants, fungi, and protists.
Affiliation(s)
- Thomas Abeel
  - Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
  - Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Yvan Saeys
  - Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
  - Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Eric Bonnet
  - Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
  - Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Pierre Rouzé
  - Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
  - Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
  - Laboratoire Associé de l’INRA (France), Ghent University, 9052 Gent, Belgium
- Yves Van de Peer
  - Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
  - Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
|
47
|
Harbers M. The current status of cDNA cloning. Genomics 2008; 91:232-42. [PMID: 18222633 DOI: 10.1016/j.ygeno.2007.11.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 08/17/2007] [Revised: 11/10/2007] [Accepted: 11/17/2007] [Indexed: 11/19/2022]
Abstract
The cloning of cDNAs, copies of cellular RNA, is one of the classical technologies in molecular biology. Over the past 30 years cDNA cloning technologies have been improved to enable the cloning of large cDNA collections, which are fundamental to today's understanding of the utilization of genetic information. With the discovery of noncoding RNAs, additional new approaches to the cloning of short RNAs have been developed. However, with the realization that much larger portions of genomes are transcribed than anticipated from genome annotations, cDNA cloning faces new challenges to uncover rare transcripts and to make the corresponding cDNAs available for functional studies. This review provides an overview on the current status of cDNA cloning and possibilities for the discovery and characterization of new RNA families.
Affiliation(s)
- Matthias Harbers
  - DNAFORM, Inc., Leading Venture Plaza 2, 75-1 Ono-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0046, Japan.
|
48
|
Tan K, Tegner J, Ravasi T. Integrated approaches to uncovering transcription regulatory networks in mammalian cells. Genomics 2008; 91:219-31. [PMID: 18191937 DOI: 10.1016/j.ygeno.2007.11.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Received: 09/17/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 11/16/2022]
Abstract
Integrative systems biology has emerged as an exciting research approach in molecular biology and functional genomics that involves the integration of genomics, proteomics, and metabolomics datasets. These endeavors establish a systematic paradigm by which to interrogate, model, and iteratively refine our knowledge of the regulatory events within a cell. Here we review the latest technologies available to collect high-throughput measurements of a cellular state as well as the most successful methods for the integration and interrogation of these measurements. In particular we will focus on methods available to infer transcription regulatory networks in mammals.
Affiliation(s)
- Kai Tan
  - Department of Bioengineering, Jacobs School of Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.
|
49
|
Lindow M, Jacobsen A, Nygaard S, Mang Y, Krogh A. Intragenomic matching reveals a huge potential for miRNA-mediated regulation in plants. PLoS Comput Biol 2007; 3:e238. [PMID: 18052543 PMCID: PMC2098865 DOI: 10.1371/journal.pcbi.0030238] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Received: 07/02/2007] [Accepted: 10/17/2007] [Indexed: 12/28/2022] [Open Access]
Abstract
microRNAs (miRNAs) are important post-transcriptional regulators, but the extent of this regulation is uncertain, both with regard to the number of miRNA genes and their targets. Using an algorithm based on intragenomic matching of potential miRNAs and their targets coupled with support vector machine classification of miRNA precursors, we explore the potential for regulation by miRNAs in three plant genomes: Arabidopsis thaliana, Populus trichocarpa, and Oryza sativa. We find that the intragenomic matching in conjunction with a supervised learning approach contains enough information to allow reliable computational prediction of miRNA candidates without requiring conservation across species. Using this method, we identify ∼1,200, ∼2,500, and ∼2,100 miRNA candidate genes capable of extensive base-pairing to potential target mRNAs in A. thaliana, P. trichocarpa, and O. sativa, respectively. This is more than five times the number of currently annotated miRNAs in the plants. Many of these candidates are derived from repeat regions, yet they seem to contain the features necessary for correct processing by the miRNA machinery. Conservation analysis indicates that only a few of the candidates are conserved between the species. We conclude that there is a large potential for miRNA-mediated regulatory interactions encoded in the genomes of the investigated plants. We hypothesize that some of these interactions may be realized under special environmental conditions, while others can readily be recruited when organisms diverge and adapt to new niches. microRNAs (miRNAs) are small RNA molecules that regulate gene expression by complementary basepairing to mRNAs. In plants, this base-pairing is almost perfect along the whole length of miRNAs. This long stretch of complementarity makes it relatively easy to make computational predictions of the targets for known miRNAs. 
To predict novel miRNA genes, we take advantage of this and reverse the target prediction: instead of predicting targets for known miRNAs, we predict novel miRNA candidates for all known mRNAs. Because matching between target and miRNA candidates is integral to the method, it is possible to achieve good predictions without having to rely on evolutionary conservation, as most other current methods do. This means that we can predict new miRNAs that are specific to an organism. Interestingly, this could help explain the difference between species that have very similar protein-coding genes, but highly different phenotypes. Furthermore, it turns out that many of these new miRNA candidates derive from genomic repeat regions such as transposons, which points to a possible active role for repeats/transposons in the regulation of gene expression.
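The reversed prediction described above can be illustrated with a toy example. This is a minimal sketch and not the authors' algorithm: it assumes exact complementarity instead of the near-perfect pairing described, scans a single strand only, and omits the support vector machine classification of precursors. The function names, the `k` parameter, and the toy sequences are all hypothetical.

```python
# Toy illustration of "intragenomic matching": start from mRNA sites and ask
# whether a sequence that could pair with each site occurs elsewhere in the genome.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def candidate_mirnas(mrnas: dict[str, str], genome: str, k: int = 21) -> dict[str, list[tuple[str, int]]]:
    """Map each candidate k-mer found in `genome` to the mRNA sites it could pair with.

    Assumption: `genome` stands for the sequence to be scanned for candidates
    (one strand only), and pairing means exact reverse complementarity.
    """
    hits: dict[str, list[tuple[str, int]]] = {}
    for name, seq in mrnas.items():
        for offset in range(len(seq) - k + 1):
            site = seq[offset:offset + k]          # potential target site in the mRNA
            candidate = reverse_complement(site)   # sequence that would base-pair with it
            if candidate in genome:
                hits.setdefault(candidate, []).append((name, offset))
    return hits


# Hypothetical toy data, with a short k so the example stays readable.
toy_mrnas = {"m1": "CCACGTTAGCCC"}
toy_genome = "TTTTGCTAACGTTTTT"
hits = candidate_mirnas(toy_mrnas, toy_genome, k=8)
print(hits)  # {'GCTAACGT': [('m1', 2)]}
```

Because every candidate is defined by a target it can pair with, no cross-species conservation filter is needed, which is the point the summary makes; a realistic implementation would additionally have to tolerate mismatches and evaluate the surrounding precursor hairpin.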
Affiliation(s)
- Morten Lindow
  - Bioinformatics Centre, Department of Molecular Biology and Biotech Research and Innovation Centre, University of Copenhagen, Copenhagen, Denmark.
|
50
|
Abstract
The principal route to understanding the biological significance of the genome sequence comes from discovery and characterization of that portion of the genome that is transcribed into RNA products. We now know that this 'transcriptome' is unexpectedly complex and its precise definition in any one species requires multiple technical approaches and an ability to work on a very large scale. A key step is the development of technologies able to capture snapshots of the complexity of the various kinds of RNA generated by the genome. As the human, mouse and other model genome sequencing projects approach completion, considerable effort has been focused on identifying and annotating the protein-coding genes as the principal output of the genome. In pursuing this aim, several key technologies have been developed to generate large numbers and highly diverse sets of full-length cDNAs and their variants. However, the search has identified another hidden transcriptional universe comprising a wide variety of non-protein coding RNA transcripts. Despite initial scepticism, various experiments and complementary technologies have demonstrated that these RNAs are dynamically transcribed and a subset of them can act as sense-antisense RNAs, which influence the transcriptional output of the genome. Recent experimental evidence suggests that the list of non-protein coding RNAs is still largely incomplete and that transcription is substantially more complex even than currently thought.
Affiliation(s)
- Piero Carninci
  - Genome Science Laboratory, Discovery and Research Institute, RIKEN Wako Institute, Wako, Saitama, Japan.
|