1
Abstract
Transcription start site (TSS) selection influences transcript stability and translation as well as protein sequence. Alternative TSS usage is pervasive in organismal development, is a major contributor to transcript isoform diversity in humans, and is frequently observed in human diseases including cancer. In this review, we discuss the breadth of techniques that have been used to globally profile TSSs and the resulting insights into gene regulation, as well as future prospects in this area of inquiry.
Affiliation(s)
- Gabriel E. Zentner
  - Department of Biology, Indiana University, Bloomington, IN 47401, USA
  - Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, IN 46202, USA
2
Markus BM, Waldman BS, Lorenzi HA, Lourido S. High-Resolution Mapping of Transcription Initiation in the Asexual Stages of Toxoplasma gondii. Front Cell Infect Microbiol 2021; 10:617998. [PMID: 33553008 PMCID: PMC7854901 DOI: 10.3389/fcimb.2020.617998] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/15/2020] [Accepted: 12/03/2020] [Indexed: 12/13/2022]
Abstract
Toxoplasma gondii is a common parasite of humans and animals, causing life-threatening disease in the immunocompromised, fetal abnormalities when contracted during gestation, and recurrent ocular lesions in some patients. Central to the prevalence and pathogenicity of this protozoan is its ability to adapt to a broad range of environments, and to differentiate between acute and chronic stages. These processes are underpinned by a major rewiring of gene expression, yet the mechanisms that regulate transcription in this parasite are only partially characterized. Deciphering these mechanisms requires a precise and comprehensive map of transcription start sites (TSSs); however, Toxoplasma TSSs have remained incompletely defined. To address this challenge, we used 5'-end RNA sequencing to genomically assess transcription initiation in both acute and chronic stages of Toxoplasma. Here, we report an in-depth analysis of transcription initiation at promoters, and provide empirically-defined TSSs for 7603 (91%) protein-coding genes, of which only 1840 concur with existing gene models. Comparing data from acute and chronic stages, we identified instances of stage-specific alternative TSSs that putatively generate mRNA isoforms with distinct 5' termini. Analysis of the nucleotide content and nucleosome occupancy around TSSs allowed us to examine the determinants of TSS choice, and outline features of Toxoplasma promoter architecture. We also found pervasive divergent transcription at Toxoplasma promoters, clustered within the nucleosomes of highly-symmetrical phased arrays, underscoring chromatin contributions to transcription initiation. Corroborating previous observations, we ascertained that Toxoplasma 5' leaders are among the longest of any eukaryote studied thus far, displaying a median length of approximately 800 nucleotides. 
Further highlighting the utility of a precise TSS map, we pinpointed motifs associated with transcription initiation, including the binding sites of the master regulator of chronic-stage differentiation, BFD1, and a novel motif with a similar positional arrangement present at 44% of Toxoplasma promoters. This work provides a critical resource for functional genomics in Toxoplasma, and lays down a foundation to study the interactions between genomic sequences and the regulatory factors that control transcription in this parasite.
Affiliation(s)
- Benedikt M. Markus
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Faculty of Biology, University of Freiburg, Freiburg, Germany
- Benjamin S. Waldman
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
- Sebastian Lourido
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
3
Policastro RA, Raborn RT, Brendel VP, Zentner GE. Simple and efficient profiling of transcription initiation and transcript levels with STRIPE-seq. Genome Res 2020; 30:910-923. [PMID: 32660958 PMCID: PMC7370879 DOI: 10.1101/gr.261545.120] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/22/2020] [Accepted: 06/18/2020] [Indexed: 01/07/2023]
Abstract
Accurate mapping of transcription start sites (TSSs) is key for understanding transcriptional regulation. However, current protocols for genome-wide TSS profiling are laborious and/or expensive. We present Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq), a simple, rapid, and cost-effective protocol for sequencing capped RNA 5' ends from as little as 50 ng total RNA. Including depletion of uncapped RNA and reaction cleanups, a STRIPE-seq library can be constructed in about 5 h. We show application of STRIPE-seq to TSS profiling in yeast and human cells and show that it can also be effectively used for quantification of transcript levels and analysis of differential gene expression. In conjunction with our ready-to-use computational workflows, STRIPE-seq is a straightforward, efficient means by which to probe the landscape of transcriptional initiation.
Affiliation(s)
- Volker P Brendel
  - Department of Biology
  - Department of Computer Science, Indiana University, Bloomington, Indiana 47405, USA
- Gabriel E Zentner
  - Department of Biology
  - Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, Indiana 46202, USA
4
Suzuki A, Kawano S, Mitsuyama T, Suyama M, Kanai Y, Shirahige K, Sasaki H, Tokunaga K, Tsuchihara K, Sugano S, Nakai K, Suzuki Y. DBTSS/DBKERO for integrated analysis of transcriptional regulation. Nucleic Acids Res 2018; 46:D229-D238. [PMID: 29126224 PMCID: PMC5753362 DOI: 10.1093/nar/gkx1001] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 09/15/2017] [Accepted: 11/03/2017] [Indexed: 12/15/2022]
Abstract
DBTSS (Database of Transcriptional Start Sites)/DBKERO (Database of Kashiwa Encyclopedia for human genome mutations in Regulatory regions and their Omics contexts) is the database originally initiated with the information of transcriptional start sites and their upstream transcriptional regulatory regions. In recent years, we updated the database to assist users to elucidate biological relevance of the human genome variations or somatic mutations in cancers which may affect the transcriptional regulation. In this update, we facilitate interpretations of disease associated genomic variation, using the Japanese population as a model case. We enriched the genomic variation dataset consisting of the 13,368 individuals collected for various genome-wide association studies and the reference epigenome information in the surrounding regions using a total of 455 epigenome datasets (four tissue types from 67 healthy individuals) collected for the International Human Epigenome Consortium (IHEC). The data directly obtained from the clinical samples was associated with that obtained from various model systems, such as the drug perturbation datasets using cultured cancer cells. Furthermore, we incorporated the results obtained using the newly developed analytical methods, Nanopore/10x Genomics long-read sequencing of the human genome and single cell analyses. The database is made publicly accessible at the URL (http://dbtss.hgc.jp/).
Affiliation(s)
- Ayako Suzuki
  - Division of Translational Genomics, Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Chiba, Japan
- Shin Kawano
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Chiba, Japan
- Toutai Mitsuyama
  - Computational Regulatory Genomics Research Group, Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Mikita Suyama
  - Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Yae Kanai
  - Department of Pathology, Keio University School of Medicine, Tokyo, Japan
- Katsuhiko Shirahige
  - Institute of Molecular and Cellular Biosciences, the University of Tokyo, Tokyo, Japan
- Hiroyuki Sasaki
  - Division of Epigenomics and Development, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Katsushi Tokunaga
  - Department of Human Genetics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
- Katsuya Tsuchihara
  - Division of Translational Genomics, Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Chiba, Japan
- Sumio Sugano
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, the University of Tokyo, Chiba, Japan
- Kenta Nakai
  - Human Genome Center, the Institute of Medical Science, the University of Tokyo, Tokyo, Japan
- Yutaka Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, the University of Tokyo, Chiba, Japan
5
Comprehensive comparative analysis of 5'-end RNA-sequencing methods. Nat Methods 2018; 15:505-511. [PMID: 29867192 PMCID: PMC6075671 DOI: 10.1038/s41592-018-0014-2] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 09/18/2017] [Accepted: 04/10/2018] [Indexed: 12/20/2022]
Abstract
Specialized RNA-seq methods are required to identify the 5' ends of transcripts, which are critical for studies of gene regulation, but these methods have not been systematically benchmarked. We directly compared six such methods, including the performance of five methods on a single human cellular RNA sample and a new spike-in RNA assay that helps circumvent challenges resulting from uncertainties in annotation and RNA processing. We found that the 'cap analysis of gene expression' (CAGE) method performed best for mRNA and that most of its unannotated peaks were supported by evidence from other genomic methods. We applied CAGE to eight brain-related samples and determined sample-specific transcription start site (TSS) usage, as well as a transcriptome-wide shift in TSS usage between fetal and adult brain.
6
RNA-Sequencing data supports the existence of novel VEGFA splicing events but not of VEGFA xxxb isoforms. Sci Rep 2017; 7:58. [PMID: 28246395 PMCID: PMC5427905 DOI: 10.1038/s41598-017-00100-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 02/04/2016] [Accepted: 02/06/2017] [Indexed: 01/08/2023]
Abstract
Vascular endothelial growth factor (VEGFA), a pivotal regulator of angiogenesis and valuable therapeutic target, is characterised by alternative splicing which generates three principal isoforms, VEGFA121, VEGFA165 and VEGFA189. A second set of anti-angiogenic isoforms termed VEGFAxxxb that utilise an alternative splice site in the final exon have been widely reported, with mRNA detection based principally upon RT-PCR assays. We sought confirmation of the existence of the VEGFAxxxb isoforms within the abundant RNA sequencing data available publicly. Whilst sequences derived specifically from each of the canonical VEGFA isoforms were present in many tissues, there were no sequences derived from VEGFAxxxb isoforms. Sequencing of approximately 50,000 RT-PCR products spanning the exon 7–8 junction in 10 tissues did not identify any VEGFAxxxb transcripts. The absence or extremely low expression of these transcripts in vivo indicates that VEGFAxxxb isoforms are unlikely to play a role in normal physiology. Our analyses also revealed multiple novel splicing events supported by more reads than previously reported for VEGFA145 and VEGFA148 isoforms, including three from novel first exons consistent with existing transcription start site data. These novel VEGFA isoforms may play significant roles in specific cell types.
7
Masvidal L, Hulea L, Furic L, Topisirovic I, Larsson O. mTOR-sensitive translation: Cleared fog reveals more trees. RNA Biol 2017; 14:1299-1305. [PMID: 28277937 PMCID: PMC5711451 DOI: 10.1080/15476286.2017.1290041] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Indexed: 10/25/2022]
Abstract
Translation is fundamental for many biologic processes as it enables cells to rapidly respond to stimuli without requiring de novo mRNA synthesis. The mammalian/mechanistic target of rapamycin (mTOR) is a key regulator of translation. Although mTOR affects global protein synthesis, translation of a subset of mRNAs appears to be exceptionally sensitive to changes in mTOR activity. Recent efforts to catalog these mTOR-sensitive mRNAs resulted in conflicting results. Whereas ribosome-profiling almost exclusively identified 5'-terminal oligopyrimidine (TOP) mRNAs as mTOR-sensitive, polysome-profiling suggested that mTOR also regulates translation of non-TOP mRNAs. This inconsistency was explained by analytical and technical biases limiting the efficiency of ribosome-profiling in detecting mRNAs showing differential translation. Moreover, genome-wide characterization of 5'UTRs of non-TOP mTOR-sensitive mRNAs revealed 2 subsets of transcripts which differ in their requirement for translation initiation factors and biologic functions. We summarize these recent advances and their impact on the understanding of mTOR-sensitive translation.
Affiliation(s)
- Laia Masvidal
  - Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
- Laura Hulea
  - Lady Davis Institute, SMBD Jewish General Hospital, Montreal, Canada; Gerald-Bronfman Department of Oncology, Departments of Experimental Medicine and Biochemistry, McGill University, Montreal, Canada
- Luc Furic
  - Cancer Program, Biomedicine Discovery Institute and Department of Anatomy & Developmental Biology, Monash University, Victoria, Australia; Prostate Cancer Translational Research Laboratory, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- Ivan Topisirovic
  - Lady Davis Institute, SMBD Jewish General Hospital, Montreal, Canada; Gerald-Bronfman Department of Oncology, Departments of Experimental Medicine and Biochemistry, McGill University, Montreal, Canada
- Ola Larsson
  - Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
8
Gandin V, Masvidal L, Hulea L, Gravel SP, Cargnello M, McLaughlan S, Cai Y, Balanathan P, Morita M, Rajakumar A, Furic L, Pollak M, Porco JA, St-Pierre J, Pelletier J, Larsson O, Topisirovic I. nanoCAGE reveals 5' UTR features that define specific modes of translation of functionally related MTOR-sensitive mRNAs. Genome Res 2016; 26:636-48. [PMID: 26984228 PMCID: PMC4864462 DOI: 10.1101/gr.197566.115] [Citation(s) in RCA: 146] [Impact Index Per Article: 18.3] [Received: 07/30/2015] [Accepted: 03/14/2016] [Indexed: 12/12/2022]
Abstract
The diversity of MTOR-regulated mRNA translation remains unresolved. Whereas ribosome-profiling suggested that MTOR almost exclusively stimulates translation of the TOP (terminal oligopyrimidine motif) and TOP-like mRNAs, polysome-profiling indicated that MTOR also modulates translation of mRNAs without the 5' TOP motif (non-TOP mRNAs). We demonstrate that in ribosome-profiling studies, detection of MTOR-dependent changes in non-TOP mRNA translation was obscured by low sensitivity and methodology biases. Transcription start site profiling using nano-cap analysis of gene expression (nanoCAGE) revealed that not only do many MTOR-sensitive mRNAs lack the 5' TOP motif but that 5' UTR features distinguish two functionally and translationally distinct subsets of MTOR-sensitive mRNAs: (1) mRNAs with short 5' UTRs enriched for mitochondrial functions, which require EIF4E but are less EIF4A1-sensitive; and (2) long 5' UTR mRNAs encoding proliferation- and survival-promoting proteins, which are both EIF4E- and EIF4A1-sensitive. Selective inhibition of translation of mRNAs harboring long 5' UTRs via EIF4A1 suppression leads to sustained expression of proteins involved in respiration but concomitant loss of those protecting mitochondrial structural integrity, resulting in apoptosis. Conversely, simultaneous suppression of translation of both long and short 5' UTR mRNAs by MTOR inhibitors results in metabolic dormancy and a predominantly cytostatic effect. Thus, 5' UTR features define different modes of MTOR-sensitive translation of functionally distinct subsets of mRNAs, which may explain the diverse impact of MTOR and EIF4A inhibitors on neoplastic cells.
Affiliation(s)
- Valentina Gandin
  - Lady Davis Institute, SMBD Jewish General Hospital, Montreal, Canada H3T 1E2; Department of Oncology, McGill University, Montreal, Canada H3G 1Y6; Department of Experimental Medicine, McGill University, Montreal, Canada H3G 1Y6; Department of Biochemistry, McGill University, Montreal, Canada H3G 1Y6
- Laia Masvidal
  - Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institutet, 171 65 Solna, Sweden
- Laura Hulea
  - Lady Davis Institute, SMBD Jewish General Hospital, Montreal, Canada H3T 1E2; Department of Oncology, McGill University, Montreal, Canada H3G 1Y6
- Simon-Pierre Gravel
  - Department of Biochemistry, McGill University, Montreal, Canada H3G 1Y6; Goodman Cancer Research Centre, McGill University, Montreal, Canada H3A 1A3
- Marie Cargnello
  - Lady Davis Institute, SMBD Jewish General Hospital, Montreal, Canada H3T 1E2; Department of Oncology, McGill University, Montreal, Canada H3G 1Y6
- Shannon McLaughlan
  - Lady Davis Institute, SMBD Jewish General Hospital, Montreal, Canada H3T 1E2; Department of Oncology, McGill University, Montreal, Canada H3G 1Y6
- Yutian Cai
  - Lady Davis Institute, SMBD Jewish General Hospital, Montreal, Canada H3T 1E2; Department of Biochemistry, McGill University, Montreal, Canada H3G 1Y6
- Preetika Balanathan
  - Cancer Program, Biomedicine Discovery Institute and Department of Anatomy and Developmental Biology, Monash University, Victoria 3800, Australia
- Masahiro Morita
  - Department of Biochemistry, McGill University, Montreal, Canada H3G 1Y6; Goodman Cancer Research Centre, McGill University, Montreal, Canada H3A 1A3
- Arjuna Rajakumar
  - Lady Davis Institute, SMBD Jewish General Hospital, Montreal, Canada H3T 1E2
- Luc Furic
  - Cancer Program, Biomedicine Discovery Institute and Department of Anatomy and Developmental Biology, Monash University, Victoria 3800, Australia
- Michael Pollak
  - Lady Davis Institute, SMBD Jewish General Hospital, Montreal, Canada H3T 1E2; Department of Oncology, McGill University, Montreal, Canada H3G 1Y6; Department of Experimental Medicine, McGill University, Montreal, Canada H3G 1Y6
- John A Porco
  - Center for Chemical Methodology and Library Development, Boston University, Boston, Massachusetts 02215, USA
- Julie St-Pierre
  - Department of Biochemistry, McGill University, Montreal, Canada H3G 1Y6; Goodman Cancer Research Centre, McGill University, Montreal, Canada H3A 1A3
- Jerry Pelletier
  - Department of Oncology, McGill University, Montreal, Canada H3G 1Y6; Department of Biochemistry, McGill University, Montreal, Canada H3G 1Y6; Goodman Cancer Research Centre, McGill University, Montreal, Canada H3A 1A3
- Ola Larsson
  - Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institutet, 171 65 Solna, Sweden
- Ivan Topisirovic
  - Lady Davis Institute, SMBD Jewish General Hospital, Montreal, Canada H3T 1E2; Department of Oncology, McGill University, Montreal, Canada H3G 1Y6; Department of Experimental Medicine, McGill University, Montreal, Canada H3G 1Y6; Department of Biochemistry, McGill University, Montreal, Canada H3G 1Y6
9
Mwangi S, Attardo G, Suzuki Y, Aksoy S, Christoffels A. TSS seq based core promoter architecture in blood feeding Tsetse fly (Glossina morsitans morsitans) vector of Trypanosomiasis. BMC Genomics 2015; 16:722. [PMID: 26394619 PMCID: PMC4578606 DOI: 10.1186/s12864-015-1921-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/05/2014] [Accepted: 09/11/2015] [Indexed: 02/02/2023]
Abstract
Background Transcription initiation regulation is mediated by sequence-specific interactions between DNA-binding proteins (transcription factors) and cis-elements, where BRE, TATA, INR, DPE and MTE motifs constitute canonical core motifs for basal transcription initiation of genes. Accurate identification of transcription start sites (TSSs) and their corresponding promoter regions is critical for delineation of these motifs. To date, the genome-scale analysis of core promoter architecture in insects has been confined to Drosophila. The recently sequenced Tsetse fly genome provides a unique opportunity to analyze transcription initiation regulation machinery in blood-feeding insects. Results A computational method for identification of TSSs in the newly sequenced Tsetse fly genome was evaluated, using TSS seq tags sampled from two developmental stages, namely larvae and pupae. There were 3134 tag clusters, of which 45.4 % (1424) mapped to first coding exons or their proximal predicted 5′UTR regions and 1.0 % (31) mapped to transposons, within a threshold of 100 tags per cluster. These 1393 non transposon-derived core promoters had a propensity for AT nucleotides. The −1/+1 and 1/+1 positions in D. melanogaster, and G. m. morsitans had propensity for CA and AA dinucleotides respectively. The 1393 tag clusters comprised narrow promoters (5 %), broad with peak promoters (23 %) and broad without peak promoters (72 %). Two-way motif co-occurrence analysis showed that the MTE-DPE pair is over-represented in broad core promoters. The frequently occurring triplet motifs in all promoter classes are the INR-MTE-DPE, TATA-MTE-DPE and TATA-INR-DPE. Promoters without the TATA motif had higher frequency of the MTE and INR motifs than those observed in Drosophila, where the DPE motif occurs more frequently in promoters without the TATA motif. 
Gene ontology terms associated with developmental processes were overrepresented in the narrow and broad with peak promoters. Conclusions The study has identified different motif combinations associated with broad promoters in a blood-feeding insect. In the case of TATA-less core promoters, G.m. morsitans uses the MTE to compensate for the lack of a TATA motif. The increasing availability of TSS seq data allows for revision of existing gene annotation datasets with the potential of identifying new transcriptional units.
Affiliation(s)
- Sarah Mwangi
  - South African MRC Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.
- Geoffrey Attardo
  - Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, 06510, USA.
- Yutaka Suzuki
  - Department of Medical Genome Sciences, University of Tokyo, Tokyo, Japan.
- Serap Aksoy
  - Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, 06510, USA.
- Alan Christoffels
  - South African MRC Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.
10
Suzuki A, Wakaguri H, Yamashita R, Kawano S, Tsuchihara K, Sugano S, Suzuki Y, Nakai K. DBTSS as an integrative platform for transcriptome, epigenome and genome sequence variation data. Nucleic Acids Res 2014; 43:D87-91. [PMID: 25378318 PMCID: PMC4383915 DOI: 10.1093/nar/gku1080] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 12/17/2022]
Abstract
DBTSS (http://dbtss.hgc.jp/) was originally constructed as a collection of uniquely determined transcriptional start sites (TSSs) in humans and some other species in 2002. Since then, it has been regularly updated and in recent updates epigenetic information has also been incorporated because such information is useful for characterizing the biological relevance of these TSSs/downstream genes. In the newest release, Release 9, we further integrated public and original single nucleotide variation (SNV) data into our database. For our original data, we generated SNV data from genomic analyses of various cancer types, including 97 lung adenocarcinomas and 57 lung small cell carcinomas from Japanese patients as well as 26 cell lines of lung cancer origin. In addition, we obtained publicly available SNV data from other cancer types and germline variations in a total of 11,322 individuals. With these updates, users can examine the association between sequence variation patterns in clinical lung cancers and the corresponding TSS-seq, RNA-seq, ChIP-seq and BS-seq data. Consequently, DBTSS is no longer a mere storage site for TSS information but has evolved into an integrative platform of a variety of genome activity data.
Affiliation(s)
- Ayako Suzuki
  - Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Hiroyuki Wakaguri
  - Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Riu Yamashita
  - Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Shin Kawano
  - Database Center for Life Science, Research Organization of Information and Systems, Chiba, Japan
- Katsuya Tsuchihara
  - Division of TR, The Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Chiba, Japan
- Sumio Sugano
  - Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Yutaka Suzuki
  - Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Kenta Nakai
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
11
Matsumoto K, Suzuki A, Wakaguri H, Sugano S, Suzuki Y. Construction of mate pair full-length cDNAs libraries and characterization of transcriptional start sites and termination sites. Nucleic Acids Res 2014; 42:e125. [PMID: 25034687 PMCID: PMC4176323 DOI: 10.1093/nar/gku600] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 01/31/2023]
Abstract
To identify and characterize transcript structures ranging from transcriptional start sites (TSSs) to poly(A)-addition sites (PASs), we constructed and analyzed human TSS/PAS mate pair full-length cDNA libraries from 14 tissue types and four cell lines. The collected information enabled us to define TSS cluster (TSC) and PAS cluster (PAC) relationships for a total of 8530/9400 RefSeq genes, as well as 4251/5618 of their putative alternative promoters/terminators and 4619/4605 intervening transcripts, respectively. Analyses of the putative alternative TSCs and alternative PACs revealed that their selection appeared to be mostly independent, with rare exceptions. In those exceptional cases, pairs of transcript units rarely overlapped one another and were occasionally separated by Rad21/CTCF. We also identified a total of 172 similar cases in which TSCs and PACs spanned adjacent but distinct genes. In these cases, different transcripts may utilize different functional units of a particular gene or of adjacent genes. This approach was also useful for identifying fusion gene transcripts in cancerous cells. Furthermore, we could construct cDNA libraries in which 3′-end mate pairs were distributed randomly over the transcripts. These libraries were useful for assembling the internal structure of previously uncharacterized alternative promoter products, as well as intervening transcripts.
Affiliation(s)
- Kyoko Matsumoto
  - Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Ayako Suzuki
  - Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Hiroyuki Wakaguri
  - Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Sumio Sugano
  - Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Yutaka Suzuki
  - Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
12
Nishida H, Kondo S, Matsumoto T, Suzuki Y, Yoshikawa H, Taylor TD, Sugiyama J. Characteristics of nucleosomes and linker DNA regions on the genome of the basidiomycete Mixia osmundae revealed by mono- and dinucleosome mapping. Open Biol 2012; 2:120043. [PMID: 22724063 PMCID: PMC3376729 DOI: 10.1098/rsob.120043] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/28/2012] [Accepted: 03/07/2012] [Indexed: 02/02/2023]
Abstract
We present findings on the nucleosomal arrangement in the genome of the basidiomycete Mixia osmundae, focusing on nucleosomal linker DNA regions. We have assembled the genomic sequences of M. osmundae, annotated genes and transcription start sites (TSSs) on the genome, and created a detailed nucleosome map based on sequencing mono- and dinucleosomal DNA fragments. The nucleosomal DNA length distribution of M. osmundae is similar to that of the filamentous ascomycete Aspergillus fumigatus, but differs from that of ascomycetous yeasts, strongly suggesting that nucleosome positioning has evolved primarily through neutral drift in fungal species. We found clear association between dinucleotide frequencies and linker DNA regions mapped as the midpoints of dinucleosomes. We also describe a unique pattern found in the nucleosome-depleted region upstream of the TSS observed in the dinucleosome map and the precursor status of dinucleosomes prior to the digestion into mononucleosomes by comparing the mono- and dinucleosome maps. We demonstrate that observation of dinucleosomes as well as of mononucleosomes is valuable in investigating nucleosomal organization of the genome.
Affiliation(s)
- Hiromi Nishida
- Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo 113-8657, Japan.
|
13
|
Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii). PLoS One 2012; 7:e47174. [PMID: 23056606 PMCID: PMC3466250 DOI: 10.1371/journal.pone.0047174] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/25/2012] [Accepted: 09/10/2012] [Indexed: 01/05/2023]
Abstract
Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. 
These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for whole genome sequencing.
|
14
|
Full-Length Enrich c-DNA Libraries-Clear Cell-Renal Cell Carcinoma. JOURNAL OF ONCOLOGY 2012; 2012:680796. [PMID: 22545051 PMCID: PMC3321460 DOI: 10.1155/2012/680796] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/06/2011] [Accepted: 11/22/2011] [Indexed: 02/07/2023]
Abstract
Clear cell renal cell carcinoma (ccRCC), the most common subtype of RCC, is characterized by high metastatic potential and strong resistance to traditional therapies, resulting in a poor five-year survival rate. Several therapies targeting the VEGF pathway have been developed for advanced RCC; however, new therapeutic targets for treating RCC still need to be discovered. Genome-wide gene expression analyses have been broadly used to identify unknown molecular mechanisms of cancer progression. Recently, we applied the oligo-capping method to construct full-length cDNA libraries of ccRCC and adjacent normal kidney, and analyzed the gene expression profiles by high-throughput sequencing. This paper reviews recent findings on the therapeutic potential of the MYC pathway and nicotinamide N-methyltransferase for the treatment of RCC.
|
15
|
Renfree MB, Papenfuss AT, Deakin JE, Lindsay J, Heider T, Belov K, Rens W, Waters PD, Pharo EA, Shaw G, Wong ESW, Lefèvre CM, Nicholas KR, Kuroki Y, Wakefield MJ, Zenger KR, Wang C, Ferguson-Smith M, Nicholas FW, Hickford D, Yu H, Short KR, Siddle HV, Frankenberg SR, Chew KY, Menzies BR, Stringer JM, Suzuki S, Hore TA, Delbridge ML, Mohammadi A, Schneider NY, Hu Y, O'Hara W, Al Nadaf S, Wu C, Feng ZP, Cocks BG, Wang J, Flicek P, Searle SMJ, Fairley S, Beal K, Herrero J, Carone DM, Suzuki Y, Sugano S, Toyoda A, Sakaki Y, Kondo S, Nishida Y, Tatsumoto S, Mandiou I, Hsu A, McColl KA, Lansdell B, Weinstock G, Kuczek E, McGrath A, Wilson P, Men A, Hazar-Rethinam M, Hall A, Davis J, Wood D, Williams S, Sundaravadanam Y, Muzny DM, Jhangiani SN, Lewis LR, Morgan MB, Okwuonu GO, Ruiz SJ, Santibanez J, Nazareth L, Cree A, Fowler G, Kovar CL, Dinh HH, Joshi V, Jing C, Lara F, Thornton R, Chen L, Deng J, Liu Y, Shen JY, Song XZ, Edson J, Troon C, Thomas D, Stephens A, Yapa L, Levchenko T, Gibbs RA, Cooper DW, Speed TP, Fujiyama A, M Graves JA, O'Neill RJ, Pask AJ, Forrest SM, Worley KC. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development. Genome Biol 2011; 12:R81. [PMID: 21854559 PMCID: PMC3277949 DOI: 10.1186/gb-2011-12-8-r81] [Citation(s) in RCA: 147] [Impact Index Per Article: 11.3] [Received: 05/23/2011] [Revised: 07/22/2011] [Accepted: 08/19/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND: We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. RESULTS: The genome has been sequenced to 2× coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. CONCLUSIONS: Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution.
Affiliation(s)
- Marilyn B Renfree
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Anthony T Papenfuss
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
- Department of Mathematics and Statistics, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Janine E Deakin
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Research School of Biology, The Australian National University, Canberra, ACT 0200, Australia
| | - James Lindsay
- Department of Molecular and Cell Biology, Center for Applied Genetics and Technology, University of Connecticut, Storrs, CT 06269, USA
| | - Thomas Heider
- Department of Molecular and Cell Biology, Center for Applied Genetics and Technology, University of Connecticut, Storrs, CT 06269, USA
| | - Katherine Belov
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
| | - Willem Rens
- Department of Veterinary Medicine, University of Cambridge, Madingley Rd, Cambridge, CB3 0ES, UK
| | - Paul D Waters
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Research School of Biology, The Australian National University, Canberra, ACT 0200, Australia
| | - Elizabeth A Pharo
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Geoff Shaw
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Emily SW Wong
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
| | - Christophe M Lefèvre
- Institute for Technology Research and Innovation, Deakin University, Geelong, Victoria, 3214, Australia
| | - Kevin R Nicholas
- Institute for Technology Research and Innovation, Deakin University, Geelong, Victoria, 3214, Australia
| | - Yoko Kuroki
- RIKEN Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Matthew J Wakefield
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
| | - Kyall R Zenger
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
- School of Marine and Tropical Biology, James Cook University, Townsville, Queensland 4811, Australia
| | - Chenwei Wang
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
| | - Malcolm Ferguson-Smith
- Department of Veterinary Medicine, University of Cambridge, Madingley Rd, Cambridge, CB3 0ES, UK
| | - Frank W Nicholas
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
| | - Danielle Hickford
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Hongshi Yu
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Kirsty R Short
- Department of Microbiology and Immunology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Hannah V Siddle
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
| | - Stephen R Frankenberg
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Keng Yih Chew
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Brandon R Menzies
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
- Leibniz Institute for Zoo and Wildlife Research, Alfred-Kowalke-Str. 17, Berlin 10315, Germany
| | - Jessica M Stringer
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Shunsuke Suzuki
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Timothy A Hore
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Laboratory of Developmental Genetics and Imprinting, The Babraham Institute, Cambridge, CB22 3AT, UK
| | - Margaret L Delbridge
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Research School of Biology, The Australian National University, Canberra, ACT 0200, Australia
| | - Amir Mohammadi
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Research School of Biology, The Australian National University, Canberra, ACT 0200, Australia
| | - Nanette Y Schneider
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
- Department of Molecular Genetics, German Institute of Human Nutrition, Potsdam-Rehbruecke, Arthur-Scheunert-Allee 114-116, 14558 Nuthetal, Germany
| | - Yanqiu Hu
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - William O'Hara
- Department of Molecular and Cell Biology, Center for Applied Genetics and Technology, University of Connecticut, Storrs, CT 06269, USA
| | - Shafagh Al Nadaf
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Research School of Biology, The Australian National University, Canberra, ACT 0200, Australia
| | - Chen Wu
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
| | - Zhi-Ping Feng
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Benjamin G Cocks
- Biosciences Research Division, Department of Primary Industries, Victoria, 1 Park Drive, Bundoora 3083, Australia
| | - Jianghui Wang
- Biosciences Research Division, Department of Primary Industries, Victoria, 1 Park Drive, Bundoora 3083, Australia
| | - Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Stephen MJ Searle
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Susan Fairley
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Kathryn Beal
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Javier Herrero
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Dawn M Carone
- Department of Molecular and Cell Biology, Center for Applied Genetics and Technology, University of Connecticut, Storrs, CT 06269, USA
- Department of Cell Biology, University of Massachusetts Medical School, Worcester, MA 01655, USA
| | - Yutaka Suzuki
- Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8560, Japan
| | - Sumio Sugano
- Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8560, Japan
| | - Atsushi Toyoda
- National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Yoshiyuki Sakaki
- RIKEN Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Shinji Kondo
- RIKEN Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Yuichiro Nishida
- RIKEN Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Shoji Tatsumoto
- RIKEN Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Ion Mandiou
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Arthur Hsu
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Victoria 3010, Australia
| | - Kaighin A McColl
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
| | - Benjamin Lansdell
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
| | - George Weinstock
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Elizabeth Kuczek
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
- Westmead Institute for Cancer Research, University of Sydney, Westmead, New South Wales 2145, Australia
| | - Annette McGrath
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Peter Wilson
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Artem Men
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Mehlika Hazar-Rethinam
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Allison Hall
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - John Davis
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - David Wood
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Sarah Williams
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Yogi Sundaravadanam
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Donna M Muzny
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Shalini N Jhangiani
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Lora R Lewis
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Margaret B Morgan
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Geoffrey O Okwuonu
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - San Juana Ruiz
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Jireh Santibanez
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Lynne Nazareth
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Andrew Cree
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Gerald Fowler
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Christie L Kovar
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Huyen H Dinh
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Vandita Joshi
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Chyn Jing
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Fremiet Lara
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Rebecca Thornton
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Lei Chen
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Jixin Deng
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Yue Liu
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Joshua Y Shen
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Xing-Zhi Song
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Janette Edson
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Carmen Troon
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Daniel Thomas
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Amber Stephens
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Lankesha Yapa
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Tanya Levchenko
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Richard A Gibbs
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
| | - Desmond W Cooper
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Biological, Earth and Environmental Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
| | - Terence P Speed
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia
| | - Asao Fujiyama
- National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
| | - Jennifer A M Graves
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Research School of Biology, The Australian National University, Canberra, ACT 0200, Australia
| | - Rachel J O'Neill
- Department of Molecular and Cell Biology, Center for Applied Genetics and Technology, University of Connecticut, Storrs, CT 06269, USA
| | - Andrew J Pask
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Department of Zoology, The University of Melbourne, Melbourne, Victoria 3010, Australia
- Department of Molecular and Cell Biology, Center for Applied Genetics and Technology, University of Connecticut, Storrs, CT 06269, USA
| | - Susan M Forrest
- The Australian Research Council Centre of Excellence in Kangaroo Genomics, Australia
- Australian Genome Research Facility, Melbourne, Victoria, 3052 and the University of Queensland, St Lucia, Queensland 4072, Australia
| | - Kim C Worley
- Human Genome Sequencing Center, Department of Molecular and Human Genetics Baylor College of Medicine, Houston, TX 77030, USA
|
16
|
Kanai A, Suzuki K, Tanimoto K, Mizushima-Sugano J, Suzuki Y, Sugano S. Characterization of STAT6 target genes in human B cells and lung epithelial cells. DNA Res 2011; 18:379-92. [PMID: 21828071 PMCID: PMC3190958 DOI: 10.1093/dnares/dsr025] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 12/27/2022]
Abstract
Using ChIP Seq, we identified 556 and 467 putative STAT6 target sites in the Burkitt's lymphoma cell line Ramos and in the normal lung epithelial cell line BEAS2B, respectively. We also examined the positions and expression of transcriptional start sites (TSSs) in these cells using our TSS Seq method. We observed that 44 and 132 genes in Ramos and BEAS2B, respectively, had STAT6 binding sites in proximal regions of their previously reported TSSs that were up-regulated at the transcriptional level. In addition, 406 and 109 of the STAT6 target sites in Ramos and BEAS2B, respectively, were located in proximal regions of previously uncharacterized TSSs. The target genes identified in Ramos and BEAS2B cells in this study and in Th2 cells in previous studies rarely overlapped and differed in their identity. Interestingly, ChIP Seq analyses of histone modifications and RNA polymerase II revealed that chromatin formed an active structure in regions surrounding the STAT6 binding sites; this event also frequently occurred in different cell types, although neither STAT6 binding nor TSS induction was observed. The rough landscape of STAT6-responsive sites was found to be shaped by chromatin structure, but distinct cellular responses were mainly mediated by distinct sets of transcription factors.
Affiliation(s)
- Akinori Kanai
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwashi, Chiba 277-8562, Japan
|
17
|
Abstract
Background: In the genome era, characterizing the structure and the function of RNA molecules remains a major challenge. Alternative transcripts and non-protein-coding genes are poorly recognized by the current genome-annotation algorithms and efficient tools are needed to isolate the less-abundant or stable RNAs. Results: A universal RNA-tagging method using T4 RNA ligase 2 and special adapters is reported. Based on this system, protocols for RACE PCR and full-length cDNA library construction have been developed. The RNA tagging conditions were thoroughly optimized and compared to previous methods by using a biochemical oligonucleotide tagging assay and RACE PCRs on a range of transcripts. In addition, two large-scale full-length cDNA inventories relying on this method are presented. Conclusion: The RNA Captor is a straightforward and accessible protocol. The approach was shown to be more sensitive than previous methods and is applicable to messenger RNAs, non-protein-coding RNAs, transcription start sites and microRNA-directed cleavage sites of transcripts. This strategy could also be used to study other classes of RNA and in deep sequencing experiments.
Affiliation(s)
- Christian Clepet
- URGV Plant Genomics, INRA UMR1165 UEVE/CNRS ERL 8196, Evry, France.
|
18
|
Irie T, Park SJ, Yamashita R, Seki M, Yada T, Sugano S, Nakai K, Suzuki Y. Predicting promoter activities of primary human DNA sequences. Nucleic Acids Res 2011; 39:e75. [PMID: 21486745 PMCID: PMC3113590 DOI: 10.1093/nar/gkr173] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/17/2022]
Abstract
We developed a computer program that can predict the intrinsic promoter activities of primary human DNA sequences. We observed promoter activity using a quantitative luciferase assay and generated a prediction model using multiple linear regression. Our program achieved a prediction accuracy correlation coefficient of 0.87 between the predicted and observed promoter activities. We evaluated the prediction accuracy of the program using massive sequencing analysis of transcriptional start sites in vivo. We found that it is still difficult to predict transcript levels in a strictly quantitative manner in vivo; however, it was possible to select active promoters in a given cell from the other silent promoters. Using this program, we analyzed the transcriptional landscape of the entire human genome. We demonstrate that many human genomic regions have potential promoter activity, and the expression of some previously uncharacterized putatively non-protein-coding transcripts can be explained by our prediction model. Furthermore, we found that nucleosomes occasionally formed open chromatin structures with RNA polymerase II recruitment where the program predicted significant promoter activities, although no transcripts were observed.
Affiliation(s)
- Takuma Irie
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwashi, Chiba 277-8562, Japan
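The abstract of this entry describes a prediction model built by multiple linear regression and scored by the correlation coefficient between predicted and observed promoter activities. The following is a minimal sketch of that general workflow, not the authors' program: the two numeric features, the toy data, and the function names `fit_linear`, `predict` and `pearson` are all hypothetical and chosen only for illustration.

```python
import math

def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved via the normal
    equations (X^T X) w = X^T y using Gauss-Jordan elimination."""
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]           # [intercept, w1, w2, ...]

def predict(w, X):
    """Apply fitted weights to feature rows."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

if __name__ == "__main__":
    # Hypothetical features per promoter (e.g. two sequence-derived scores)
    # and hypothetical measured activities.
    features = [[0.1, 0.5], [0.4, 0.2], [0.8, 0.9], [0.3, 0.7], [0.9, 0.1]]
    observed = [1.2, 1.0, 2.9, 1.8, 1.7]
    w = fit_linear(features, observed)
    print("correlation:", pearson(predict(w, features), observed))
```

In practice the correlation would be computed on held-out sequences rather than on the training data, since a fit evaluated on its own inputs overstates accuracy.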
|
19
|
Yamashita R, Sathira NP, Kanai A, Tanimoto K, Arauchi T, Tanaka Y, Hashimoto SI, Sugano S, Nakai K, Suzuki Y. Genome-wide characterization of transcriptional start sites in humans by integrative transcriptome analysis. Genome Res 2011; 21:775-89. [PMID: 21372179 DOI: 10.1101/gr.110254.110] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.7] [Indexed: 12/15/2022]
Abstract
We performed a genome-wide analysis of transcriptional start sites (TSSs) in human genes by multifaceted use of a massively parallel sequencer. By analyzing 800 million sequences that were obtained from various types of transcriptome analyses, we characterized 140 million TSS tags in 12 human cell types. Despite the large number of TSS clusters (TSCs), the number of TSCs was observed to decrease sharply with increasing expression levels. Highly expressed TSCs exhibited several characteristic features: Nucleosome-seq analysis revealed highly ordered nucleosome structures, ChIP-seq analysis detected clear RNA polymerase II binding signals in their surrounding regions, evaluations of previously sequenced and newly shotgun-sequenced complete cDNA sequences showed that they encode preferable transcripts for protein translation, and RNA-seq analysis of polysome-incorporated RNAs yielded direct evidence that those transcripts are actually translated into proteins. We also demonstrate that integrative interpretation of transcriptome data is essential for the selection of putative alternative promoter TSCs, two of which also have protein consequences. Furthermore, discriminative chromatin features that separate TSCs at different expression levels were found for both genic TSCs and intergenic TSCs. The collected integrative information should provide a useful basis for future biological characterization of TSCs.
Affiliation(s)
- Riu Yamashita
- Frontier Research Initiative, Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
|
20
|
Sathirapongsasuti JF, Sathira N, Suzuki Y, Huttenhower C, Sugano S. Ultraconserved cDNA segments in the human transcriptome exhibit resistance to folding and implicate function in translation and alternative splicing. Nucleic Acids Res 2010; 39:1967-79. [PMID: 21062826 PMCID: PMC3064809 DOI: 10.1093/nar/gkq949] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open
Abstract
Ultraconservation, defined as perfect human-to-rodent sequence identity at least 200-bp long, is a strong indicator of evolutionary and functional importance and has been explored extensively at the genome level. However, it has not been investigated at the transcript level, where such extreme conservation might highlight loci with important post-transcriptional regulatory roles. We present 96 ultraconserved cDNA segments (UCSs), stretches of human mature mRNAs that match identically with orthologous regions in the mouse and rat genomes. UCSs can span multiple exons, a feature we leverage here to elucidate the role of ultraconservation in post-transcriptional regulation. UCS sites are implicated in functions at essentially every post-transcriptional stage: pre-mRNA splicing and degradation through alternative splicing and nonsense-mediated decay (AS-NMD), mature mRNA silencing by miRNA, fast mRNA decay rate and translational repression by upstream AUGs. We also found UCSs to exhibit resistance to formation of RNA secondary structure. These multiple layers of regulation underscore the importance of the UCS-containing genes as key global RNA processing regulators, including members of the serine/arginine-rich protein and heterogeneous nuclear ribonucleoprotein (hnRNP) families of essential splicing regulators. The discovery of UCSs shed new light on the multifaceted, fine-tuned and tight post-transcriptional regulation of gene families as conserved through the majority of the mammalian lineage.
Affiliation(s)
- J Fah Sathirapongsasuti
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan.
|
21
|
Tuda J, Mongan AE, Tolba MEM, Imada M, Yamagishi J, Xuan X, Wakaguri H, Sugano S, Sugimoto C, Suzuki Y. Full-parasites: database of full-length cDNAs of apicomplexa parasites, 2010 update. Nucleic Acids Res 2010; 39:D625-31. [PMID: 21051343 PMCID: PMC3013703 DOI: 10.1093/nar/gkq1111] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022] Open
Abstract
Full-Parasites (http://fullmal.hgc.jp/) is a transcriptome database of apicomplexa parasites, which include Plasmodium and Toxoplasma species. The latest version of Full-Parasites contains a total of 105 786 EST sequences from 12 parasites, of which 5925 full-length cDNAs have been completely sequenced. Full-Parasites also contain more than 30 million transcription start sites (TSS) for Plasmodium falciparum (Pf) and Toxoplasma gondii (Tg), which were identified using our novel oligo-capping-based protocol. Various types of cDNA data resources were interconnected with our original database functionalities. Specifically, in this update, we have included two unique RNA-Seq data sets consisting of 730 million mapped RNA-Seq tags. One is a dataset of 16 time-lapse experiments of cultured bradyzoite differentiation for Tg. The other dataset includes 31 clinical samples of Pf. Parasite RNA was extracted together with host human RNA, and the extracted mixed RNA was used for RNA sequencing, with the expectation that gene expression information from the host and parasite would be simultaneously represented. By providing the largest unique full-length cDNA and dynamic transcriptome data, Full-Parasites is useful for understanding host–parasite interactions and will help to eventually elucidate how monophyletic organisms have evolved to become parasites by adopting complex life cycles.
Affiliation(s)
- Josef Tuda
- Faculty of Medicine, Sam Ratulangi University, Kampus Unsrat, Bahu Manado 95115, Indonesia
|
22
|
|
23
|
Yamagishi J, Wakaguri H, Ueno A, Goo YK, Tolba M, Igarashi M, Nishikawa Y, Sugimoto C, Sugano S, Suzuki Y, Watanabe J, Xuan X. High-resolution characterization of Toxoplasma gondii transcriptome with a massive parallel sequencing method. DNA Res 2010; 17:233-43. [PMID: 20522451 PMCID: PMC2920756 DOI: 10.1093/dnares/dsq013] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open
Abstract
Over the last couple of years, a method has been established that permits the collection of precise positional information on transcriptional start sites (TSSs) together with digital information on gene-expression levels in a high-throughput manner. We applied this novel method, ‘tss-seq’, to elucidate the transcriptome of Toxoplasma gondii tachyzoites, which resulted in the identification of 124 000 TSSs; these were clustered into 10 000 transcription regions (TRs) with a statistics-based analysis. The TRs and annotated ORFs were paired, which left 30% of the TRs and 40% of the ORFs without counterparts, predicting undiscovered genes and stage-specific transcription, respectively. The massive TSS data made it possible to execute the first systematic analysis of the T. gondii core promoter structure. The analysis showed that T. gondii uses an initiator-like motif as the major motif for transcription, together with a novel motif, the downstream thymidine cluster, which is similar to the Y patch observed in plants. This encyclopaedic analysis also suggested that the TATA box and the other well-known core promoter elements are hardly utilized.
Affiliation(s)
- Junya Yamagishi
- 1National Research Center for Protozoan Diseases, Obihiro University of Agriculture and Veterinary Medicine, Obihiro, Japan
|
24
|
Ni T, Corcoran DL, Rach EA, Song S, Spana EP, Gao Y, Ohler U, Zhu J. A paired-end sequencing strategy to map the complex landscape of transcription initiation. Nat Methods 2010; 7:521-7. [PMID: 20495556 PMCID: PMC3197272 DOI: 10.1038/nmeth.1464] [Citation(s) in RCA: 119] [Impact Index Per Article: 8.5] [Received: 11/10/2009] [Accepted: 04/21/2010] [Indexed: 01/06/2023]
Abstract
Recent high-throughput sequencing protocols have uncovered the complexity of mammalian transcription by RNA polymerase II, helping to define several initiation patterns in which transcription start sites (TSSs) cluster within both narrow and broad genomic windows. Here, we describe a paired-end sequencing strategy, which enables more robust mapping and characterization of capped transcripts. This strategy was applied to explore the transcription initiation landscape in the Drosophila melanogaster embryo. Extending the previous findings in mammals, we found that fly promoters exhibit distinct initiation patterns, which are linked to specific promoter sequence motifs. Furthermore, we identified a large number of 5′ capped transcripts originating from coding exons; analyses support that they are unlikely the result of alternative TSSs, but rather the product of post-transcriptional modifications. Taken together, paired-end TSS analysis is demonstrated to be a powerful method to uncover the transcriptional complexity of eukaryotic genomes.
Affiliation(s)
- Ting Ni
- Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, North Carolina, USA
|
25
|
Sathira N, Yamashita R, Tanimoto K, Kanai A, Arauchi T, Kanematsu S, Nakai K, Suzuki Y, Sugano S. Characterization of transcription start sites of putative non-coding RNAs by multifaceted use of massively paralleled sequencer. DNA Res 2010; 17:169-83. [PMID: 20400770 PMCID: PMC2885271 DOI: 10.1093/dnares/dsq007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/25/2023] Open
Abstract
On the basis of integrated transcriptome analysis, we show that not all transcriptional start site clusters (TSCs) in the intergenic regions (iTSCs) have the same properties; thus, it is possible to discriminate the iTSCs that are likely to have biological relevance from the other noise-level iTSCs. We used a total of 251 933 381 short-read sequence tags generated from various types of transcriptome analyses in order to characterize 6039 iTSCs, which have significant expression levels. We analyzed and found that 23% of these iTSCs were located in the proximal regions of the RefSeq genes. These RefSeq-linked iTSCs showed similar expression patterns with the neighboring RefSeq genes, had widely fluctuating transcription start sites and lacked ordered nucleosome positioning. These iTSCs seemed not to form independent transcriptional units, simply representing the by-products of the neighboring RefSeq genes, in spite of their significant expression levels. Similar features were also observed for the TSCs located in the antisense regions of the RefSeq genes. Furthermore, for the remaining iTSCs that were not associated with any RefSeq genes, we demonstrate that integrative interpretation of the transcriptome data provides essential information to specify their biological functions in the hypoxic responses of the cells.
Affiliation(s)
- Nuankanya Sathira
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba 277-8568, Japan
|
26
|
Nishida H, Motoyama T, Suzuki Y, Yamamoto S, Aburatani H, Osada H. Genome-wide maps of mononucleosomes and dinucleosomes containing hyperacetylated histones of Aspergillus fumigatus. PLoS One 2010; 5:e9916. [PMID: 20361043 PMCID: PMC2845647 DOI: 10.1371/journal.pone.0009916] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/19/2010] [Accepted: 03/04/2010] [Indexed: 11/19/2022] Open
Abstract
It is suggested that histone modifications and/or histone variants influence the nucleosomal DNA length. We sequenced both ends of mononucleosomal and dinucleosomal DNA fragments of the filamentous fungus Aspergillus fumigatus, after treatment with the histone deacetylase inhibitor trichostatin A (TSA). After mapping the DNA fragments to the genome, we identified >7 million mononucleosome positions and >7 million dinucleosome positions. We showed that the distributions of the lengths of the mononucleosomal DNA fragments after 15-min and 30-min treatments with micrococcal nuclease (MNase) showed a single peak at 168 nt and 160 nt, respectively. The distributions of the lengths of the dinucleosomal DNA fragments after 15-min and 30-min treatments with MNase showed a single peak at 321 nt and 306 nt, respectively. The nucleosomal DNA fragments obtained from the TSA-treated cells were significantly longer than those obtained from the untreated cells. On the other hand, most of the genes did not undergo any change after treatment. Between the TSA-treated and untreated cells, only 77 genes had ≥2-fold change in expression levels. In addition, our results showed that the locations where mononucleosomes were frequently detected were conserved between the TSA-treated cells and untreated cells in the gene promoters (lower density of the nucleosomes). However, these locations were less conserved in the bodies (higher density of the nucleosomes) of genes with ≥2-fold changes. Our findings indicate that TSA influences the nucleosome positions, especially of the regions with high density of the nucleosomes by elongation of the nucleosomal DNA. However, most of the nucleosome positions are conserved in the gene promoters, even after treatment with TSA, because of the low density of nucleosomes in the gene promoters.
Affiliation(s)
- Hiromi Nishida
- Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo, Japan.
|
27
|
Yang JO, Kim WY, Jeong SY, Oh JH, Jho S, Bhak J, Kim NS. PDbase: a database of Parkinson's disease-related genes and genetic variation using substantia nigra ESTs. BMC Genomics 2009; 10 Suppl 3:S32. [PMID: 19958497 PMCID: PMC2788386 DOI: 10.1186/1471-2164-10-s3-s32] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 02/03/2023] Open
Abstract
Background Parkinson's disease (PD) is one of the most common neurodegenerative disorders, clinically characterized by impaired motor function. Since the etiology of PD is diverse and complex, many researchers have created PD-related research resources. However, resources for brain and PD studies are still lacking. Therefore, we have constructed a database of PD-related genes and genetic variations using the substantia nigra (SN) in PD and normal tissues. In addition, we integrated PD-related information from several resources. Results We collected 6,130 SN expressed sequence tags (ESTs) from normal brain SN tissues and SN tissues of PD patients using full-cDNA library and normalized cDNA library construction methods from our previous study. The SN ESTs were clustered into 2,951 unigene clusters and assigned to 2,678 genes. We then found 57 up-regulated genes and 48 down-regulated genes by comparing normal and PD SN EST frequencies with a cut-off probability of differential expression over 0.9 based on the Audic and Claverie method. In addition, we integrated disease-related information from public resources. To examine the characteristics of these PD-related genes, we analyzed alternative splicing events, single nucleotide polymorphism (SNP) markers located in the gene regions, repeat elements, gene regulation elements, and pathways and protein-protein interaction networks. Conclusion We constructed the PDbase database to capture PD-related genes, genetic variations, and functional elements. This database contains 2,698 PD-related genes identified through ESTs discovered from human normal and PD patient SN tissues, and through integrating several public resources. PDbase provides the mitochondrion proteins, microRNA gene regulation elements, single nucleotide polymorphism (SNP) markers within PD-related gene structures, repeat elements, and pathways and networks with protein-protein interaction information.
The PDbase information can aid in understanding the causation of PD. It is available at http://bioportal.kobic.re.kr/PDbase/. Supplementary data is available at http://bioportal.kobic.re.kr/PDbase/suppl.jsp
Affiliation(s)
- Jin Ok Yang
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea.
|
28
|
Yamashita R, Wakaguri H, Sugano S, Suzuki Y, Nakai K. DBTSS provides a tissue specific dynamic view of Transcription Start Sites. Nucleic Acids Res 2009; 38:D98-104. [PMID: 19910371 PMCID: PMC2808897 DOI: 10.1093/nar/gkp1017] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023] Open
Abstract
DataBase of Transcription Start Sites (DBTSS) is a database which contains precise positional information for transcription start sites (TSSs) of eukaryotic mRNAs. In this update, we included 330 million new tags generated by massively sequencing the 5′-end of oligo-cap selected cDNAs in humans and mice. The tags were collected from normal fetal or adult human tissues, including brain, thymus, liver, kidney and heart, from 6 human cell lines in 21 diverse growth conditions as well as from mouse NIH3T3 cell line: altogether 31 different cell types or culture conditions are represented. This unprecedented increase in depth of data now allows DBTSS to faithfully represent the dynamically changing landscape of TSSs in different cell types and conditions, during development and in the course of evolution. Differential usage of alternative 5′-ends across cell types and conditions can be viewed in a series of new interfaces. Promoter sequence information is now displayed in a comparative genomics viewer where evolutionary turnover of the TSSs can be evaluated. DBTSS can be accessed at http://dbtss.hgc.jp/.
Affiliation(s)
- Riu Yamashita
- Frontier Research Initiative, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
|
29
|
Wakamatsu A, Kimura K, Yamamoto JI, Nishikawa T, Nomura N, Sugano S, Isogai T. Identification and functional analyses of 11,769 full-length human cDNAs focused on alternative splicing. DNA Res 2009; 16:371-83. [PMID: 19880432 PMCID: PMC2780955 DOI: 10.1093/dnares/dsp022] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022] Open
Abstract
We analyzed the diversity of mRNA produced as a result of alternative splicing in order to evaluate gene function. First, we predicted the number of human genes transcribed into protein-coding mRNAs by using the sequence information of full-length cDNAs and 5′-ESTs and obtained 23 241 such human genes. Next, using these genes, we analyzed the mRNA diversity and consequently sequenced and identified 11 769 human full-length cDNAs whose predicted open reading frames were different from other known full-length cDNAs. Notably, 30% of the cDNAs we identified contained variation in the transcription start site (TSS). Our analysis, which particularly focused on multiple variable first exons (FEVs) formed due to the alternative utilization of TSSs, led to the identification of 261 FEVs expressed in a tissue-specific manner. Quantification of the expression profiles of 13 genes by real-time PCR analysis further confirmed the tissue-specific expression of FEVs; for example, OXR1 had specific TSSs in brain and tumor tissues. Finally, based on the results of our mRNA diversity analysis, we have created the FLJ Human cDNA Database. Our results shed light on the mechanisms by which one gene produces suitable protein-coding transcripts in response to the situation and the environment.
Affiliation(s)
- Ai Wakamatsu
- Graduate School of Pharmaceutical Sciences, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
- Kouichi Kimura
- Central Research Laboratory, Hitachi, Ltd, Kokubunji, Tokyo 185-8601, Japan
- Jun-ichi Yamamoto
- Reverse Proteomics Research Institute, 1-9-11 Kaji, Chiyoda-ku, Tokyo 101-0044, Japan
- Tetsuo Nishikawa
- Reverse Proteomics Research Institute, 1-9-11 Kaji, Chiyoda-ku, Tokyo 101-0044, Japan
- Nobuo Nomura
- National Institute of Advanced Industrial Science and Technology, 2-41-6 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Sumio Sugano
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 4-6-1 Shiroganedai, Minato-ku, Tokyo 108-8639, Japan
- Takao Isogai
- Graduate School of Pharmaceutical Sciences, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
- Reverse Proteomics Research Institute, 1-9-11 Kaji, Chiyoda-ku, Tokyo 101-0044, Japan
- Corresponding author. E-mail:
|
30
|
Marques MC, Alonso-Cantabrana H, Forment J, Arribas R, Alamar S, Conejero V, Perez-Amador MA. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus. BMC Genomics 2009; 10:428. [PMID: 19747386 PMCID: PMC2754500 DOI: 10.1186/1471-2164-10-428] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 05/05/2009] [Accepted: 09/11/2009] [Indexed: 01/02/2023] Open
Abstract
Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. 
Conclusion The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species.
Affiliation(s)
- M Carmen Marques
- Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia and Consejo Superior de Investigaciones Científicas, Avenida de los Naranjos s/n, Valencia 46022, Spain.
|
31
|
Wakaguri H, Suzuki Y, Sasaki M, Sugano S, Watanabe J. Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs. BMC Genomics 2009; 10:312. [PMID: 19602295 PMCID: PMC2722674 DOI: 10.1186/1471-2164-10-312] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 12/19/2008] [Accepted: 07/15/2009] [Indexed: 11/21/2022] Open
Abstract
Background Apicomplexan parasites are causative agents of various diseases including malaria and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexa parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, as well as to validate the importance for physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexa parasites. Results In this study, we used a total of 61,056 5'-end-single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partially sequenced cDNA sequences with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of transcripts that had been previously unidentified. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the complete full-length transcript level. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified and confirmed by RT-PCR 140 previously unidentified transcripts found in the intergenic regions of the current gene annotations. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes. Conclusion Our data indicates that the current gene models are likely to still be incomplete and have much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the annotations for the genomes of apicomplexa parasites.
Affiliation(s)
- Hiroyuki Wakaguri
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwanoha, Kashiwa, Chiba, Japan.
|
32
|
Tsuchihara K, Suzuki Y, Wakaguri H, Irie T, Tanimoto K, Hashimoto SI, Matsushima K, Mizushima-Sugano J, Yamashita R, Nakai K, Bentley D, Esumi H, Sugano S. Massive transcriptional start site analysis of human genes in hypoxia cells. Nucleic Acids Res 2009; 37:2249-63. [PMID: 19237398 PMCID: PMC2673422 DOI: 10.1093/nar/gkp066] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.9] [Indexed: 11/14/2022] Open
Abstract
Combining our full-length cDNA method and the massively parallel sequencing technology, we developed a simple method to collect precise positional information of transcriptional start sites (TSSs) together with digital information of the gene-expression levels in a high throughput manner. We applied this method to observe gene-expression changes in a colon cancer cell line cultured in normoxic and hypoxic conditions. We generated more than 100 million 36-base TSS-tag sequences and revealed comprehensive features of hypoxia responsive alterations in the transcriptional landscape of the human genome. The features include presence of inducible 'hot regions' in 54 genomic regions, 220 novel hypoxia inducible promoters that may drive non-protein-coding transcripts, 191 hypoxia responsive alternative promoters and detailed views of 120 novel as well as known hypoxia responsive genes. We further analyzed hypoxic response of different cells using additional 60 million TSS-tags and found that the degree of the gene-expression changes were different among cell lines, possibly reflecting cellular robustness against hypoxia. The novel dynamic figure of the human gene transcriptome will deepen our understanding of the transcriptional program of the human genome as well as bringing new insights into the biology of cancer cells in hypoxia.
Affiliation(s)
- Katsuya Tsuchihara
- Cancer Physiology Project, Research Center for Innovative Oncology, National Cancer Center Hospital East, Kashiwa, Chiba, Japan
|
33
|
Transcriptome analysis and identification of regulators for long-term plasticity in Aplysia kurodai. Proc Natl Acad Sci U S A 2008; 105:18602-7. [PMID: 19017802 DOI: 10.1073/pnas.0808893105] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022] Open
Abstract
The marine mollusk Aplysia is a useful model organism for studying the cellular bases of behavior and plasticity. However, molecular studies of Aplysia have been limited by the lack of genomic information. Recently, a large scale characterization of neuronal transcripts was performed in A. californica. Here, we report the analysis of a parallel set of neuronal transcripts from a closely related species A. kurodai found in the northwestern Pacific. We collected 4,859 nonredundant sequences from the nervous system tissue of A. kurodai. By performing microarray and real-time PCR analyses, we found that ApC/EBP, matrilin, antistasin, and eIF3e clones were significantly up-regulated and a BAT1 homologous clone was significantly down-regulated by 5-HT treatment. Among these, we further demonstrated that the Ap-eIF3e plays a key role in 5-HT-induced long-term facilitation (LTF) as a positive regulator.
|
34
|
Wakaguri H, Suzuki Y, Katayama T, Kawashima S, Kibukawa E, Hiranuka K, Sasaki M, Sugano S, Watanabe J. Full-Malaria/Parasites and Full-Arthropods: databases of full-length cDNAs of parasites and arthropods, update 2009. Nucleic Acids Res 2008; 37:D520-5. [PMID: 18987005 PMCID: PMC2686583 DOI: 10.1093/nar/gkn856] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022] Open
Abstract
Full-Malaria/Parasites is a database for transcriptome studies of apicomplexa and other parasites, which is based on our original full-length cDNA sequences and physical cDNA clone resources. In this update, the database has been expanded to contain the shotgun sequencing of the entire sequences of 14,818 non-redundant full-length cDNA clones from six apicomplexa parasites and 6.8 million transcription start sites (TSS), both of which had been produced by novel protocols using the oligo-capping method and the Illumina GA sequencer. The former should be the ultimate data for exact annotation of the expressed genes, while the latter should be useful for ultra-deep expression analysis. Furthermore, we have launched Full-Arthropods, a full-length cDNA database for arthropods of medical importance. Full-Arthropods contains 50 343 one-pass sequences, 10 399 shotgun complete sequences and 22.4 million TSS tags in anopheles mosquitoes that transmit malaria, tsetse flies that transmit trypanosomiasis and dust mites that cause allergic dermatitis and bronchial asthma. By providing the largest integrated full-length cDNA data resources in the apicomplexa parasites as well as their vectors, Full-Malaria/Parasites and Full-Arthropods should help combat parasitic diseases. Full-Malaria/Parasites and Full-Arthropods are accessible from http://fullmal.hgc.jp/.
Collapse
Affiliation(s)
- Hiroyuki Wakaguri
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo. 4-6-1, Shirokanedai, Minatoku, Tokyo 108-8639, Japan
|
35
|
Weak correlation between sequence conservation in promoter regions and in protein-coding regions of human-mouse orthologous gene pairs. BMC Genomics 2008; 9:152. [PMID: 18384671 PMCID: PMC2335122 DOI: 10.1186/1471-2164-9-152] [Received: 08/21/2007] [Accepted: 04/02/2008]
Abstract
Background: Interspecies sequence comparison is a powerful tool to extract functional or evolutionary information from the genomes of organisms. A number of studies have compared protein sequences or promoter sequences between mammals, which provided many insights into genomics. However, the correlation between protein conservation and promoter conservation remains controversial. Results: We examined promoter conservation as well as protein conservation for 6,901 human and mouse orthologous genes, and observed a very weak correlation between them. We further investigated their relationship by decomposing it based on functional categories, and identified categories with significant tendencies. Remarkably, the 'ribosome' category showed significantly low promoter conservation, despite its high protein conservation, and the 'extracellular matrix' category showed significantly high promoter conservation, in spite of its low protein conservation. Conclusion: Our results show the relation of gene function to protein conservation and promoter conservation, and reveal that there seem to be nonparallel components between protein and promoter sequence evolution.
|
36
|
Osada N, Hashimoto K, Kameoka Y, Hirata M, Tanuma R, Uno Y, Inoue I, Hida M, Suzuki Y, Sugano S, Terao K, Kusuda J, Takahashi I. Large-scale analysis of Macaca fascicularis transcripts and inference of genetic divergence between M. fascicularis and M. mulatta. BMC Genomics 2008; 9:90. [PMID: 18294402 PMCID: PMC2287170 DOI: 10.1186/1471-2164-9-90] [Received: 09/27/2007] [Accepted: 02/24/2008]
Abstract
Background: Cynomolgus macaques (Macaca fascicularis) are widely used as experimental animals in biomedical research and are closely related to other laboratory macaques, such as rhesus macaques (M. mulatta). We isolated 85,721 clones and determined 9407 full-insert sequences from cynomolgus monkey brain, testis, and liver. These sequences were annotated based on homology to human genes and stored in a database, QFbase. Results: We found that 1024 transcripts did not represent any public human cDNA sequence and examined their expression using M. fascicularis oligonucleotide microarrays. Significant expression was detected for 544 (51%) of the unidentified transcripts. Moreover, we identified 226 genes containing exon alterations in the untranslated regions of the macaque transcripts, despite the highly conserved structure of the coding regions. Considering the polymorphism in the common ancestor of cynomolgus and rhesus macaques and the rate of PCR errors, the divergence time between the two species was estimated to be around 0.9 million years ago. Conclusion: Transcript data from Old World monkeys provide a means not only to determine the evolutionary difference between human and non-human primates but also to unveil hidden transcripts in the human genome. Increasing the genomic resources and information of macaque monkeys will greatly contribute to the development of evolutionary biology and biomedical sciences.
Affiliation(s)
- Naoki Osada
- Department of Biomedical Resources, National Institute of Biomedical Innovation, Ibaraki, Japan.
|
37
|
Wakaguri H, Yamashita R, Suzuki Y, Sugano S, Nakai K. DBTSS: database of transcription start sites, progress report 2008. Nucleic Acids Res 2007; 36:D97-101. [PMID: 17942421 PMCID: PMC2238895 DOI: 10.1093/nar/gkm901]
Abstract
DBTSS is a database of transcriptional start sites, based on our unique collection of precise, experimentally determined 5'-end sequences of full-length cDNAs. Since its first release in 2002, several major updates have been made. In this update, we expanded the human transcriptional start site dataset by 19 million uniquely mapped, and RefSeq-associated, 5'-end sequences, which were generated by a newly introduced Solexa sequencer. Moreover, in order to provide means for interpreting those massive TSS data, we implemented two new analytical tools: one for connecting expression information with predicted transcription factor binding sites; the other for examining evolutionary conservation or species-specificity of promoters and transcripts, which can be browsed by our own comparative genome viewer. With the expanded dataset and the enhanced functionalities, DBTSS provides a unique platform that enables in-depth transcriptome analyses. DBTSS is accessible at http://dbtss.hgc.jp/.
Affiliation(s)
- Hiroyuki Wakaguri
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
|
38
|
Tsuritani K, Irie T, Yamashita R, Sakakibara Y, Wakaguri H, Kanai A, Mizushima-Sugano J, Sugano S, Nakai K, Suzuki Y. Distinct class of putative "non-conserved" promoters in humans: comparative studies of alternative promoters of human and mouse genes. Genome Res 2007; 17:1005-14. [PMID: 17567985 PMCID: PMC1899111 DOI: 10.1101/gr.6030107]
Abstract
Although recent studies have revealed that the majority of human genes are subject to regulation of alternative promoters, the biological relevance of this phenomenon remains unclear. We have also demonstrated that roughly half of the human RefSeq genes examined contain putative alternative promoters (PAPs). Here we report large-scale comparative studies of PAPs between human and mouse counterpart genes. Detailed sequence comparison of the 17,245 putative promoter regions (PPRs) in 5463 PAP-containing human genes revealed that PPRs in only a minor fraction of genes (807 genes) showed clear evolutionary conservation as one or more pairs. Also, we found that there were substantial qualitative differences between conserved and non-conserved PPRs, with the latter class being AT-rich PPRs of relative minor usage, enriched in repetitive elements and sometimes producing transcripts that encode small or no proteins. Systematic luciferase assays of these PPRs revealed that both classes of PPRs did have promoter activity, but that their strength ranges were significantly different. Furthermore, we demonstrate that these characteristic features of the non-conserved PPRs are shared with the PPRs of previously discovered putative non-protein coding transcripts. Taken together, our data suggest that there are two distinct classes of promoters in humans, with the latter class of promoters emerging frequently during evolution.
Affiliation(s)
- Katsuki Tsuritani
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minatoku, Tokyo 108-8639, Japan
| | - Takuma Irie
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8562, Japan
| | - Riu Yamashita
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minatoku, Tokyo 108-8639, Japan
| | - Yuta Sakakibara
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8562, Japan
| | - Hiroyuki Wakaguri
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8562, Japan
| | - Akinori Kanai
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8562, Japan
| | - Junko Mizushima-Sugano
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8562, Japan
- Laboratory of Viral Infection II Kitasato Institute for Life Sciences, Kitasato University, Tokyo 108-8641, Japan
| | - Sumio Sugano
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8562, Japan
| | - Kenta Nakai
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minatoku, Tokyo 108-8639, Japan
| | - Yutaka Suzuki
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8562, Japan
- Corresponding author. Fax +81-4-7136-3607
|
39
|
Sakakibara Y, Irie T, Suzuki Y, Yamashita R, Wakaguri H, Kanai A, Chiba J, Takagi T, Mizushima-Sugano J, Hashimoto SI, Nakai K, Sugano S. Intrinsic promoter activities of primary DNA sequences in the human genome. DNA Res 2007; 14:71-7. [PMID: 17522093 PMCID: PMC2779894 DOI: 10.1093/dnares/dsm006]
Abstract
To obtain an overview of the promoter activities intrinsic to primary DNA sequences in the human genome within a particular cell type, we carried out systematic quantitative luciferase assays of DNA fragments corresponding to putative promoters for 472 human genes that are expressed in HEK (human embryonic kidney epithelial) 293 cells. We observed that their promoter activities were distributed in a bimodal manner; putative promoters belonging to the first group (with strong promoter activities) were designated as P1 and those in the second group (with weak promoter activities) as P2. The frequencies of TATA-boxes and CpG islands and the overall G + C content were significantly different between these two populations, indicating that there are two separate groups of promoters. Interestingly, a similar analysis using 251 randomly isolated genomic DNA fragments showed that P2-type promoters occasionally occur within the human genome. Furthermore, 35 DNA fragments corresponding to putative promoters of non-protein-coding transcripts (ncRNAs) shared similar features with P2 in both promoter activities and sequence compositions. At least some of the ncRNAs that have been massively identified by full-length cDNA projects with no functional relevance inferred may have originated from those sporadic promoter activities of primary DNA sequences inherent to the human genome.
Affiliation(s)
- Yuta Sakakibara
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- Faculty of Industrial Science and Technology, Tokyo University of Science, 2641 Yamazaki, Noda-shi, Chiba 278-8510, Japan
| | - Takuma Irie
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
| | - Yutaka Suzuki
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- To whom correspondence should be addressed. Tel/Fax. +81 4-7136-3607.
| | - Riu Yamashita
- Human Genome Center, The Institute of Medical Science, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
| | - Hiroyuki Wakaguri
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
| | - Akinori Kanai
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
| | - Joe Chiba
- Faculty of Industrial Science and Technology, Tokyo University of Science, 2641 Yamazaki, Noda-shi, Chiba 278-8510, Japan
| | - Toshihisa Takagi
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
| | - Junko Mizushima-Sugano
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- Laboratory of Viral Infection II, Kitasato Institute for Life Sciences, Kitasato University, 5-9-1 Sirokane Minato-ku, Tokyo 108-8641, Japan
| | - Shin-ichi Hashimoto
- School of Medicine, the University of Tokyo, 7-3-1 Hongo, Bunkyoku, Tokyo 113-0033, Japan
| | - Kenta Nakai
- Human Genome Center, The Institute of Medical Science, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
| | - Sumio Sugano
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
|
40
|
Watanabe J, Wakaguri H, Sasaki M, Suzuki Y, Sugano S. Comparasite: a database for comparative study of transcriptomes of parasites defined by full-length cDNAs. Nucleic Acids Res 2006; 35:D431-8. [PMID: 17151081 PMCID: PMC1781114 DOI: 10.1093/nar/gkl1039]
Abstract
Comparasite is a database for comparative studies of transcriptomes of parasites. In this database, each entry is defined by the full-length cDNAs from various apicomplexan parasites. It integrates seven individual databases, Full-Parasites, consisting of numerous full-length cDNA clones that we have produced and sequenced: 12 484 cDNA sequences from Plasmodium falciparum, 11 262 from Plasmodium yoelii, 9633 from Plasmodium vivax, 1518 from Plasmodium berghei, 7400 from Toxoplasma gondii, 5921 from Cryptosporidium parvum and 10 966 from the tapeworm Echinococcus multilocularis. Putative counterpart gene groups are clustered, and comparative analysis of any combination of six apicomplexa species is implemented, such as interspecies comparisons regarding protein motifs (InterPro), predicted subcellular localization signals (PSORT), transmembrane regions (SOSUI) or upstream promoter elements. By specifying keywords and other search conditions, Comparasite retrieves putative counterpart gene groups containing a given feature in common or in a species-specific manner. By enabling multi-faceted comparative analyses of genes of apicomplexa protozoa, monophyletic organisms that have diversified to parasitize various hosts by adopting complex life cycles, Comparasite should help elucidate the mechanism behind parasitism. Our full-length cDNA databases and Comparasite are accessible from .
Affiliation(s)
- Junichi Watanabe
- Department of Parasitology, Institute of Medical Science, Graduate School of Frontier Sciences, University of Tokyo, 4-6-1, Shirokanedai, Minatoku, Tokyo 108-8639, Japan.
|
41
|
Forrest ARR, Taylor DF, Crowe ML, Chalk AM, Waddell NJ, Kolle G, Faulkner GJ, Kodzius R, Katayama S, Wells C, Kai C, Kawai J, Carninci P, Hayashizaki Y, Grimmond SM. Genome-wide review of transcriptional complexity in mouse protein kinases and phosphatases. Genome Biol 2006; 7:R5. [PMID: 16507138 PMCID: PMC1431701 DOI: 10.1186/gb-2006-7-1-r5] [Received: 08/25/2005] [Revised: 11/02/2005] [Accepted: 12/16/2005]
Abstract
A systematic study of the transcript variants of all protein kinase- and phosphatase-like loci in mouse shows that at least 75% of them generate alternative transcripts, many of which encode different domain structures. Background: Alternative transcripts of protein kinases and protein phosphatases are known to encode peptides with altered substrate affinities, subcellular localizations, and activities. We undertook a systematic study to catalog the variant transcripts of every protein kinase-like and phosphatase-like locus of mouse. Results: By reviewing all available transcript evidence, we found that at least 75% of kinase and phosphatase loci in mouse generate alternative splice forms, and that 44% of these loci have well supported alternative 5' exons. In a further analysis of full-length cDNAs, we identified 69% of loci as generating more than one peptide isoform. The 1,469 peptide isoforms generated from these loci correspond to 1,080 unique Interpro domain combinations, many of which lack catalytic or interaction domains. We also report on the existence of likely dominant negative forms for many of the receptor kinases and phosphatases, including some 26 secreted decoys (seven known and 19 novel: Alk, Csf1r, Egfr, Epha1, 3, 5,7 and 10, Ephb1, Flt1, Flt3, Insr, Insrr, Kdr, Met, Ptk7, Ptprc, Ptprd, Ptprg, Ptprl, Ptprn, Ptprn2, Ptpro, Ptprr, Ptprs, and Ptprz1) and 13 transmembrane forms (four known and nine novel: Axl, Bmpr1a, Csf1r, Epha4, 5, 6 and 7, Ntrk2, Ntrk3, Pdgfra, Ptprk, Ptprm, Ptpru). Finally, by mining public gene expression data (MPSS and microarrays), we confirmed tissue-specific expression of ten of the novel isoforms. Conclusion: These findings suggest that alternative transcripts of protein kinases and phosphatases are produced that encode different domain structures, and that these variants are likely to play important roles in phosphorylation-dependent signaling pathways.
Affiliation(s)
- Alistair RR Forrest
- Institute for Molecular Bioscience and ARC Centre in Bioinformatics, University of Queensland, Brisbane, QLD 4072, Australia
| | - Darrin F Taylor
- Institute for Molecular Bioscience and ARC Centre in Bioinformatics, University of Queensland, Brisbane, QLD 4072, Australia
| | - Mark L Crowe
- Institute for Molecular Bioscience and ARC Centre in Bioinformatics, University of Queensland, Brisbane, QLD 4072, Australia
| | - Alistair M Chalk
- Institute for Molecular Bioscience and ARC Centre in Bioinformatics, University of Queensland, Brisbane, QLD 4072, Australia
- Queensland Institute for Medical Research, PO Royal Brisbane Hospital, Brisbane, QLD 4029, Australia
- Center for Genomics and Bioinformatics, Karolinska Institutet, S-171 77 Stockholm, Sweden
| | - Nic J Waddell
- Institute for Molecular Bioscience and ARC Centre in Bioinformatics, University of Queensland, Brisbane, QLD 4072, Australia
- Queensland Institute for Medical Research, PO Royal Brisbane Hospital, Brisbane, QLD 4029, Australia
| | - Gabriel Kolle
- Institute for Molecular Bioscience and ARC Centre in Bioinformatics, University of Queensland, Brisbane, QLD 4072, Australia
| | - Geoffrey J Faulkner
- Institute for Molecular Bioscience and ARC Centre in Bioinformatics, University of Queensland, Brisbane, QLD 4072, Australia
- Queensland Institute for Medical Research, PO Royal Brisbane Hospital, Brisbane, QLD 4029, Australia
| | - Rimantas Kodzius
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, Wako, Saitama, 351-0198, Japan
| | - Shintaro Katayama
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Yokohama, Kanagawa, 230-0045, Japan
| | - Christine Wells
- Institute for Molecular Bioscience and ARC Centre in Bioinformatics, University of Queensland, Brisbane, QLD 4072, Australia
- The Eskitis Institute for Cell and Molecular Therapies, Griffith University, QLD 4111, Australia
| | - Chikatoshi Kai
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Yokohama, Kanagawa, 230-0045, Japan
| | - Jun Kawai
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, Wako, Saitama, 351-0198, Japan
| | - Piero Carninci
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, Wako, Saitama, 351-0198, Japan
| | - Yoshihide Hayashizaki
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, Wako, Saitama, 351-0198, Japan
| | - Sean M Grimmond
- Institute for Molecular Bioscience and ARC Centre in Bioinformatics, University of Queensland, Brisbane, QLD 4072, Australia
|
42
|
Kimura K, Wakamatsu A, Suzuki Y, Ota T, Nishikawa T, Yamashita R, Yamamoto JI, Sekine M, Tsuritani K, Wakaguri H, Ishii S, Sugiyama T, Saito K, Isono Y, Irie R, Kushida N, Yoneyama T, Otsuka R, Kanda K, Yokoi T, Kondo H, Wagatsuma M, Murakawa K, Ishida S, Ishibashi T, Takahashi-Fujii A, Tanase T, Nagai K, Kikuchi H, Nakai K, Isogai T, Sugano S. Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes. Genome Res 2006; 16:55-65. [PMID: 16344560 PMCID: PMC1356129 DOI: 10.1101/gr.4039406] [Received: 04/15/2005] [Accepted: 09/19/2005]
Abstract
By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by more than 500 bp and thus are very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, with the composition of one CpG-island-containing promoter per 2.6 CpG-less promoters. In 17% of the PAP-containing loci, tissue-specific use of the PAPs was observed. The richest tissue sources of the tissue-specific PAPs were testis and brain. It was also intriguing that the PAP-containing promoters were enriched in the genes encoding signal transduction-related proteins and were rarer in the genes encoding extracellular proteins, possibly reflecting the varied functional requirement for and the restricted expression of those categories of genes, respectively. The patterns of the first exons were highly diverse as well. On average, there were 7.7 different splicing types of first exons per locus partly produced by the PAPs, suggesting that a wide variety of transcripts can be achieved by this mechanism. Our findings suggest that use of alternate promoters and consequent alternative use of first exons should play a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans.
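The 500 bp separation criterion described in this abstract can be illustrated with a minimal sketch. It assumes a simple single-linkage grouping of sorted TSS coordinates on one strand of one chromosome; the function name `cluster_tss` and the `min_gap` parameter are hypothetical, and the abstract does not state that the authors' pipeline worked this way.

```python
from typing import Iterable, List


def cluster_tss(positions: Iterable[int], min_gap: int = 500) -> List[List[int]]:
    """Group TSS positions so that neighbouring clusters lie more than min_gap bp apart.

    Assumption (not from the source): single-linkage grouping, where a position
    joins the current cluster if it is within min_gap bp of that cluster's last member.
    """
    clusters: List[List[int]] = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][-1] <= min_gap:
            # Close enough to the previous TSS: same putative promoter.
            clusters[-1].append(pos)
        else:
            # Gap exceeds min_gap: start a new putative alternative promoter.
            clusters.append([pos])
    return clusters


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    for cluster in cluster_tss([100, 130, 620, 1500, 1550, 2100]):
        print(cluster[0], cluster[-1], len(cluster))
```

Under this sketch, a gene whose TSSs fall into two or more clusters would be counted as carrying putative alternative promoters; real data would also need per-strand and per-chromosome handling.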
Affiliation(s)
- Kouichi Kimura
- Life Science Research Laboratory, Central Research Laboratory, Hitachi, Ltd., Kokubunji, Tokyo, 185-8601, Japan
|
43
|
Lee DH, Gershenzon N, Gupta M, Ioshikhes IP, Reinberg D, Lewis BA. Functional characterization of core promoter elements: the downstream core element is recognized by TAF1. Mol Cell Biol 2005; 25:9674-86. [PMID: 16227614 PMCID: PMC1265815 DOI: 10.1128/mcb.25.21.9674-9686.2005]
Abstract
Downstream elements are a newly appreciated class of core promoter elements of RNA polymerase II-transcribed genes. The downstream core element (DCE) was discovered in the human beta-globin promoter, and its sequence composition is distinct from that of the downstream promoter element (DPE). We show here that the DCE is a bona fide core promoter element present in a large number of promoters and with high incidence in promoters containing a TATA motif. Database analysis indicates that the DCE is found in diverse promoters, supporting its functional relevance in a variety of promoter contexts. The DCE consists of three subelements, and DCE function is recapitulated in a TFIID-dependent manner. Subelement 3 can function independently of the other two and shows a TFIID requirement as well. UV photo-cross-linking results demonstrate that TAF1/TAF(II)250 interacts with the DCE subelement DNA in a sequence-dependent manner. These data show that downstream elements consist of at least two types, those of the DPE class and those of the DCE class; they function via different DNA sequences and interact with different transcription activation factors. Finally, these data argue that TFIID is, in fact, a core promoter recognition complex.
Affiliation(s)
- Dong-Hoon Lee
- Department of Biochemistry, Robert Wood Johnson Medical School, 683 Hoes Lane, Piscataway, NJ 08854, USA
|
44
|
Bonaldo MF, Bair TB, Scheetz TE, Snir E, Akabogu I, Bair JL, Berger B, Crouch K, Davis A, Eyestone ME, Keppel C, Kucaba TA, Lebeck M, Lin JL, de Melo AIR, Rehmann J, Reiter RS, Schaefer K, Smith C, Tack D, Trout K, Sheffield VC, Lin JJC, Casavant TL, Soares MB. 1274 full-open reading frames of transcripts expressed in the developing mouse nervous system. Genome Res 2004; 14:2053-63. [PMID: 15489326 PMCID: PMC528920 DOI: 10.1101/gr.2601304]
Abstract
As part of the trans-National Institutes of Health (NIH) Mouse Brain Molecular Anatomy Project (BMAP), and in close coordination with the NIH Mammalian Gene Collection Program (MGC), we initiated a large-scale project to clone, identify, and sequence the complete open reading frame (ORF) of transcripts expressed in the developing mouse nervous system. Here we report the analysis of the ORF sequence of 1274 cDNAs, obtained from 47 full-length-enriched cDNA libraries, constructed by using a novel approach, herein described. cDNA libraries were derived from size-fractionated cytoplasmic mRNA isolated from brain and eye tissues obtained at several embryonic stages and postnatal days. Altogether, including the full-ORF MGC sequences derived from these libraries by the MGC sequencing team, NIH_BMAP full-ORF sequences correspond to approximately 20% of all transcripts currently represented in mouse MGC. We show that NIH_BMAP clones comprise 68% of mouse MGC cDNAs ≥5 kb, and 54% of those ≥4 kb, as of March 15, 2004. Importantly, we identified transcripts, among the 1274 full-ORF sequences, that are exclusively or predominantly expressed in brain and eye tissues, many of which encode yet uncharacterized proteins.
Affiliation(s)
- Maria F Bonaldo
- Department of Pediatrics, The University of Iowa, Iowa City, Iowa 52242, USA
|
45
|
Suzuki Y, Yamashita R, Shirota M, Sakakibara Y, Chiba J, Mizushima-Sugano J, Nakai K, Sugano S. Sequence comparison of human and mouse genes reveals a homologous block structure in the promoter regions. Genome Res 2004; 14:1711-8. [PMID: 15342556 PMCID: PMC515316 DOI: 10.1101/gr.2435604]
Abstract
Comparative sequence analysis was carried out for the regions adjacent to experimentally validated transcriptional start sites (TSSs), using 3324 pairs of human and mouse genes. We aligned the upstream putative promoter sequences over the 1-kb proximal regions and found that the sequence conservation could not be further extended at, on average, 510 bp upstream positions of the TSSs. This discontinuous manner of the sequence conservation revealed a "block" structure in about one-third of the putative promoter regions. Consistently, we also observed that G+C content and CpG frequency were significantly different inside and outside the blocks. Within the blocks, the sequence identity was uniformly 65% regardless of their length. About 90% of the previously characterized transcription factor binding sites were located within those blocks. In 46% of the blocks, the 5' ends were bounded by interspersed repetitive elements, some of which may have nucleated the genomic rearrangements. The length of the blocks was shortest in the promoters of genes encoding transcription factors and of genes whose expression patterns are brain specific, which suggests that the evolutional diversifications in the transcriptional modulations should be the most marked in these populations of genes.
Affiliation(s)
- Yutaka Suzuki
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, 108-8639, Japan.
|
46
|
Huang DY, Kuo YY, Lai JS, Suzuki Y, Sugano S, Chang ZF. GATA-1 and NF-Y cooperate to mediate erythroid-specific transcription of Gfi-1B gene. Nucleic Acids Res 2004; 32:3935-46. [PMID: 15280509 PMCID: PMC506805 DOI: 10.1093/nar/gkh719]
Abstract
Expression of Gfi (growth factor-independence)-1B, a Gfi-1-related transcriptional repressor, is restricted to erythroid lineage cells and is essential for erythropoiesis. We have determined the transcription start site of the human Gfi-1B gene and located its first non-coding exon approximately 7.82 kb upstream of the first coding exon. The genomic sequence preceding this first non-coding exon has been identified to be its erythroid-specific promoter region in K562 cells. Using gel-shift and chromatin immunoprecipitation (ChIP) assays, we have demonstrated that NF-Y and GATA-1 directly participate in transcriptional activation of the Gfi-1B gene in K562 cells. Ectopic expression of GATA-1 markedly stimulates the activity of the Gfi-1B promoter in a non-erythroid cell line U937. Interestingly, our results have indicated that this GATA-1-mediated trans-activation is dependent on NF-Y binding to the CCAAT site. Here we conclude that functional cooperation between GATA-1 and NF-Y contributes to erythroid-specific transcriptional activation of Gfi-1B promoter.
Affiliation(s)
- Duen-Yi Huang
- Graduate Institute of Biochemistry and Molecular Biology, College of Medicine, National Taiwan University, No. 1 Jen Ai Road 1st Section, Taipei, Taiwan, Republic of China
|
47
|
Watanabe J, Suzuki Y, Sasaki M, Sugano S. Full-malaria 2004: an enlarged database for comparative studies of full-length cDNAs of malaria parasites, Plasmodium species. Nucleic Acids Res 2004; 32:D334-8. [PMID: 14681428 PMCID: PMC308849 DOI: 10.1093/nar/gkh115] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
Full-malaria (http://fullmal.ims.u-tokyo.ac.jp), a database of full-length cDNAs from the human malaria parasite, Plasmodium falciparum, has been updated in at least three respects. (i) We added 8934 sequences generated from new libraries, so that our collection of 11,424 full-length cDNAs covers 1375 (25%) of the estimated 5409 parasite genes. (ii) All of our full-length cDNAs and GenBank EST sequences were mapped to genomic sequences together with publicly available annotated genes and other predictions. This precisely determined the gene structures and the positions of the transcriptional start sites, which are indispensable for identifying promoter regions. (iii) A total of 4257 cDNA sequences were newly generated from the murine malaria parasite, Plasmodium yoelii yoelii. The genome/cDNA sequences were compared with those of P. falciparum at both the nucleotide and amino acid levels, and the sequence alignment for each gene is presented graphically. This part of the database serves as a versatile platform for elucidating the function(s) of malaria genes by a comparative genomic approach. All of the cDNAs represented in this database are supported by physical cDNA clones, which are publicly and freely available and should serve as indispensable resources for functional analyses of malaria genomes.
Affiliation(s)
- Junichi Watanabe
- Department of Parasitology, Human Genome Center, Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokanedai, Minatoku, Tokyo 108-8639, Japan.
|
48
|
Suzuki Y, Yamashita R, Sugano S, Nakai K. DBTSS, DataBase of Transcriptional Start Sites: progress report 2004. Nucleic Acids Res 2004; 32:D78-81. [PMID: 14681363 PMCID: PMC308810 DOI: 10.1093/nar/gkh076] [Citation(s) in RCA: 117] [Impact Index Per Article: 5.9] [Indexed: 11/14/2022]
Abstract
DBTSS (http://dbtss.hgc.jp) was originally constructed based on a collection of experimentally determined TSSs of human genes. Since its first release in 2002, it has been updated several times. First, the amount of stored data has increased significantly: e.g. the number of clones that match both the RefSeq mRNA set and the genome sequence has increased from 111,382 to 190,964, now covering 1,234 genes. Second, the positions of SNPs in dbSNP are now displayed on the upstream regions of the human genes it contains. Third, DBTSS now covers other species such as mouse and the human malaria parasite. It will become a central database containing data for many more species obtained with oligo-capping and related methods. Lastly, the database now supports comparative promoter analyses: in the current version, comparative views of potentially orthologous promoters from human and mouse are presented, with an additional function for searching potential transcription-factor binding sites that are either conserved or diverged between species.
Affiliation(s)
- Yutaka Suzuki
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan.
|