Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Tatarinova T, Brover V, Troukhan M, Alexandrov N. Skew in CG content near the transcription start site in Arabidopsis thaliana. Bioinformatics 2004;19 Suppl 1:i313-4. [PMID: 12855475 DOI: 10.1093/bioinformatics/btg1043] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Cited by Other Article(s)

Jiao Y, Nigam D, Barry K, Daum C, Yoshinaga Y, Lipzen A, Khan A, Parasa SP, Wei S, Lu Z, Tello-Ruiz MK, Dhiman P, Burow G, Hayes C, Chen J, Brandizzi F, Mortimer J, Ware D, Xin Z. A large sequenced mutant library - valuable reverse genetic resource that covers 98% of sorghum genes. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2024;117:1543-1557. [PMID: 38100514 DOI: 10.1111/tpj.16582] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Revised: 09/08/2023] [Accepted: 11/29/2023] [Indexed: 12/17/2023]

Affiliation(s)

Yinping Jiao Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, Texas, 79409, USA
Deepti Nigam Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, Texas, 79409, USA
Kerrie Barry DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
Chris Daum DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
Yuko Yoshinaga DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
Anna Lipzen DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
Adil Khan Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, Texas, 79409, USA
Sai-Praneeth Parasa Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, Texas, 79409, USA
Sharon Wei Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
Zhenyuan Lu Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
Marcela K Tello-Ruiz Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
Pallavi Dhiman Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance, Texas Tech University, Lubbock, Texas, 79409, USA
Gloria Burow Plant Stress and Germplasm Development Unit, Crop Systems Research Laboratory, USDA-ARS, 3810, 4th Street, Lubbock, Texas, 79424, USA
Chad Hayes Plant Stress and Germplasm Development Unit, Crop Systems Research Laboratory, USDA-ARS, 3810, 4th Street, Lubbock, Texas, 79424, USA
Junping Chen Plant Stress and Germplasm Development Unit, Crop Systems Research Laboratory, USDA-ARS, 3810, 4th Street, Lubbock, Texas, 79424, USA
Federica Brandizzi MSU-DOE Plant Research Lab, Michigan State University, East Lansing, Michigan, USA Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan, USA Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
Jenny Mortimer Joint BioEnergy Institute, Emeryville, California, 94608, USA Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California, 94720, USA School of Agriculture, Food and Wine, Waite Research Institute, Waite Research Precinct, University of Adelaide, Glen Osmond, South Australia, 5064, Australia
Doreen Ware Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA USDA-ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, New York, 14853, USA
Zhanguo Xin Plant Stress and Germplasm Development Unit, Crop Systems Research Laboratory, USDA-ARS, 3810, 4th Street, Lubbock, Texas, 79424, USA

Genome-Wide Prediction of Transcription Start Sites in Conifers. Int J Mol Sci 2022;23:ijms23031735. [PMID: 35163661 PMCID: PMC8836283 DOI: 10.3390/ijms23031735] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2021] [Revised: 01/30/2022] [Accepted: 02/01/2022] [Indexed: 02/04/2023] Open

Abstract

The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. Instead, several computational approaches have been developed to provide fast and reliable means for predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially in gymnosperms, have been left out of the limelight and, therefore, have been poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in the four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS predictions in other genomes, especially for draft assemblies, where reliable TSS predictions are not usually available. We also explored some of the features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides the initial basis for future experimental validation and the study of the regulatory regions to understand gene regulation in gymnosperms.

Ryasik A, Orlov M, Zykova E, Ermak T, Sorokin A. Bacterial promoter prediction: Selection of dynamic and static physical properties of DNA for reliable sequence classification. J Bioinform Comput Biol 2018;16:1840003. [DOI: 10.1142/s0219720018400036] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Triska M, Solovyev V, Baranova A, Kel A, Tatarinova TV. Nucleotide patterns aiding in prediction of eukaryotic promoters. PLoS One 2017;12:e0187243. [PMID: 29141011 PMCID: PMC5687710 DOI: 10.1371/journal.pone.0187243] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2017] [Accepted: 09/05/2017] [Indexed: 01/09/2023] Open

Evolution of Brain Active Gene Promoters in Human Lineage Towards the Increased Plasticity of Gene Regulation. Mol Neurobiol 2017;55:1871-1904. [PMID: 28233272 DOI: 10.1007/s12035-017-0427-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2016] [Accepted: 01/26/2017] [Indexed: 01/31/2023]

Chan KL, Rosli R, Tatarinova TV, Hogan M, Firdaus-Raih M, Low ETL. Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data. BMC Bioinformatics 2017;18:1426. [PMID: 28466793 PMCID: PMC5333190 DOI: 10.1186/s12859-016-1426-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines developed by various computing techniques are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion.

RESULTS

We present an automated gene prediction pipeline, Seqping, which uses self-training HMM models and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping was able to generate better gene predictions compared to three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure).

CONCLUSIONS

Seqping provides researchers a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline predictions are more accurate than gene predictions using the other three approaches with the default or available HMMs.

Tatarinova TV, Chekalin E, Nikolsky Y, Bruskin S, Chebotarov D, McNally KL, Alexandrov N. Nucleotide diversity analysis highlights functionally important genomic regions. Sci Rep 2016;6:35730. [PMID: 27774999 PMCID: PMC5075931 DOI: 10.1038/srep35730] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2016] [Accepted: 09/30/2016] [Indexed: 12/15/2022] Open

Lis M, Walther D. The orientation of transcription factor binding site motifs in gene promoter regions: does it matter? BMC Genomics 2016;17:185. [PMID: 26939991 PMCID: PMC4778318 DOI: 10.1186/s12864-016-2549-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2015] [Accepted: 02/27/2016] [Indexed: 12/23/2022] Open

Abstract

Background

Gene expression is to a large degree regulated by the specific binding of protein transcription factors to cis-regulatory transcription factor binding sites in gene promoter regions. Despite the identification of hundreds of binding site sequence motifs, the question as to whether motif orientation matters with regard to the gene expression regulation of the respective downstream genes appears surprisingly underinvestigated.

Results

We pursued a statistical approach by probing 293 reported non-palindromic transcription factor binding site and ten core promoter motifs in Arabidopsis thaliana for evidence of any relevance of motif orientation based on mapping statistics and effects on the co-regulation of gene expression of the respective downstream genes. Although positional intervals closer to the transcription start site (TSS) were found with increased frequencies of motifs exhibiting orientation preference, a corresponding effect with regard to gene expression regulation as evidenced by increased co-expression of genes harboring the favored orientation in their upstream sequence could not be established. Furthermore, we identified an intrinsic orientational asymmetry of sequence regions close to the TSS as the likely source of the identified motif orientation preferences. By contrast, motif presence irrespective of orientation was found associated with pronounced effects on gene expression co-regulation validating the pursued approach. Inspecting motif pairs revealed statistically preferred orientational arrangements, but no consistent effect with regard to arrangement-dependent gene expression regulation was evident.

Conclusions

Our results suggest that for the motifs considered here, either no specific orientation rendering them functional across all their instances exists with orientational requirements instead depending on gene-locus specific additional factors, or that the binding orientation of transcription factors may generally not be relevant, but rather the event of binding itself.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2549-x) contains supplementary material, which is available to authorized users.

Bourras S, Rouxel T, Meyer M. Agrobacterium tumefaciens Gene Transfer: How a Plant Pathogen Hacks the Nuclei of Plant and Nonplant Organisms. PHYTOPATHOLOGY 2015;105:1288-1301. [PMID: 26151736 DOI: 10.1094/phyto-12-14-0380-rvw] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]

Tatarinova T, Elhaik E, Pellegrini M. Cross-species analysis of genic GC3 content and DNA methylation patterns. Genome Biol Evol 2013;5:1443-56. [PMID: 23833164 PMCID: PMC3762193 DOI: 10.1093/gbe/evt103] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

Abstract

The GC content in the third codon position (GC₃) exhibits a unimodal distribution in many plant and animal genomes. Interestingly, grasses and homeotherm vertebrates exhibit a unique bimodal distribution. High GC₃ was previously found to be associated with variable expression, higher frequency of upstream TATA boxes, and an increase of GC₃ from 5′ to 3′. Moreover, GC₃-rich genes are predominant in certain gene classes and are enriched in CpG dinucleotides that are potential targets for methylation. Based on the GC₃ bimodal distribution we hypothesize that GC₃ has a regulatory role involving methylation and gene expression. To test that hypothesis, we selected diverse taxa (rice, thale cress, bee, and human) that varied in the modality of their GC₃ distribution and tested the association between GC₃, DNA methylation, and gene expression. We examine the relationship between cytosine methylation levels and GC₃, gene expression, genome signature, gene length, and other gene compositional features. We find a strong negative correlation (Pearson’s correlation coefficient r = −0.67, P value < 0.0001) between GC₃ and genic CpG methylation. The comparison between 5′-3′ gradients of CG₃-skew and genic methylation for the taxa in the study suggests interplay between gene-body methylation and transcription-coupled cytosine deamination effect. Compositional features are correlated with methylation levels of genes in rice, thale cress, human, bee, and fruit fly (which acts as an unmethylated control). These patterns allow us to generate evolutionary hypotheses about the relationships between GC₃ and methylation and how these affect expression patterns. Specifically, we propose that the opposite effects of methylation and compositional gradients along coding regions of GC₃-poor and GC₃-rich genes are the products of several competing processes.
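
As an illustration of the quantity this abstract is built around, GC₃ can be computed directly from a coding sequence. The following is a minimal sketch, not the authors' code: the function name `gc3` is hypothetical, and it assumes the input is an in-frame coding sequence starting at the first base of the first codon.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C among the third codon positions of a coding sequence.

    Assumption: `cds` is in frame, i.e. its first base is codon position 1.
    """
    seq = cds.upper()
    # Slicing from index 2 in steps of 3 visits the third base of every
    # complete codon; a trailing partial codon contributes nothing.
    third_bases = seq[2::3]
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    gc_count = sum(base in "GC" for base in third_bases)
    return gc_count / len(third_bases)
```

For example, `gc3("ATGGCCTAA")` looks at the third bases G, C and A of the codons ATG, GCC and TAA, so two of the three positions count as GC.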

Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters. BMC Genomics 2012;13:512. [PMID: 23020586 PMCID: PMC3549790 DOI: 10.1186/1471-2164-13-512] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2012] [Accepted: 09/13/2012] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The main function of gene promoters appears to be the integration of different gene products in their biological pathways in order to maintain homeostasis. Generally, promoters have been classified into two major classes, namely TATA and CpG. Nevertheless, many genes using the same combinatorial formation of transcription factors have different gene expression patterns. Accordingly, we asked ourselves some fundamental questions: Why do certain genes have an overall predisposition for higher gene expression levels than others? What causes such a predisposition? Is there a structural relationship of these sequences in different tissues? Is there a strong phylogenetic relationship between promoters of closely related species?

RESULTS

In order to gain valuable insights into different promoter regions, we obtained a series of image-based patterns which allowed us to identify 10 generic classes of promoters. A comprehensive analysis was undertaken for promoter sequences from Arabidopsis thaliana, Drosophila melanogaster, Homo sapiens and Oryza sativa, and a more extensive analysis of tissue-specific promoters in humans. We observed a clear preference for these species to use certain classes of promoters for specific biological processes. Moreover, in humans, we found that different tissues use distinct classes of promoters, reflecting an emerging promoter network. Depending on the tissue type, comparisons made between these classes of promoters reveal a complementarity between their patterns, whereas some other classes of promoters have been observed to occur in competition. Furthermore, we also noticed the existence of some transitional states between these classes of promoters that may explain certain evolutionary mechanisms, which suggest a possible predisposition for specific levels of gene expression and perhaps for a different number of factors responsible for triggering gene expression. Our conclusions are based on comprehensive data from three different databases and a new computer model whose core uses the Kappa index of coincidence.

CONCLUSIONS

To fully understand the connections between gene promoters and gene expression, we analyzed thousands of promoter sequences using our Kappa Index of Coincidence method and a specialized Optical Character Recognition (OCR) neural network. Under our criteria, 10 classes of promoters were detected. In addition, the existence of "transitional" promoters suggests that there is an evolutionary weighted continuum between classes, depending perhaps upon changes in their gene products.

Arakawa K, Tomita M. Measures of compositional strand bias related to replication machinery and its applications. Curr Genomics 2012;13:4-15. [PMID: 22942671 PMCID: PMC3269016 DOI: 10.2174/138920212799034749] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2011] [Revised: 09/10/2011] [Accepted: 09/20/2011] [Indexed: 11/22/2022] Open

Incidence of genome structure, DNA asymmetry, and cell physiology on T-DNA integration in chromosomes of the phytopathogenic fungus Leptosphaeria maculans. G3-GENES GENOMES GENETICS 2012;2:891-904. [PMID: 22908038 PMCID: PMC3411245 DOI: 10.1534/g3.112.002048] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/21/2012] [Accepted: 06/07/2012] [Indexed: 11/18/2022]

McLean MA, Tirosh I. Opposite GC skews at the 5' and 3' ends of genes in unicellular fungi. BMC Genomics 2011;12:638. [PMID: 22208287 PMCID: PMC3315797 DOI: 10.1186/1471-2164-12-638] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2011] [Accepted: 12/30/2011] [Indexed: 11/24/2022] Open

Synonymous Codon Usage, GC3, and Evolutionary Patterns Across Plastomes of Three Pooid Model Species: Emerging Grass Genome Models for Monocots. Mol Biotechnol 2011;49:116-28. [DOI: 10.1007/s12033-011-9383-9] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Medvedeva YA, Kulakovskii IV, Oparina NY, Favorov AV, Makeev VY. The GC skew near Pol II start sites and its association with SP1-binding site variants. Biophysics (Nagoya-shi) 2010. [DOI: 10.1134/s0006350910060023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open

Identification of TATA and TATA-less promoters in plant genomes by integrating diversity measure, GC-Skew and DNA geometric flexibility. Genomics 2010;97:112-20. [PMID: 21112384 DOI: 10.1016/j.ygeno.2010.11.002] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2010] [Revised: 11/05/2010] [Accepted: 11/12/2010] [Indexed: 11/20/2022]

Palidwor GA, Perkins TJ, Xia X. A general model of codon bias due to GC mutational bias. PLoS One 2010;5:e13431. [PMID: 21048949 PMCID: PMC2965080 DOI: 10.1371/journal.pone.0013431] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2010] [Accepted: 09/10/2010] [Indexed: 12/04/2022] Open

Abstract

Background

In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous GC content changing substitutions in the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third position GC content. For individual amino acids as well, G/C ending codons usage generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions in the first and third codon positions, have codons which may be expected to show different usage patterns.

Principal Findings

In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears nonlinear, even nonmonotone, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and human respectively. When codons are grouped based on common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and human respectively.

Conclusions

The model clarifies the sometimes-counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias and provides a natural null model relative to which other influences on codon bias may be measured.

Troukhan M, Tatarinova T, Bouck J, Flavell RB, Alexandrov NN. Genome-wide discovery of cis-elements in promoter sequences using gene expression. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2010;13:139-51. [PMID: 19231992 DOI: 10.1089/omi.2008.0034] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Tatarinova TV, Alexandrov NN, Bouck JB, Feldmann KA. GC3 biology in corn, rice, sorghum and other grasses. BMC Genomics 2010;11:308. [PMID: 20470436 PMCID: PMC2895627 DOI: 10.1186/1471-2164-11-308] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2009] [Accepted: 05/16/2010] [Indexed: 11/10/2022] Open

Civán P, Svec M. Genome-wide analysis of rice (Oryza sativa L. subsp. japonica) TATA box and Y Patch promoter elements. Genome 2009;52:294-7. [PMID: 19234558 DOI: 10.1139/g09-001] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Alexandrov NN, Brover VV, Freidin S, Troukhan ME, Tatarinova TV, Zhang H, Swaller TJ, Lu YP, Bouck J, Flavell RB, Feldmann KA. Insights into corn genes derived from large-scale cDNA sequencing. PLANT MOLECULAR BIOLOGY 2009;69:179-94. [PMID: 18937034 PMCID: PMC2709227 DOI: 10.1007/s11103-008-9415-4] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/15/2008] [Accepted: 10/01/2008] [Indexed: 05/19/2023]

Abstract

We present a large portion of the transcriptome of Zea mays, including ESTs representing 484,032 cDNA clones from 53 libraries and 36,565 fully sequenced cDNA clones, out of which 31,552 clones are non-redundant. These and other previously sequenced transcripts have been aligned with available genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Examination of the nucleotide composition of coding regions reveals that corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in the amino acid encoding codons. Many of the transcripts that have lower GC content at the third position have dicot homologs but the high GC content transcripts tend to be more specific to the grasses. The high GC content class is also enriched with intronless genes. Together this suggests that an identifiable class of genes in plants is associated with the Poaceae divergence. Furthermore, because many of these genes appear to be derived from ancestral genes that do not contain introns, this evolutionary divergence may be the result of horizontal gene transfer from species not only with different codon usage but possibly that did not have introns, perhaps outside of the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein coding genes in Zea. All of the sequence data from this study have been submitted to DDBJ/GenBank/EMBL under accession numbers EU940701-EU977132 (FLI cDNA) and FK944382-FL482108 (EST).

Yamamoto YY, Ichida H, Abe T, Suzuki Y, Sugano S, Obokata J. Differentiation of core promoter architecture between plants and mammals revealed by LDSS analysis. Nucleic Acids Res 2007;35:6219-26. [PMID: 17855401 PMCID: PMC2094075 DOI: 10.1093/nar/gkm685] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Yamamoto YY, Ichida H, Matsui M, Obokata J, Sakurai T, Satou M, Seki M, Shinozaki K, Abe T. Identification of plant promoter constituents by analysis of local distribution of short sequences. BMC Genomics 2007;8:67. [PMID: 17346352 PMCID: PMC1832190 DOI: 10.1186/1471-2164-8-67] [Citation(s) in RCA: 110] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2006] [Accepted: 03/08/2007] [Indexed: 11/20/2022] Open

Abstract

Background

Plant promoter architecture is important for understanding regulation and evolution of the promoters, but our current knowledge about plant promoter structure, especially with respect to the core promoter, is insufficient. Several promoter elements, including the TATA box and several types of transcriptional regulatory elements, have been found to show local distribution within promoters, and this feature has been successfully utilized for extraction of promoter constituents from the human genome.

Results

LDSS (Local Distribution of Short Sequences) profiles of short sequences along the plant promoter have been analyzed in silico, and hundreds of hexamer and octamer sequences have been identified as having localized distributions within promoters of Arabidopsis thaliana and rice. Based on their localization patterns, the identified sequences could be classified into three groups, pyrimidine patch (Y Patch), TATA box, and REG (Regulatory Element Group). Sequences of the TATA box group are consistent with the ones reported in previous studies. The REG group includes more than 200 sequences, and half of them correspond to known cis-elements. The other REG subgroups, together with about a hundred uncategorized sequences, are suggested to be novel cis-regulatory elements. Comparison of LDSS-positive sequences between Arabidopsis and rice has revealed moderate conservation of elements and common promoter architecture. In addition, a dimer motif named the YR Rule (C/T A/G) has been identified at the transcription start site (-1/+1). This rule also fits both Arabidopsis and rice promoters.

Conclusion

LDSS was successfully applied to plant genomes, and hundreds of putative promoter elements have been extracted as LDSS-positive octamers. The identified promoter architectures of monocots and dicots are well conserved, but there are moderate variations in the utilized sequences.
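
The YR Rule mentioned in the Results of this abstract (a pyrimidine, C or T, at position -1 followed by a purine, A or G, at position +1) is simple enough to check programmatically. The sketch below is illustrative only and is not the LDSS method itself; the function name `follows_yr_rule` and the convention that `tss_index` is the 0-based index of the +1 base are assumptions made for this example.

```python
PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}


def follows_yr_rule(seq: str, tss_index: int) -> bool:
    """Return True if the -1/+1 dinucleotide at the TSS is pyrimidine-purine.

    Assumption: `tss_index` is the 0-based index of the +1 base, so the
    -1 base sits at `tss_index - 1`.
    """
    if tss_index < 1 or tss_index >= len(seq):
        raise ValueError("tss_index must have a base on each side of the -1/+1 junction")
    minus_one = seq[tss_index - 1].upper()
    plus_one = seq[tss_index].upper()
    return minus_one in PYRIMIDINES and plus_one in PURINES
```

For instance, `follows_yr_rule("GGCATT", 3)` inspects the dinucleotide CA and returns `True`, while `follows_yr_rule("GGAATT", 3)` inspects AA and returns `False`.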

Paz A, Mester D, Nevo E, Korol A. Looking for organization patterns of highly expressed genes: purine-pyrimidine composition of precursor mRNAs. J Mol Evol 2007;64:248-60. [PMID: 17211550 DOI: 10.1007/s00239-006-0135-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2006] [Accepted: 11/19/2006] [Indexed: 01/05/2023]

Schneeberger RG, Zhang K, Tatarinova T, Troukhan M, Kwok SF, Drais J, Klinger K, Orejudos F, Macy K, Bhakta A, Burns J, Subramanian G, Donson J, Flavell R, Feldmann KA. Agrobacterium T-DNA integration in Arabidopsis is correlated with DNA sequence compositions that occur frequently in gene promoter regions. Funct Integr Genomics 2005;5:240-53. [PMID: 15744539 DOI: 10.1007/s10142-005-0138-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2004] [Revised: 12/30/2004] [Accepted: 01/01/2005] [Indexed: 10/25/2022]

Fujimori S, Washio T, Tomita M. GC-compositional strand bias around transcription start sites in plants and fungi. BMC Genomics 2005;6:26. [PMID: 15733327 PMCID: PMC555766 DOI: 10.1186/1471-2164-6-26] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2004] [Accepted: 02/28/2005] [Indexed: 12/02/2022] Open

Abstract

BACKGROUND

A GC-compositional strand bias or GC-skew (=(C-G)/(C+G)), where C and G denote the numbers of cytosine and guanine residues, was recently reported near the transcription start sites (TSS) of Arabidopsis genes. However, it is unclear whether other eukaryotic species have equally prominent GC-skews, and the biological meaning of this trait remains unknown.

RESULTS

Our study confirmed a significant GC-skew (C > G) in the TSS of Oryza sativa (rice) genes. The full-length cDNAs and genomic sequences from Arabidopsis and rice were compared using statistical analyses. Despite marked differences in the G+C content around the TSS in the two plants, the degrees of bias were almost identical. Although slight GC-skew peaks, including opposite skews (C < G), were detected around the TSS of genes in human and Drosophila, they were qualitatively and quantitatively different from those identified in plants. However, plant-like GC-skew in regions upstream of the translation initiation sites (TIS) in some fungi was identified following analyses of the expressed sequence tags and/or genomic sequences from other species. On the basis of our dataset, we estimated that > 70% and 68% of Arabidopsis and rice genes, respectively, had a strong GC-skew (> 0.33) in a 100-bp window (that is, the number of C residues was more than double the number of G residues in a +/-100-bp window around the TSS). The mean GC-skew value in the TSS of highly expressed genes in Arabidopsis was significantly greater than that of genes with low expression levels. Many of the GC-skew peaks were preferentially located near the TSS, so we examined the potential value of GC-skew as an index for TSS identification. Our results confirm that the GC-skew can be used to assist TSS prediction in plant genomes.
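
The GC-skew definition and the 0.33 threshold quoted above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names, the 0-based `tss` index, and the clipping of the window at the sequence ends are all assumptions. The threshold follows from the definition, since (C-G)/(C+G) > 1/3 is equivalent to C > 2G.

```python
def gc_skew(seq: str) -> float:
    """GC-skew as defined in the abstract: (C - G) / (C + G).

    Returns 0.0 when the sequence has no C or G (an assumed convention).
    """
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    return (c - g) / (c + g) if (c + g) else 0.0


def skew_around_tss(seq: str, tss: int, flank: int = 100) -> float:
    """GC-skew in a +/-flank window around the TSS.

    tss is a 0-based index into seq (hypothetical convention); the
    window is clipped where it would run past either end of seq.
    """
    start = max(0, tss - flank)
    end = min(len(seq), tss + flank)
    return gc_skew(seq[start:end])


def has_strong_skew(seq: str, tss: int, flank: int = 100,
                    threshold: float = 0.33) -> bool:
    """True if the windowed skew exceeds the threshold (0.33 in the abstract)."""
    return skew_around_tss(seq, tss, flank) > threshold
```

For a toy 200-bp sequence built as `"CCCG" * 50` with `tss=100`, the window holds 150 C and 50 G residues, so the skew is 0.5 and the window counts as strongly skewed under this sketch.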

CONCLUSION

The GC-skew (C > G) around the TSS is strictly conserved between monocot and eudicot plants (i.e., angiosperms in general), and a similar skew has been observed in some fungi. Highly expressed Arabidopsis genes had overall a more marked GC-skew in the TSS compared to genes with low expression levels. We therefore propose that the GC-skew around the TSS in some plants and fungi is related to transcription. It might be caused by mutations during transcription initiation or the frequent use of transcription factor-binding sites having a strand preference. In addition, GC-skew is a good candidate index for TSS prediction in plant genomes, where there is a lack of correlation between CpG islands and genes.

Aerts S, Thijs G, Dabrowski M, Moreau Y, De Moor B. Comprehensive analysis of the base composition around the transcription start site in Metazoa. BMC Genomics 2004;5:34. [PMID: 15171795 PMCID: PMC436054 DOI: 10.1186/1471-2164-5-34] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2004] [Accepted: 06/01/2004] [Indexed: 11/29/2022] Open