1. Aditama R, Tanjung ZA, Aprilyanto V, Sudania WM, Utomo C, Liwang T. Identification of oil palm cis-regulatory elements based on DNA free energy and single nucleotide polymorphism density. Comput Biol Chem 2023; 106:107931. [PMID: 37481844 DOI: 10.1016/j.compbiolchem.2023.107931] [Received: 03/08/2022] [Revised: 06/29/2023] [Accepted: 07/17/2023] [Indexed: 07/25/2023]
Abstract
Transcription control through cis-regulatory elements (CREs) is one of the important regulators of gene expression. This study aimed to identify the location of CREs in oil palm (Elaeis guineensis Jacq.) using a combination of DNA free energy and single nucleotide polymorphism (SNP) density approaches. Promoter region sequences were extracted from the oil palm genome, spanning from 1500 nucleotides (nt) upstream to 1000 nt downstream of every annotated transcription start site (TSS). Free energy profiles of each promoter region were calculated using PromPredict software. Raw reads from the deep sequencing of 59 oil palm origins were used to calculate the SNP density of each promoter region. The results showed that the average free energy (AFE) in the region upstream of the TSS is about 1.5 kcal/mol higher than in the downstream region. Using the DNA free energy method, 16,281 CRE regions were predicted. Most of the predicted CREs were located between 1 and 500 nt upstream of the TSS. An anti-correlated pattern between free energy and SNP density was observed in the predicted CRE regions. This anti-correlated pattern was also observed in an experimentally determined promoter of the oil palm metallothionein gene, EgMSP1. Considering the increasing use of promoter information in plant biotechnology, the combination of free energy and SNP density can be recommended as an easy and accurate promoter prediction method.
Affiliation(s)
- Redi Aditama
- Biotechnology Department, Plant Production and Biotechnology Division, PT SMART Tbk., Bogor 16810, Indonesia
- Zulfikar Achmad Tanjung
- Biotechnology Department, Plant Production and Biotechnology Division, PT SMART Tbk., Bogor 16810, Indonesia
- Victor Aprilyanto
- Biotechnology Department, Plant Production and Biotechnology Division, PT SMART Tbk., Bogor 16810, Indonesia
- Widyartini Made Sudania
- Biotechnology Department, Plant Production and Biotechnology Division, PT SMART Tbk., Bogor 16810, Indonesia
- Condro Utomo
- Biotechnology Department, Plant Production and Biotechnology Division, PT SMART Tbk., Bogor 16810, Indonesia
- Tony Liwang
- Biotechnology Department, Plant Production and Biotechnology Division, PT SMART Tbk., Bogor 16810, Indonesia
2. Xu H, Li C, Xu C, Zhang J. Chance promoter activities illuminate the origins of eukaryotic intergenic transcriptions. Nat Commun 2023; 14:1826. [PMID: 37005399 PMCID: PMC10067814 DOI: 10.1038/s41467-023-37610-w] [Received: 08/04/2022] [Accepted: 03/23/2023] [Indexed: 04/04/2023]
Abstract
It is debated whether the pervasive intergenic transcription from eukaryotic genomes has functional significance or simply reflects the promiscuity of RNA polymerases. We approach this question by comparing chance promoter activities with the expression levels of intergenic regions in the model eukaryote Saccharomyces cerevisiae. We build a library of over 10^5 strains, each carrying a 120-nucleotide, chromosomally integrated, completely random sequence driving the potential transcription of a barcode. Quantifying the RNA concentration of each barcode in two environments reveals that 41-63% of random sequences have significant, albeit usually low, promoter activities. Therefore, even in eukaryotes, where the presence of chromatin is thought to repress transcription, chance transcription is prevalent. We find that only 1-5% of yeast intergenic transcriptions are unattributable to chance promoter activities or neighboring gene expressions, and these transcriptions exhibit higher-than-expected environment-specificity. These findings suggest that only a minute fraction of intergenic transcription is functional in yeast.
Affiliation(s)
- Haiqing Xu
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Department of Biology, Stanford University, Stanford, CA, USA
- Chuan Li
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Microsoft, Redmond, WA, USA
- Chuan Xu
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Bio-X Institutes, Shanghai Jiao Tong University, Shanghai, China
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
3. Evolutionary characteristics of intergenic transcribed regions indicate rare novel genes and widespread noisy transcription in the Poaceae. Sci Rep 2019; 9:12122. [PMID: 31431676 PMCID: PMC6702216 DOI: 10.1038/s41598-019-47797-y] [Received: 01/08/2019] [Accepted: 07/19/2019] [Indexed: 01/19/2023]
Abstract
Extensive transcriptional activity occurring in intergenic regions of genomes has raised the question whether intergenic transcription represents the activity of novel genes or noisy expression. To address this, we evaluated cross-species and post-duplication sequence and expression conservation of intergenic transcribed regions (ITRs) in four Poaceae species. Among 43,301 ITRs across the four species, 34,460 (80%) are species-specific. ITRs found across species tend to be more divergent in expression and have more recent duplicates compared to annotated genes. To assess if ITRs are functional (under selection), machine learning models were established in Oryza sativa (rice) that could accurately distinguish between phenotype genes and pseudogenes (area under curve-receiver operating characteristic = 0.94). Based on the models, 584 (8%) and 4391 (61%) rice ITRs are classified as likely functional and nonfunctional with high confidence, respectively. ITRs with conserved expression and ancient retained duplicates, features that were not part of the model, are frequently classified as likely-functional, suggesting these characteristics could serve as pragmatic rules of thumb for identifying candidate sequences likely to be under selection. This study also provides a framework to identify novel genes using comparative transcriptomic data to improve genome annotation that is fundamental for connecting genotype to phenotype in crop and model systems.
4. Lloyd JP, Tsai ZTY, Sowers RP, Panchy NL, Shiu SH. A Model-Based Approach for Identifying Functional Intergenic Transcribed Regions and Noncoding RNAs. Mol Biol Evol 2019; 35:1422-1436. [PMID: 29554332 DOI: 10.1093/molbev/msy035] [Indexed: 12/16/2022]
Abstract
With advances in transcript profiling, the presence of transcriptional activities in intergenic regions has been well established. However, whether intergenic expression reflects transcriptional noise or activity of novel genes remains unclear. We identified intergenic transcribed regions (ITRs) in 15 diverse flowering plant species and found that the amount of intergenic expression correlates with genome size, a pattern that could be expected if intergenic expression is largely nonfunctional. To further assess the functionality of ITRs, we first built machine learning models using Arabidopsis thaliana as a model that accurately distinguish functional sequences (benchmark protein-coding and RNA genes) and likely nonfunctional ones (pseudogenes and unexpressed intergenic regions) by integrating 93 biochemical, evolutionary, and sequence-structure features. Next, by applying the models genome-wide, we found that 4,427 ITRs (38%) and 796 annotated ncRNAs (44%) had features significantly similar to benchmark protein-coding or RNA genes and thus were likely parts of functional genes. Approximately 60% of ITRs and ncRNAs were more similar to nonfunctional sequences and were likely transcriptional noise. The predictive framework established here provides not only a comprehensive look at how functional, genic sequences are distinct from likely nonfunctional ones, but also a new way to differentiate novel genes from genomic regions with noisy transcriptional activities.
Affiliation(s)
- John P Lloyd
- Department of Plant Biology, Michigan State University, East Lansing, MI
- Zing Tsung-Yeh Tsai
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI
- Rosalie P Sowers
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA
- Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, MI
- Genetics Program, Michigan State University, East Lansing, MI
- Ecology, Evolutionary Biology, and Behavior Program, Michigan State University, East Lansing, MI
5
Abstract
The idea that much of our genome is irrelevant to fitness (is not the product of positive natural selection at the organismal level) remains viable. Claims to the contrary, and specifically that the notion of "junk DNA" should be abandoned, are based on conflating meanings of the word "function". Recent estimates suggest that perhaps 90% of our DNA, though biochemically active, does not contribute to fitness in any sequence-dependent way, and possibly in no way at all. Comparisons to vertebrates with much larger and smaller genomes (the lungfish and the pufferfish) strongly align with such a conclusion, as they have done for the last half-century.
Affiliation(s)
- W Ford Doolittle
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Tyler D P Brunet
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of History and Philosophy of Science, University of Cambridge, Cambridge, UK