1
Knoebel E, Brinck A, Nonet ML. Parameters that influence bipartite reporter system expression in Caenorhabditis elegans. Genetics 2025:iyaf076. [PMID: 40341369 DOI: 10.1093/genetics/iyaf076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 04/08/2025] [Indexed: 05/10/2025] Open
Abstract
The development of bipartite reporter systems in Caenorhabditis elegans has lagged by more than a decade behind its adoption in Drosophila, the other invertebrate model commonly used to dissect biological mechanisms. Here, we characterize many parameters that influence expression in recently developed C. elegans bipartite systems. We examine how DNA binding site number and spacing influence expression and characterize how these expression parameters vary in distinct tissue types. Furthermore, we examine how both basal promoters and 3' UTR influence the specificity and level of expression. These studies provide both a framework for the rational design of driver and reporter transgenes and molecular and genetic tools for the creation, characterization, and optimization of bipartite system components for expression in other cell types.
Affiliation(s)
- Emma Knoebel
  - Department of Neuroscience, Washington University Medical School, Washington University in St. Louis, St. Louis, MO 63110, USA
- Anna Brinck
  - Department of Neuroscience, Washington University Medical School, Washington University in St. Louis, St. Louis, MO 63110, USA
- Michael L Nonet
  - Department of Neuroscience, Washington University Medical School, Washington University in St. Louis, St. Louis, MO 63110, USA
2
Gao P, Zhao Y, Xu G, Zhong Y, Sun C. Unique features of conventional and nonconventional introns in Euglena gracilis. BMC Genomics 2024; 25:595. [PMID: 38872102 PMCID: PMC11170887 DOI: 10.1186/s12864-024-10495-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 06/05/2024] [Indexed: 06/15/2024] Open
Abstract
BACKGROUND: Nuclear introns in Euglenida have been understudied. This study aimed to investigate nuclear introns in Euglenida by identifying a large number of introns in Euglena gracilis (E. gracilis), including cis-spliced conventional and nonconventional introns, as well as trans-spliced outrons. We also examined the sequence characteristics of these introns.
RESULTS: A total of 28,337 introns and 11,921 outrons were identified. Conventional and nonconventional introns have distinct splice site features; the former harbour canonical GT/C-AG splice sites, whereas the latter are capable of forming structured motifs with their terminal sequences. We observed that short introns had a preference for canonical GT-AG introns. Notably, conventional introns and outrons in E. gracilis exhibited a distinct cytidine-rich polypyrimidine tract, in contrast to the thymidine-rich tracts observed in other organisms. Furthermore, the SL-RNAs in E. gracilis, as well as in other trans-splicing species, can form a recently discovered motif called the extended U6/5' ss duplex with the respective U6s. We also describe a novel type of alternative splicing pattern in E. gracilis. The tandem repeat sequences of introns in this protist were determined, and their contents were comparable to those in humans.
CONCLUSIONS: Our findings highlight the unique features of E. gracilis introns and provide insights into the splicing mechanism of these introns, as well as the genomics and evolution of Euglenida.
Affiliation(s)
- Pingwei Gao
  - Scientific Research Center, Chengdu Medical College, Chengdu, 610500, China
- Yali Zhao
  - Scientific Research Center, Chengdu Medical College, Chengdu, 610500, China
- Guangjie Xu
  - Scientific Research Center, Chengdu Medical College, Chengdu, 610500, China
- Yujie Zhong
  - Clinical Laboratory Department, Zigong Hospital of Women's and Children's Healthcare, Zigong, 643002, China
- Chengfu Sun
  - Scientific Research Center, Chengdu Medical College, Chengdu, 610500, China
3
Cassart C, Yague-Sanz C, Bauer F, Ponsard P, Stubbe FX, Migeot V, Wery M, Morillon A, Palladino F, Robert V, Hermand D. RNA polymerase II CTD S2P is dispensable for embryogenesis but mediates exit from developmental diapause in C. elegans. Science Advances 2020; 6(50):eabc1450. [PMID: 33298437 PMCID: PMC7725455 DOI: 10.1126/sciadv.abc1450] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/08/2020] [Accepted: 10/21/2020] [Indexed: 06/12/2023]
Abstract
Serine 2 phosphorylation (S2P) within the CTD of RNA polymerase II is considered a Cdk9/Cdk12-dependent mark required for 3'-end processing. However, the relevance of CTD S2P in metazoan development is unknown. We show that cdk-12 lesions or a full-length CTD S2A substitution results in an identical phenotype in Caenorhabditis elegans. Embryogenesis occurs in the complete absence of S2P, but the hatched larvae arrest development, mimicking the diapause induced when hatching occurs in the absence of food. Genome-wide analyses indicate that when CTD S2P is inhibited, only a subset of growth-related genes is not properly expressed. These genes correspond to SL2 trans-spliced mRNAs located in position 2 and over within operons. We show that CDK-12 is required for maximal occupancy of cleavage stimulatory factor necessary for SL2 trans-splicing. We propose that CTD S2P functions as a gene-specific signaling mark ensuring the nutritional control of the C. elegans developmental program.
Affiliation(s)
- C Cassart
  - URPHYM-GEMO, The University of Namur, rue de Bruxelles, 61, Namur 5000 Belgium
- C Yague-Sanz
  - URPHYM-GEMO, The University of Namur, rue de Bruxelles, 61, Namur 5000 Belgium
- F Bauer
  - URPHYM-GEMO, The University of Namur, rue de Bruxelles, 61, Namur 5000 Belgium
- P Ponsard
  - URPHYM-GEMO, The University of Namur, rue de Bruxelles, 61, Namur 5000 Belgium
- F X Stubbe
  - URPHYM-GEMO, The University of Namur, rue de Bruxelles, 61, Namur 5000 Belgium
- V Migeot
  - URPHYM-GEMO, The University of Namur, rue de Bruxelles, 61, Namur 5000 Belgium
- M Wery
  - ncRNA, epigenetic and genome fluidity, Institut Curie, PSL Research University, CNRS UMR 3244, Université Pierre et Marie Curie, Paris, France
- A Morillon
  - ncRNA, epigenetic and genome fluidity, Institut Curie, PSL Research University, CNRS UMR 3244, Université Pierre et Marie Curie, Paris, France
- F Palladino
  - Laboratory of Biology and Modeling of the Cell, UMR5239 CNRS/Ecole Normale Supérieure de Lyon, INSERM U1210, UMS 3444 Biosciences Lyon Gerland, Université de Lyon, Lyon, France
- V Robert
  - Laboratory of Biology and Modeling of the Cell, UMR5239 CNRS/Ecole Normale Supérieure de Lyon, INSERM U1210, UMS 3444 Biosciences Lyon Gerland, Université de Lyon, Lyon, France
- D Hermand
  - URPHYM-GEMO, The University of Namur, rue de Bruxelles, 61, Namur 5000 Belgium
4
Long-read RNA sequencing of human and animal filarial parasites improves gene models and discovers operons. PLoS Negl Trop Dis 2020; 14:e0008869. [PMID: 33196647 PMCID: PMC7704054 DOI: 10.1371/journal.pntd.0008869] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/22/2020] [Revised: 11/30/2020] [Accepted: 10/09/2020] [Indexed: 01/01/2023] Open
Abstract
Filarial parasitic nematodes (Filarioidea) cause substantial disease burden to humans and animals around the world. Recently there has been a coordinated global effort to generate, annotate, and curate genomic data from nematode species of medical and veterinary importance. This has resulted in two chromosome-level assemblies (Brugia malayi and Onchocerca volvulus) and 11 additional draft genomes from Filarioidea. These reference assemblies facilitate comparative genomics to explore basic helminth biology and prioritize new drug and vaccine targets. While the continual improvement of genome contiguity and completeness advances these goals, experimental functional annotation of genes is often hindered by poor gene models. Short-read RNA sequencing data and expressed sequence tags, in cooperation with ab initio prediction algorithms, are employed for gene prediction, but these can result in missing clade-specific genes, fragmented models, imperfect mapping of gene ends, and lack of isoform resolution. Long-read RNA sequencing can overcome these drawbacks and greatly improve gene model quality. Here, we present Iso-Seq data for B. malayi and Dirofilaria immitis, etiological agents of lymphatic filariasis and canine heartworm disease, respectively. These data cover approximately half of the known coding genomes and substantially improve gene models by extending untranslated regions, cataloging novel splice junctions from novel isoforms, and correcting mispredicted junctions. Furthermore, we validated computationally predicted operons, manually curated new operons, and merged fragmented gene models. We carried out analyses of poly(A) tails in both species, leading to the identification of non-canonical poly(A) signals. Finally, we prioritized and assessed known and putative anthelmintic targets, correcting or validating gene models for molecular cloning and target-based anthelmintic screening efforts. 
Overall, these data significantly improve the catalog of gene models for two important parasites, and they demonstrate how long-read RNA sequencing should be prioritized for ongoing improvement of parasitic nematode genome assemblies.

Filarial parasitic nematodes are vector-borne parasites that infect humans and animals. Brugia malayi and Dirofilaria immitis are transmitted by mosquitoes and cause human lymphatic filariasis and canine heartworm disease, respectively. Recent years have seen a dramatic increase in genomic and transcriptomic data sets and the concomitant increase in innovative strategies for drug target identification, validation, and screening. However, while the completeness of genome assemblies of filarial parasitic nematodes has seen steady improvements, the reliability of gene models has not kept pace, hindering cloning efforts. Long-read RNA sequencing technologies are uniquely able to improve gene models, but have not been widely used for the causative agents of neglected tropical diseases. Here, we report the improvement of gene models in both B. malayi and D. immitis by long-read RNA sequencing. We identified novel operons, deprecated false positive operons, identified dozens of novel genes, and described the parameters of polyadenylation. We also focused on putative anthelmintic targets, identifying novel isoforms and correcting gene models. These data substantially increase the trustworthiness of gene models in these two species and demonstrate how long-read sequencing approaches should be prioritized in the continued improvement of genome assemblies and their gene annotations.
5
Arribere JA, Kuroyanagi H, Hundley HA. mRNA Editing, Processing and Quality Control in Caenorhabditis elegans. Genetics 2020; 215:531-568. [PMID: 32632025 PMCID: PMC7337075 DOI: 10.1534/genetics.119.301807] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/18/2019] [Accepted: 05/03/2020] [Indexed: 02/06/2023] Open
Abstract
While DNA serves as the blueprint of life, the distinct functions of each cell are determined by the dynamic expression of genes from the static genome. The amount and specific sequences of RNAs expressed in a given cell involves a number of regulated processes including RNA synthesis (transcription), processing, splicing, modification, polyadenylation, stability, translation, and degradation. As errors during mRNA production can create gene products that are deleterious to the organism, quality control mechanisms exist to survey and remove errors in mRNA expression and processing. Here, we will provide an overview of mRNA processing and quality control mechanisms that occur in Caenorhabditis elegans, with a focus on those that occur on protein-coding genes after transcription initiation. In addition, we will describe the genetic and technical approaches that have allowed studies in C. elegans to reveal important mechanistic insight into these processes.
Affiliation(s)
- Hidehito Kuroyanagi
  - Laboratory of Gene Expression, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan
- Heather A Hundley
  - Medical Sciences Program, Indiana University School of Medicine-Bloomington, Indiana 47405
6
Nelson C, Ambros V. Trans-splicing of the C. elegans let-7 primary transcript developmentally regulates let-7 microRNA biogenesis and let-7 family microRNA activity. Development 2019; 146:dev172031. [PMID: 30770392 PMCID: PMC6432665 DOI: 10.1242/dev.172031] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/17/2018] [Accepted: 02/11/2019] [Indexed: 12/19/2022]
Abstract
The sequence and roles in developmental progression of the microRNA let-7 are conserved. In general, transcription of the let-7 primary transcript (pri-let-7) occurs early in development, whereas processing of the mature let-7 microRNA arises during cellular differentiation. In Caenorhabditis elegans and other animals, the RNA-binding protein LIN-28 post-transcriptionally inhibits let-7 biogenesis at early developmental stages, but the mechanisms by which LIN-28 does this are not fully understood. Nor is it understood how the developmental regulation of let-7 might influence the expression or activities of other microRNAs of the same seed family. Here, we show that pri-let-7 is trans-spliced to the SL1 splice leader downstream of the let-7 precursor stem-loop, which produces a short polyadenylated downstream mRNA, and that this trans-splicing event negatively impacts the biogenesis of mature let-7 microRNA in cis. Moreover, this trans-spliced mRNA contains sequences that are complementary to multiple members of the let-7 seed family (let-7fam) and negatively regulates let-7fam function in trans. Thus, this study provides evidence for a mechanism by which splicing of a microRNA primary transcript can negatively regulate said microRNA in cis as well as other microRNAs in trans.
Affiliation(s)
- Charles Nelson
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Victor Ambros
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
7
Yague-Sanz C, Hermand D. SL-quant: a fast and flexible pipeline to quantify spliced leader trans-splicing events from RNA-seq data. Gigascience 2018; 7:5052207. [PMID: 30010768 PMCID: PMC6055573 DOI: 10.1093/gigascience/giy084] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/18/2018] [Revised: 06/04/2018] [Accepted: 07/01/2018] [Indexed: 11/13/2022] Open
Abstract
Background: The spliceosomal transfer of a short spliced leader (SL) RNA to an independent pre-mRNA molecule is called SL trans-splicing and is widespread in the nematode Caenorhabditis elegans. While RNA-sequencing (RNA-seq) data contain information on such events, properly documented methods to extract them are lacking.
Findings: To address this, we developed SL-quant, a fast and flexible pipeline that adapts to paired-end and single-end RNA-seq data and accurately quantifies SL trans-splicing events. It is designed to work downstream of read mapping and uses the reads left unmapped as primary input. Briefly, the SL sequences are identified with high specificity and are trimmed from the input reads, which are then remapped on the reference genome and quantified at the nucleotide position level (SL trans-splice sites) or at the gene level.
Conclusions: SL-quant completes within 10 minutes on a basic desktop computer for typical C. elegans RNA-seq datasets and can be applied to other species as well. Validating the method, the SL trans-splice sites identified display the expected consensus sequence, and the results of the gene-level quantification are predictive of the gene position within operons. We also compared SL-quant to a recently published SL-containing read identification strategy that was found to be more sensitive but less specific than SL-quant. Both methods are implemented as a bash script available under the MIT license [1]. Full instructions for its installation, usage, and adaptation to other organisms are provided.
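The identify-and-trim step summarized in this abstract can be illustrated with a minimal Python sketch. This is not SL-quant's implementation (the abstract describes a bash pipeline); the function name `trim_spliced_leader`, the `min_overlap` threshold, and the `TOY_LEADER` sequence are all hypothetical and chosen only for illustration. The sketch assumes that a read from a trans-spliced transcript begins with the 3' portion of the leader, so it looks for the longest leader suffix that matches the start of the read and removes it.

```python
from typing import Optional

# Made-up leader sequence for illustration only; it is NOT a real SL sequence.
TOY_LEADER = "ACGTTGCATCAGGTCA"


def trim_spliced_leader(read: str, leader: str, min_overlap: int = 8) -> Optional[str]:
    """Return the read without its leader-derived 5' end, or None if no match.

    A match means the read starts with a suffix of the leader that is at
    least `min_overlap` nucleotides long (exact matching, no mismatches).
    """
    longest = min(len(leader), len(read))
    # Try the longest possible overlap first so the maximal match is trimmed.
    for k in range(longest, min_overlap - 1, -1):
        if read.startswith(leader[-k:]):
            return read[k:]
    return None


reads = [
    "ATCAGGTCATTGCCGAAGCTA",  # starts with the last 9 nt of the toy leader
    "TTGCCGAAGCTAATCAGGTCA",  # leader-like sequence at the wrong end
]
trimmed = [trim_spliced_leader(r, TOY_LEADER) for r in reads]
print(trimmed)
```

In a real workflow the trimmed reads would then be remapped to the reference genome with an aligner, as the abstract describes; that step is outside the scope of this sketch.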
Affiliation(s)
- Carlo Yague-Sanz
  - URPhyM-GEMO, The University of Namur (UNamur), 61 rue de Bruxelles, 5000 Namur, Belgium
- Damien Hermand
  - URPhyM-GEMO, The University of Namur (UNamur), 61 rue de Bruxelles, 5000 Namur, Belgium
8
Garrido-Lecca A, Saldi T, Blumenthal T. Localization of RNAPII and 3' end formation factor CstF subunits on C. elegans genes and operons. Transcription 2016; 7:96-110. [PMID: 27124504 DOI: 10.1080/21541264.2016.1168509] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/15/2022] Open
Abstract
Transcription termination is mechanistically coupled to pre-mRNA 3' end formation to prevent transcription much beyond the gene 3' end. C. elegans, however, engages in polycistronic transcription of operons in which 3' end formation between genes is not accompanied by termination. We have performed RNA polymerase II (RNAPII) and CstF ChIP-seq experiments to investigate at a genome-wide level how RNAPII can transcribe through multiple poly-A signals without causing termination. Our data shows that transcription proceeds in some ways as if operons were composed of multiple adjacent single genes. Total RNAPII shows a small peak at the promoter of the gene cluster and a much larger peak at 3' ends. These 3' peaks coincide with maximal phosphorylation of Ser2 within the C-terminal domain (CTD) of RNAPII and maximal localization of the 3' end formation factor CstF. This pattern occurs at all 3' ends including those at internal sites in operons where termination does not occur. Thus the normal mechanism of 3' end formation does not always result in transcription termination. Furthermore, reduction of CstF50 by RNAi did not substantially alter the pattern of CstF64, total RNAPII, or Ser2 phosphorylation at either internal or terminal 3' ends. However, CstF50 RNAi did result in a subtle reduction of CstF64 binding upstream of the site of 3' cleavage, suggesting that the CstF50/CTD interaction may facilitate bringing the 3' end machinery to the transcription complex.
Affiliation(s)
- Alfonso Garrido-Lecca
  - Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
- Tassa Saldi
  - Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
- Thomas Blumenthal
  - Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
9
Pettitt J, Philippe L, Sarkar D, Johnston C, Gothe HJ, Massie D, Connolly B, Müller B. Operons are a conserved feature of nematode genomes. Genetics 2014; 197:1201-11. [PMID: 24931407 PMCID: PMC4125394 DOI: 10.1534/genetics.114.162875] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 02/12/2014] [Accepted: 06/06/2014] [Indexed: 01/09/2023] Open
Abstract
The organization of genes into operons, clusters of genes that are co-transcribed to produce polycistronic pre-mRNAs, is a trait found in a wide range of eukaryotic groups, including multiple animal phyla. Operons are present in the class Chromadorea, one of the two main nematode classes, but their distribution in the other class, the Enoplea, is not known. We have surveyed the genomes of Trichinella spiralis, Trichuris muris, and Romanomermis culicivorax and identified the first putative operons in members of the Enoplea. Consistent with the mechanism of polycistronic RNA resolution in other nematodes, the mRNAs produced by genes downstream of the first gene in the T. spiralis and T. muris operons are trans-spliced to spliced leader RNAs, and we are able to detect polycistronic RNAs derived from these operons. Importantly, a putative intercistronic region from one of these potential enoplean operons confers polycistronic processing activity when expressed as part of a chimeric operon in Caenorhabditis elegans. We find that T. spiralis genes located in operons have an increased likelihood of having operonic C. elegans homologs. However, operon structure in terms of synteny and gene content is not tightly conserved between the two taxa, consistent with models of operon evolution. We have nevertheless identified putative operons conserved between Enoplea and Chromadorea. Our data suggest that operons and "spliced leader" (SL) trans-splicing predate the radiation of the nematode phylum, an inference which is supported by the phylogenetic profile of proteins known to be involved in nematode SL trans-splicing.
Affiliation(s)
- Jonathan Pettitt
  - School of Medical Sciences, Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
- Lucas Philippe
  - School of Medical Sciences, Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
- Debjani Sarkar
  - School of Medical Sciences, Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
- Christopher Johnston
  - School of Medical Sciences, Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
- Henrike Johanna Gothe
  - School of Medical Sciences, Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
- Diane Massie
  - School of Medical Sciences, Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
- Bernadette Connolly
  - School of Medical Sciences, Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
- Berndt Müller
  - School of Medical Sciences, Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, United Kingdom
10
Abstract
Systemic response to DNA damage and other stresses is a complex process that includes changes in the regulation and activity of nearly all stages of gene expression. One gene regulatory mechanism used by eukaryotes is selection among alternative transcript isoforms that differ in polyadenylation [poly(A)] sites, resulting in changes either to the coding sequence or to portions of the 3' UTR that govern translation, stability, and localization. To determine the extent to which this means of regulation is used in response to DNA damage, we conducted a global analysis of poly(A) site usage in Saccharomyces cerevisiae after exposure to the UV mimetic, 4-nitroquinoline 1-oxide (4NQO). Two thousand thirty-one genes were found to have significant variation in poly(A) site distributions following 4NQO treatment, with a strong bias toward loss of short transcripts, including many with poly(A) sites located within the protein coding sequence (CDS). We further explored one possible mechanism that could contribute to the widespread differences in mRNA isoforms. The change in poly(A) site profile was associated with an inhibition of cleavage and polyadenylation in cell extract and a decrease in the levels of several key subunits in the mRNA 3'-end processing complex. Sequence analysis identified differences in the cis-acting elements that flank putatively suppressed and enhanced poly(A) sites, suggesting a mechanism that could discriminate between variable and constitutive poly(A) sites. Our analysis indicates that variation in mRNA length is an important part of the regulatory response to DNA damage.
11
Reinke V, Krause M, Okkema P. Transcriptional regulation of gene expression in C. elegans. WormBook 2013:1-34. [PMID: 23801596 DOI: 10.1895/wormbook.1.45.2] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 01/25/2023]
Abstract
Protein coding gene sequences are converted to mRNA by the highly regulated process of transcription. The precise temporal and spatial control of transcription for many genes is an essential part of development in metazoans. Thus, understanding the molecular mechanisms underlying transcriptional control is essential to understanding cell fate determination during embryogenesis, post-embryonic development, many environmental interactions, and disease-related processes. Studies of transcriptional regulation in C. elegans exploit its genomic simplicity and physical characteristics to define regulatory events with single-cell and minute-time-scale resolution. When combined with the genetics of the system, C. elegans offers a unique and powerful vantage point from which to study how chromatin-associated proteins and their modifications interact with transcription factors and their binding sites to yield precise control of gene expression through transcriptional regulation.
Affiliation(s)
- Valerie Reinke
  - Department of Genetics, Yale University, New Haven, CT 06520, USA
12
Tian B, Graber JH. Signals for pre-mRNA cleavage and polyadenylation. Wiley Interdisciplinary Reviews: RNA 2011; 3:385-96. [PMID: 22012871 DOI: 10.1002/wrna.116] [Citation(s) in RCA: 172] [Impact Index Per Article: 12.3] [Indexed: 01/10/2023]
Abstract
Pre-mRNA cleavage and polyadenylation is an essential step for 3' end formation of almost all protein-coding transcripts in eukaryotes. The reaction, involving cleavage of nascent mRNA followed by addition of a polyadenylate or poly(A) tail, is controlled by cis-acting elements in the pre-mRNA surrounding the cleavage site. Experimental and bioinformatic studies in the past three decades have elucidated conserved and divergent elements across eukaryotes, from yeast to human. Here we review histories and current models of these elements in a broad range of species.
Affiliation(s)
- Bin Tian
  - UMDNJ-New Jersey Medical School, Newark, NJ, USA
13
Laing R, Hunt M, Protasio AV, Saunders G, Mungall K, Laing S, Jackson F, Quail M, Beech R, Berriman M, Gilleard JS. Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans. PLoS One 2011; 6:e23216. [PMID: 21858033 PMCID: PMC3156134 DOI: 10.1371/journal.pone.0023216] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 05/24/2011] [Accepted: 07/12/2011] [Indexed: 11/30/2022] Open
Abstract
The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, makes their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. 97.5% of the annotated genes had detectable homologues in C. elegans of which 60% had putative orthologues, significantly higher than previous analyses based on EST analysis. Gene density appears to be less in H. contortus than in C. elegans, with annotated H. contortus genes being an average of two-to-three times larger than their putative C. elegans orthologues due to a greater intron number and size. Synteny appears high but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid nematode genomes.
Affiliation(s)
- Roz Laing
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
  - Faculty of Veterinary Medicine, University of Glasgow, Glasgow, Strathclyde, United Kingdom
- Martin Hunt
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Anna V. Protasio
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Gary Saunders
  - Faculty of Veterinary Medicine, University of Glasgow, Glasgow, Strathclyde, United Kingdom
- Karen Mungall
  - Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Steven Laing
  - Faculty of Veterinary Medicine, University of Glasgow, Glasgow, Strathclyde, United Kingdom
- Frank Jackson
  - Moredun Research Institute, Pentlands Science Park, Bush Loan, Penicuik, United Kingdom
- Michael Quail
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Robin Beech
  - Institute of Parasitology, McGill University, Ste Anne de Bellevue, Quebec, Canada
- Matthew Berriman
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- John S. Gilleard
  - Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
14
Grishkevich V, Hashimshony T, Yanai I. Core promoter T-blocks correlate with gene expression levels in C. elegans. Genome Res 2011; 21:707-17. [PMID: 21367940 PMCID: PMC3083087 DOI: 10.1101/gr.113381.110] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 07/28/2010] [Accepted: 02/17/2011] [Indexed: 02/01/2023]
Abstract
Core promoters mediate transcription initiation by the integration of diverse regulatory signals encoded in the proximal promoter and enhancers. It has been suggested that genes under simple regulation may have low-complexity permissive promoters. For these genes, the core promoter may serve as the principal regulatory element; however, the mechanism by which this occurs is unclear. We report here a periodic poly-thymine motif, which we term T-blocks, enriched in occurrences within core promoter forward strands in Caenorhabditis elegans. An increasing number of T-blocks on either strand is associated with increasing nucleosome eviction. Strikingly, only forward strand T-blocks are correlated with expression levels, whereby genes with ≥6 T-blocks have fivefold higher expression levels than genes with ≤3 T-blocks. We further demonstrate that differences in T-block numbers between strains predictably affect expression levels of orthologs. Highly expressed genes and genes in operons tend to have a large number of T-blocks, as well as the previously characterized SL1 motif involved in trans-splicing. The presence of T-blocks thus correlates with low nucleosome occupancy and the precision of a trans-splicing motif, suggesting its role at both the DNA and RNA levels. Collectively, our results suggest that core promoters may tune gene expression levels through the occurrences of T-blocks, independently of the spatio-temporal regulation mediated by the proximal promoter.
Affiliation(s)
- Tamar Hashimshony
  - Department of Biology, Technion–Israel Institute of Technology, Haifa 32000, Israel
- Itai Yanai
  - Department of Biology, Technion–Israel Institute of Technology, Haifa 32000, Israel
|
15
|
Divergence in enzyme regulation between Caenorhabditis elegans and human tyrosine hydroxylase, the key enzyme in the synthesis of dopamine. Biochem J 2011; 434:133-41. [DOI: 10.1042/bj20101561] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/17/2022]
Abstract
TH (tyrosine hydroxylase) is the rate-limiting enzyme in the synthesis of catecholamines. The cat-2 gene of the nematode Caenorhabditis elegans is expressed in mechanosensory dopaminergic neurons and has been proposed to encode a putative TH. In the present paper, we report the cloning of C. elegans full-length cat-2 cDNA and a detailed biochemical characterization of the encoded CAT-2 protein. Similar to other THs, C. elegans CAT-2 is composed of an N-terminal regulatory domain followed by a catalytic domain and a C-terminal oligomerization domain and shows high substrate specificity for L-tyrosine. Like hTH (human TH), CAT-2 is tetrameric and is phosphorylated at Ser35 (equivalent to Ser40 in hTH) by PKA (cAMP-dependent protein kinase). However, CAT-2 is devoid of characteristic regulatory mechanisms present in hTH, such as negative co-operativity for the cofactor, substrate inhibition or feedback inhibition exerted by catecholamines, end-products of the pathway. Thus TH activity in C. elegans displays a weaker regulation in comparison with the human orthologue, resembling a constitutively active enzyme. Overall, our data suggest that the intricate regulation characteristic of mammalian TH might have evolved from simpler models to adjust to the increasing complexity of the higher eukaryotes' neuroendocrine systems.
|
16
|
Abstract
Trans-splicing is the joining together of portions of two separate pre-mRNA molecules. The two distinct categories of spliceosomal trans-splicing are genic trans-splicing, which joins exons of different pre-mRNA transcripts, and spliced leader (SL) trans-splicing, which involves an exon donated from a specialized SL RNA. Both depend primarily on the same signals and components as cis-splicing. Genic trans-splicing events producing protein-coding mRNAs have been described in a variety of organisms, including Caenorhabditis elegans and Drosophila. In mammalian cells, genic trans-splicing can be associated with cancers and translocations. SL trans-splicing has mainly been studied in nematodes and trypanosomes, but there are now numerous and diverse phyla (including primitive chordates) where this type of trans-splicing has been detected. Such diversity raises questions as to the evolutionary origin of the process. Another intriguing question concerns the function of trans-splicing, as operon resolution can only account for a small proportion of the total amount of SL trans-splicing.
Affiliation(s)
- Erika L Lasda
  - University of Colorado Denver, Department of Biochemistry and Molecular Genetics; University of Colorado Boulder, Department of Molecular, Cellular, and Developmental Biology
|
17
|
|
18
|
Abstract
Spliced leader trans-splicing occurs in many primitive eukaryotes including nematodes. Most of our knowledge of trans-splicing in nematodes stems from the model organism Caenorhabditis elegans and relatives, and from work with Ascaris. Our investigation of spliced leader trans-splicing in distantly related Dorylaimia nematodes indicates that spliced-leader trans-splicing arose before the nematode phylum and suggests that the spliced leader RNA gene complements in extant nematodes have evolved from a common ancestor with a diverse set of spliced leader RNA genes.
|
19
|
Lasda EL, Allen MA, Blumenthal T. Polycistronic pre-mRNA processing in vitro: snRNP and pre-mRNA role reversal in trans-splicing. Genes Dev 2010; 24:1645-58. [PMID: 20624853 DOI: 10.1101/gad.1940010] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/18/2022]
Abstract
Spliced leader (SL) trans-splicing in Caenorhabditis elegans attaches a 22-nucleotide (nt) exon onto the 5' end of many mRNAs. A particular class of SL, SL2, splices mRNAs of downstream operon genes. Here we use an embryonic extract-based in vitro splicing system to show that SL2 specificity information is encoded within the polycistronic pre-mRNA, and that trans-splicing specificity is recapitulated in vitro. We define an RNA sequence required for SL2 trans-splicing, the U-rich (Ur) element, through mutational analysis and bioinformatics as a short stem-loop followed by a sequence motif, UAYYUU, located approximately 50 nt upstream of the trans-splice site. Furthermore, this element is predicted in intercistronic regions of numerous operons of C. elegans and other species that use SL2 trans-splicing. We propose that the UAYYUU motif hybridizes with the 5' splice site on the SL2 RNA to recruit the SL to the pre-mRNA. In this way, the UAYYUU motif in the pre-mRNA would serve an analogous function to the similar sequence in the U1 snRNA, which binds to the 5' splice site of introns, effectively reversing the roles of snRNP and pre-mRNA in trans-splicing.
Affiliation(s)
- Erika L Lasda
  - Department of Biochemistry and Molecular Genetics, University of Colorado Denver, Anschutz Medical Campus, Aurora, Colorado 80045, USA
|
20
|
RNA polymerase II C-terminal domain phosphorylation patterns in Caenorhabditis elegans operons, polycistronic gene clusters with only one promoter. Mol Cell Biol 2010; 30:3887-93. [PMID: 20498277 DOI: 10.1128/mcb.00325-10] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/20/2022] Open
Abstract
The heptad repeat of the RNA polymerase II (RNAPII) C-terminal domain is phosphorylated at serine 5 near gene 5' ends and serine 2 near 3' ends in order to recruit pre-mRNA processing factors. Ser-5(P) is associated with gene 5' ends to recruit capping enzymes, whereas Ser-2(P) is associated with gene 3' ends to recruit cleavage and polyadenylation factors. In the gene clusters called operons in Caenorhabditis elegans, there is generally only a single promoter, but each gene in the operon forms a 3' end by the usual mechanism. Although downstream operon genes have 5' ends, they receive their caps by trans splicing rather than by capping enzymes. Thus, they are predicted to not need Ser-5 phosphorylation. Here we show by RNAPII chromatin immunoprecipitation (ChIP) that internal operon gene 5' ends do indeed lack Ser-5(P) peaks. In contrast, Ser-2(P) peaks occur at each mRNA 3' end, where the 3'-end formation machinery binds. These results provide additional support for the idea that the serine phosphorylation of the C-terminal domain (CTD) serves to bring RNA-processing enzymes to the transcription complex. Furthermore, these results provide a novel demonstration that genes in operons are cotranscribed from a single upstream promoter.
|
21
|
Derelle R, Momose T, Manuel M, Da Silva C, Wincker P, Houliston E. Convergent origins and rapid evolution of spliced leader trans-splicing in metazoa: insights from the ctenophora and hydrozoa. RNA (New York, N.Y.) 2010; 16:696-707. [PMID: 20142326 PMCID: PMC2844618 DOI: 10.1261/rna.1975210] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 11/06/2009] [Accepted: 12/23/2009] [Indexed: 05/20/2023]
Abstract
Replacement of mRNA 5' UTR sequences by short sequences trans-spliced from specialized, noncoding, spliced leader (SL) RNAs is an enigmatic phenomenon, occurring in a set of distantly related animal groups including urochordates, nematodes, flatworms, and hydra, as well as in Euglenozoa and dinoflagellates. Whether SL trans-splicing has a common evolutionary origin and biological function among different organisms remains unclear. We have undertaken a systematic identification of SL exons in cDNA sequence data sets from non-bilaterian metazoan species and their closest unicellular relatives. SL exons were identified in ctenophores and in hydrozoan cnidarians, but not in other cnidarians, placozoans, or sponges, or in animal unicellular relatives. Mapping of SL absence/presence obtained from this and previous studies onto current phylogenetic trees favors an evolutionary scenario involving multiple origins for SLs during eumetazoan evolution rather than loss from a common ancestor. In both ctenophore and hydrozoan species, multiple SL sequences were identified, showing high sequence diversity. Detailed analysis of a large data set generated for the hydrozoan Clytia hemisphaerica revealed trans-splicing of given mRNAs by multiple alternative SLs. No evidence was found for a common identity of trans-spliced mRNAs between different hydrozoans. One feature found specifically to characterize SL-spliced mRNAs in hydrozoans, however, was a marked adenosine enrichment immediately 3' of the SL acceptor splice site. Our findings of high sequence divergence and apparently indiscriminate use of SLs in hydrozoans, along with recent findings in other taxa, indicate that SL genes have evolved rapidly in parallel in diverse animal groups, with constraint on SL exon sequence evolution being apparently rare.
Affiliation(s)
- Romain Derelle
  - Biologie du Développement (UMR 7138) Observatoire Océanologique, Université Pierre et Marie Curie (UPMC-Univ Paris 06) and Centre National de la Recherche Scientifique (CNRS), 06230 Villefranche-sur-mer, France
|
22
|
Lee LW, Lo HW, Lo SJ. Vectors for co-expression of two genes in Caenorhabditis elegans. Gene 2010; 455:16-21. [PMID: 20149852 DOI: 10.1016/j.gene.2010.02.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 10/24/2009] [Revised: 01/26/2010] [Accepted: 02/01/2010] [Indexed: 11/19/2022]
Abstract
To meet the increasing need to co-express two different genes in the same cell of transgenic Caenorhabditis elegans, we report the establishment of dicistronic vectors that contain an intercistronic region (ICR) of the C. elegans operon CEOP5428. In these vectors, green fluorescent protein (GFP) and red fluorescent protein (RFP) genes were placed in the first and second cistrons, respectively, separated by the ICR. Driven by the fibrillarin (fib-1) or myo-2 promoter, the GFP- and RFP-fusion proteins were consistently co-expressed in all cells of the worm or in the pharyngeal muscle cells of the transgenic worms, respectively. Our work demonstrates that ICR-containing dicistronic vectors could be developed into versatile co-expression systems in C. elegans for functional analysis of genes of interest.
Affiliation(s)
- Li-Wei Lee
  - Department of Life Science, Chang Gung University, Tao-Yuan, Taiwan
|
23
|
Sleumer MC, Mah AK, Baillie DL, Jones SJM. Conserved elements associated with ribosomal genes and their trans-splice acceptor sites in Caenorhabditis elegans. Nucleic Acids Res 2010; 38:2990-3004. [PMID: 20100800 PMCID: PMC2875031 DOI: 10.1093/nar/gkq003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/06/2023] Open
Abstract
The recent publication of the Caenorhabditis elegans cisRED database has provided an extensive catalog of upstream elements that are conserved between nematode genomes. We have performed a secondary analysis to determine which subsequences of the cisRED motifs are found in multiple locations throughout the C. elegans genome. We used the word-counting motif discovery algorithm DME to form the motifs into groups based on sequence similarity. We then examined the genes associated with each motif group using DAVID and Ontologizer to determine which groups are associated with genes that also have significant functional associations in the Gene Ontology and other gene annotation sources. Of the 3265 motif groups formed, 612 (19%) had significant functional associations with respect to GO terms. Eight of the first 20 motif groups based on frequent dodecamers among the cisRED motif sequences were specifically associated with ribosomal protein genes; two of these were similar to mouse EBP-45, rat HNF3-family and Drosophila Zeste transcription factor binding sites. Additionally, seven motif groups were extensions of the canonical C. elegans trans-splice acceptor site. One motif group was tested for regulatory function in a series of green fluorescent protein expression experiments and was shown to be involved in pharyngeal expression.
Affiliation(s)
- Monica C Sleumer
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 W 7th Ave Suite 100, Vancouver, BC, Canada
|
24
|
Liu C, Oliveira A, Chauhan C, Ghedin E, Unnasch TR. Functional analysis of putative operons in Brugia malayi. Int J Parasitol 2010; 40:63-71. [PMID: 19631652 PMCID: PMC2813416 DOI: 10.1016/j.ijpara.2009.07.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/25/2009] [Revised: 07/06/2009] [Accepted: 07/07/2009] [Indexed: 11/21/2022]
Abstract
Operons are a common mode of gene organization in Caenorhabditis elegans. Similar gene arrangements suggest that functional operons may exist in Brugia malayi. To definitively test this hypothesis, a bicistronic reporter vector consisting of an upstream firefly luciferase gene and a downstream renilla luciferase gene was constructed. The genome was then surveyed to identify 15 gene pairs that were likely to represent operons. Two of four domains upstream of the 5' gene from these clusters exhibited promoter activity. When constructs replicating the promoter and intergenic arrangement found in the native putative operon were transfected into embryos, both firefly and renilla activities were detected, while constructs with the promoter alone or intergenic region alone produced no activity from the downstream reporter. These data confirm that functional operons exist in B. malayi. Mutation of three U-rich element homologues present in one of the operons resulted in a decrease in downstream renilla reporter activity, suggesting that these were important in mRNA maturation. Hemi-nested reverse transcriptase-PCR assays demonstrated that while the mRNA encoding the native downstream open reading frame of one operon contained an SL1 spliced leader at its 5' end, the renilla gene mRNA produced from the corresponding transgenic construct did not.
Affiliation(s)
- Canhui Liu
  - Global Health Infectious Disease Research Program, Department of Global Health, University of South Florida, Tampa, FL USA
- Ana Oliveira
  - Geographic Medicine, University of Alabama at Birmingham, Birmingham, AL USA
- Chitra Chauhan
  - Global Health Infectious Disease Research Program, Department of Global Health, University of South Florida, Tampa, FL USA
- Elodie Ghedin
  - Division of Infectious Diseases, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania USA
- Thomas R. Unnasch
  - Global Health Infectious Disease Research Program, Department of Global Health, University of South Florida, Tampa, FL USA
|
25
|
|
26
|
Salehi-Ashtiani K, Lin C, Hao T, Shen Y, Szeto D, Yang X, Ghamsari L, Lee H, Fan C, Murray RR, Milstein S, Svrzikapa N, Cusick ME, Roth FP, Hill DE, Vidal M. Large-scale RACE approach for proactive experimental definition of C. elegans ORFeome. Genome Res 2009; 19:2334-42. [PMID: 19801531 DOI: 10.1101/gr.098640.109] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/18/2022]
Abstract
Although a highly accurate sequence of the Caenorhabditis elegans genome has been available for 10 years, the exact transcript structures of many of its protein-coding genes remain unsettled. Approximately two-thirds of the ORFeome has been verified reactively by amplifying and cloning computationally predicted transcript models; still a full third of the ORFeome remains experimentally unverified. To fully identify the protein-coding potential of the worm genome including transcripts that may not satisfy existing heuristics for gene prediction, we developed a computational and experimental platform adapting rapid amplification of cDNA ends (RACE) for large-scale structural transcript annotation. We interrogated 2000 unverified protein-coding genes using this platform. We obtained RACE data for approximately two-thirds of the examined transcripts and reconstructed ORF and transcript models for close to 1000 of these. We defined untranslated regions, identified new exons, and redefined previously annotated exons. Our results show that as much as 20% of the C. elegans genome may be incorrectly annotated. Many annotation errors could be corrected proactively with our large-scale RACE platform.
Affiliation(s)
- Kourosh Salehi-Ashtiani
  - Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
|
27
|
Liu C, Chauhan C, Katholi CR, Unnasch TR. The splice leader addition domain represents an essential conserved motif for heterologous gene expression in B. malayi. Mol Biochem Parasitol 2009; 166:15-21. [PMID: 19428668 PMCID: PMC2680783 DOI: 10.1016/j.molbiopara.2009.02.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 09/19/2008] [Revised: 02/10/2009] [Accepted: 02/11/2009] [Indexed: 11/28/2022]
Abstract
Two promoters from the human filarial parasite Brugia malayi have been mapped in detail. The essential domains of both promoters lacked canonical eukaryotic core promoter motifs. However, the largest contiguous essential domain in both promoters flanked and included the splice leader addition site. These findings suggested that the region flanking the trans-splicing addition site might represent a conserved core domain in B. malayi promoters. To test this hypothesis, the putative promoters of 12 trans-spliced genes encoding ribosomal protein homologues from B. malayi were isolated and tested for activity in a B. malayi transient transfection system. Of the 12 domains examined, 11 produced detectable reporter gene activity. Mutant constructs of the six most active promoters were prepared in which the spliced leader acceptor site and the 10 nt upstream and downstream of the site were deleted. All deletion constructs exhibited >90% reduction in reporter gene activity relative to their respective wild type sequences. A conserved pyrimidine-rich tract was located directly upstream from the spliced leader splice acceptor site which contained a conserved T residue located at position -3. Mutation of the entire polypyrimidine tract or the conserved T individually resulted in the loss of over 90% of reporter gene activity. In contrast, mutation of the splice acceptor site did not significantly reduce promoter activity. These data suggest that the region surrounding the splice acceptor site in the ribosomal promoters represents a conserved essential domain which functions independently of splice leader addition.
Affiliation(s)
- Canhui Liu
  - Global Health Infectious Disease Research, Department of Global Health, College of Public Health, University of South Florida, Tampa, FL
- Chitra Chauhan
  - Global Health Infectious Disease Research, Department of Global Health, College of Public Health, University of South Florida, Tampa, FL
- Charles R. Katholi
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL
- Thomas R. Unnasch
  - Global Health Infectious Disease Research, Department of Global Health, College of Public Health, University of South Florida, Tampa, FL
|
28
|
Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, Philips P, De Bona F, Hartmann L, Bohlen A, Krüger N, Sonnenburg S, Rätsch G. mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Res 2009; 19:2133-43. [PMID: 19564452 DOI: 10.1101/gr.090597.108] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Indexed: 11/25/2022]
Abstract
We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was proved in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance on nucleotide, exon, and transcript level for ab initio and multiple-genome gene-prediction tasks. The fully developed version shows superior performance in 10 out of 12 evaluation criteria compared with the other participating gene finders, including Fgenesh++ and Augustus. An in-depth analysis of mGene's genome-wide predictions revealed that approximately 2200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, out of which 205 were unconfirmed. RT-PCR testing of 24 of these genes resulted in a success rate of merely 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGene's predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica, and C. remanei. Comparing the resulting proteomes among these organisms and to the known protein universe, we identified many species-specific gene inventions. In a quality assessment of several available annotations for these genomes, we find that mGene's predictions are most accurate.
|
29
|
Cascades of convergent evolution: the corresponding evolutionary histories of euglenozoans and dinoflagellates. Proc Natl Acad Sci U S A 2009; 106 Suppl 1:9963-70. [PMID: 19528647 DOI: 10.1073/pnas.0901004106] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.3] [Indexed: 11/18/2022] Open
Abstract
The majority of eukaryotic diversity is hidden in protists, yet our current knowledge of processes and structures in the eukaryotic cell is almost exclusively derived from multicellular organisms. The increasing sensitivity of molecular methods and growing interest in microeukaryotes has only recently demonstrated that many features so far considered to be universal for eukaryotes actually exist in strikingly different versions. In other words, during their long evolutionary histories, protists have solved general biological problems in many more ways than previously appreciated. Interestingly, some groups have broken more rules than others, and the Euglenozoa and the Alveolata stand out in this respect. A review of the numerous odd features in these 2 groups allows us to draw attention to the high level of convergent evolution in protists, which perhaps reflects the limits within which certain features can be altered. Moreover, the appearance of one deviation in an ancestor can constrain the set of possible downstream deviations in its descendants, so features that might be independent functionally can still be evolutionarily linked. What functional advantage may be conferred by the excessive complexity of euglenozoan and alveolate gene expression, organellar genome structure, and RNA editing and processing has been thoroughly debated, but we suggest these are more likely the products of constructive neutral evolution, and as such do not necessarily confer any selective advantage at all.
|
30
|
Hutchins LN, Murphy SM, Singh P, Graber JH. Position-dependent motif characterization using non-negative matrix factorization. Bioinformatics 2008; 24:2684-90. [PMID: 18852176 PMCID: PMC2639279 DOI: 10.1093/bioinformatics/btn526] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.1] [Indexed: 01/12/2023]
Abstract
Motivation: Cis-acting regulatory elements are frequently constrained by both sequence content and positioning relative to a functional site, such as a splice or polyadenylation site. We describe an approach to regulatory motif analysis based on non-negative matrix factorization (NMF). Whereas existing pattern recognition algorithms commonly focus primarily on sequence content, our method simultaneously characterizes both positioning and sequence content of putative motifs. Results: Tests on artificially generated sequences show that NMF can faithfully reproduce both positioning and content of test motifs. We show how the variation of the residual sum of squares can be used to give a robust estimate of the number of motifs or patterns in a sequence set. Our analysis distinguishes multiple motifs with significant overlap in sequence content and/or positioning. Finally, we demonstrate the use of the NMF approach through characterization of biologically interesting datasets. Specifically, an analysis of mRNA 3′-processing (cleavage and polyadenylation) sites from a broad range of higher eukaryotes reveals a conserved core pattern of three elements. Contact: joel.graber@jax.org Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lucie N Hutchins
  - Center for Genome Dynamics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
|
31
|
Sonnenburg S, Zien A, Philips P, Rätsch G. POIMs: positional oligomer importance matrices--understanding support vector machine-based signal detectors. Bioinformatics 2008; 24:i6-14. [PMID: 18586746 PMCID: PMC2718648 DOI: 10.1093/bioinformatics/btn170] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences. Frequently the most accurate classifiers are obtained by training support vector machines (SVMs) with complex sequence kernels. However, a cumbersome shortcoming of SVMs is that their learned decision rules are very hard for humans to understand and cannot easily be related to biological facts. RESULTS: To make SVM-based sequence classifiers more accessible and profitable, we introduce the concept of positional oligomer importance matrices (POIMs) and propose an efficient algorithm for their computation. In contrast to the raw SVM feature weighting, POIMs take the underlying correlation structure of k-mer features induced by overlaps of related k-mers into account. POIMs can be seen as a powerful generalization of sequence logos: they make it possible to capture and visualize sequence patterns that are relevant for the investigated biological phenomena. AVAILABILITY: All source code, datasets, tables and figures are available at http://www.fml.tuebingen.mpg.de/raetsch/projects/POIM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sören Sonnenburg
  - Fraunhofer Institute FIRST, Department IDA, Kekuléstr. 7, 12489 Berlin, Germany
|