1
Prazsák I, Tombácz D, Fülöp Á, Torma G, Gulyás G, Dörmő Á, Kakuk B, McKenzie Spires L, Toth Z, Boldogkői Z. KSHV 3.0: a state-of-the-art annotation of the Kaposi's sarcoma-associated herpesvirus transcriptome using cross-platform sequencing. mSystems 2024; 9:e0100723. [PMID: 38206015 PMCID: PMC10878076 DOI: 10.1128/msystems.01007-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 12/11/2023] [Indexed: 01/12/2024] Open
Abstract
Kaposi's sarcoma-associated herpesvirus (KSHV) is a large, oncogenic DNA virus belonging to the gammaherpesvirus subfamily. KSHV has been extensively studied with various high-throughput RNA-sequencing approaches to map the transcription start and end sites, the splice junctions, and the translation initiation sites. Despite these efforts, the comprehensive annotation of the viral transcriptome remains incomplete. In the present study, we generated a long-read sequencing data set of the lytic and latent KSHV transcriptome using native RNA and direct cDNA-sequencing methods. This was supplemented with Cap Analysis of Gene Expression sequencing based on a short-read platform. We also utilized data sets from previous publications for our analysis. As a result of this combined approach, we have identified a number of novel viral transcripts and RNA isoforms and have either corroborated or improved the annotation of previously identified viral RNA molecules, thereby notably enhancing our comprehension of the transcriptomic architecture of the KSHV genome. We also evaluated the coding capability of transcripts previously thought to be non-coding by integrating our data on the viral transcripts with translatomic information from other publications.
IMPORTANCE: Deciphering the viral transcriptome of Kaposi's sarcoma-associated herpesvirus is of great importance because we can gain insight into the molecular mechanism of viral replication and pathogenesis, which can help develop potential targets for antiviral interventions. Specifically, the identification of substantial transcriptional overlaps by this work suggests the existence of a genome-wide interference between transcriptional machineries. This finding indicates the presence of a novel regulatory layer, potentially controlling the expression of viral genes.
Affiliation(s)
- István Prazsák
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Dóra Tombácz
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Ádám Fülöp
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Gábor Torma
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Gábor Gulyás
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Ákos Dörmő
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Balázs Kakuk
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Lauren McKenzie Spires
- Department of Oral Biology, University of Florida College of Dentistry, Gainesville, Florida, USA
- Zsolt Toth
- Department of Oral Biology, University of Florida College of Dentistry, Gainesville, Florida, USA
- Zsolt Boldogkői
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
2
Bubnova AN, Yakovleva IV, Korotkov EV, Kamionskaya AM. In Silico Verification of Predicted Potential Promoter Sequences in the Rice (Oryza sativa) Genome. Plants (Basel) 2023; 12:3573. [PMID: 37896036 PMCID: PMC10609952 DOI: 10.3390/plants12203573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 10/09/2023] [Accepted: 10/12/2023] [Indexed: 10/29/2023]
Abstract
The exact identification of promoter sequences remains a serious problem in computational biology, as the promoter prediction algorithms under development continue to produce false-positive results. Therefore, to fully assess the validity of predicted sequences, it is necessary to perform a comprehensive test of their properties, such as the presence of downstream transcribed DNA regions behind them, or chromatin accessibility for transcription factor binding. In this paper, we examined the promoter sequences of chromosome 1 of the rice Oryza sativa genome from the Database of Potential Promoter Sequences predicted using a mathematical algorithm based on the derivation and calculation of statistically significant promoter classes. In this paper TATA motifs and cis-regulatory elements were identified in the predicted promoter sequences. We also verified the presence of potential transcription start sites near the predicted promoters by analyzing CAGE-seq data. We searched for unannotated transcripts behind the predicted sequences by de novo assembling transcripts from RNA-seq data. We also examined chromatin accessibility in the region of the predicted promoters by analyzing ATAC-seq data. As a result of this work, we identified the predicted sequences that are most likely to be promoters for further experimental validation in an in vivo or in vitro system.
Affiliation(s)
- Anastasiya N. Bubnova
- Federal State Institution Federal Research Centre «Fundamentals of Biotechnology», Russian Academy of Sciences, 119071 Moscow, Russia
3
Oliveira DS, Fablet M, Larue A, Vallier A, Carareto CA, Rebollo R, Vieira C. ChimeraTE: a pipeline to detect chimeric transcripts derived from genes and transposable elements. Nucleic Acids Res 2023; 51:9764-9784. [PMID: 37615575 PMCID: PMC10570057 DOI: 10.1093/nar/gkad671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 07/25/2023] [Accepted: 08/09/2023] [Indexed: 08/25/2023] Open
Abstract
Transposable elements (TEs) produce structural variants and are considered an important source of genetic diversity. Notably, TE-gene fusion transcripts, i.e. chimeric transcripts, have been associated with adaptation in several species. However, the identification of these chimeras remains hindered by the lack of detection tools at a transcriptome-wide scale, and by the reliance on a reference genome, even though different individuals/cells/strains have different TE insertions. Therefore, we developed ChimeraTE, a pipeline that uses paired-end RNA-seq reads to identify chimeric transcripts through two different modes. Mode 1 is the reference-guided approach that employs canonical genome alignment, and Mode 2 identifies chimeras derived from fixed or insertionally polymorphic TEs without any reference genome. We have validated both modes using RNA-seq data from four Drosophila melanogaster wild-type strains. We found ∼1.12% of all genes generating chimeric transcripts, most of them from TE-exonized sequences. Approximately 23% of all detected chimeras were absent from the reference genome, indicating that TEs belonging to chimeric transcripts may be recent, polymorphic insertions. ChimeraTE is the first pipeline able to automatically uncover chimeric transcripts without a reference genome, consisting of two running Modes that can be used as a tool to investigate the contribution of TEs to transcriptome plasticity.
Affiliation(s)
- Daniel S Oliveira
- São Paulo State University (Unesp), Institute of Biosciences, Humanities and Exact Sciences, São José do Rio Preto, SP, Brazil
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR5558, Villeurbanne, Rhone-Alpes, 69100, France
- Marie Fablet
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR5558, Villeurbanne, Rhone-Alpes, 69100, France
- Institut Universitaire de France (IUF), Paris, Île-de-France F-75231, France
- Anaïs Larue
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR5558, Villeurbanne, Rhone-Alpes, 69100, France
- Univ Lyon, INRAE, INSA-Lyon, BF2I, UMR 203, 69621 Villeurbanne, France
- Agnès Vallier
- Univ Lyon, INRAE, INSA-Lyon, BF2I, UMR 203, 69621 Villeurbanne, France
- Claudia M A Carareto
- São Paulo State University (Unesp), Institute of Biosciences, Humanities and Exact Sciences, São José do Rio Preto, SP, Brazil
- Rita Rebollo
- Univ Lyon, INRAE, INSA-Lyon, BF2I, UMR 203, 69621 Villeurbanne, France
- Cristina Vieira
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR5558, Villeurbanne, Rhone-Alpes, 69100, France
4
Prazsák I, Tombácz D, Fülöp Á, Torma G, Gulyás G, Dörmő Á, Kakuk B, Spires LM, Toth Z, Boldogkői Z. KSHV 3.0: A State-of-the-Art Annotation of the Kaposi's Sarcoma-Associated Herpesvirus Transcriptome Using Cross-Platform Sequencing. bioRxiv 2023:2023.09.21.558842. [PMID: 37790386 PMCID: PMC10542539 DOI: 10.1101/2023.09.21.558842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Kaposi's sarcoma-associated herpesvirus (KSHV) is a large, oncogenic DNA virus belonging to the gammaherpesvirus subfamily. KSHV has been extensively studied with various high-throughput RNA-sequencing approaches to map the transcription start and end sites, the splice junctions, and the translation initiation sites. Despite these efforts, the comprehensive annotation of the viral transcriptome remains incomplete. In the present study, we generated a long-read sequencing dataset of the lytic and latent KSHV transcriptome using native RNA and direct cDNA sequencing methods. This was supplemented with CAGE sequencing based on a short-read platform. We also utilized datasets from previous publications for our analysis. As a result of this combined approach, we have identified a number of novel viral transcripts and RNA isoforms and have either corroborated or improved the annotation of previously identified viral RNA molecules, thereby notably enhancing our comprehension of the transcriptomic architecture of the KSHV genome. We also evaluated the coding capability of transcripts previously thought to be non-coding, by integrating our data on the viral transcripts with translatomic information from other publications.
Affiliation(s)
- István Prazsák
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Dóra Tombácz
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Ádám Fülöp
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Gábor Torma
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Gábor Gulyás
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Ákos Dörmő
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Balázs Kakuk
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Lauren McKenzie Spires
- Department of Oral Biology, University of Florida College of Dentistry, Gainesville, Florida, USA
- Zsolt Toth
- Department of Oral Biology, University of Florida College of Dentistry, Gainesville, Florida, USA
- Zsolt Boldogkői
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
5
Reese F, Williams B, Balderrama-Gutierrez G, Wyman D, Çelik MH, Rebboah E, Rezaie N, Trout D, Razavi-Mohseni M, Jiang Y, Borsari B, Morabito S, Liang HY, McGill CJ, Rahmanian S, Sakr J, Jiang S, Zeng W, Carvalho K, Weimer AK, Dionne LA, McShane A, Bedi K, Elhajjajy SI, Upchurch S, Jou J, Youngworth I, Gabdank I, Sud P, Jolanki O, Strattan JS, Kagda MS, Snyder MP, Hitz BC, Moore JE, Weng Z, Bennett D, Reinholdt L, Ljungman M, Beer MA, Gerstein MB, Pachter L, Guigó R, Wold BJ, Mortazavi A. The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity. bioRxiv 2023:2023.05.15.540865. [PMID: 37292896 PMCID: PMC10245583 DOI: 10.1101/2023.05.15.540865] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 06/10/2023]
Abstract
The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
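The triplet idea in this abstract can be illustrated with a minimal Python sketch. It assumes a gene's triplet is simply the number of distinct transcript start sites, exon junction chains, and transcript end sites among its transcripts, and that simplex coordinates come from dividing each count by their sum. The `Transcript` type, the function names, and the toy coordinates are hypothetical, and the authors' actual normalization may differ.

```python
from collections import namedtuple

# Hypothetical record: one transcript described by its start site,
# its ordered chain of exon junctions, and its end site.
Transcript = namedtuple("Transcript", ["tss", "junction_chain", "tes"])


def gene_triplet(transcripts):
    """Count distinct start sites, junction chains, and end sites for one gene."""
    n_tss = len({t.tss for t in transcripts})
    n_chain = len({t.junction_chain for t in transcripts})
    n_tes = len({t.tes for t in transcripts})
    return n_tss, n_chain, n_tes


def simplex_coordinates(triplet):
    """Scale a triplet so its three components sum to 1 (assumed normalization)."""
    total = sum(triplet)
    if total == 0:
        raise ValueError("triplet must contain at least one nonzero count")
    return tuple(count / total for count in triplet)


# Toy gene with three transcripts (made-up coordinates).
example_transcripts = [
    Transcript("tss1", ((100, 200), (300, 400)), "tes1"),
    Transcript("tss1", ((100, 200), (350, 400)), "tes1"),
    Transcript("tss2", ((100, 200), (300, 400)), "tes1"),
]

triplet = gene_triplet(example_transcripts)
coords = simplex_coordinates(triplet)
print(triplet, coords)
```

Under this assumed normalization, a gene whose largest coordinate is the first one would be read as diversifying mainly through promoter choice, the second through splicing, and the third through 3' end choice.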
Affiliation(s)
- Fairlie Reese
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Gabriela Balderrama-Gutierrez
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Dana Wyman
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Muhammed Hasan Çelik
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Elisabeth Rebboah
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Narges Rezaie
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Milad Razavi-Mohseni
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA
- Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Beatrice Borsari
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Samuel Morabito
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Heidi Yahan Liang
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Cassandra J McGill
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Sorena Rahmanian
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Jasmine Sakr
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, USA
- Shan Jiang
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Weihua Zeng
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Klebea Carvalho
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Annika K Weimer
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Louise A Dionne
- The Jackson Laboratory, Bar Harbor, USA
- Ariel McShane
- Cellular and Molecular Biology Program, University of Michigan, Ann Arbor, USA
- Department of Radiation Oncology, University of Michigan, Ann Arbor, USA
- Karan Bedi
- Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA
- Shaimae I Elhajjajy
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
- Sean Upchurch
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Jennifer Jou
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Ingrid Youngworth
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Paul Sud
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Otto Jolanki
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Meenakshi S Kagda
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Michael P Snyder
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Ben C Hitz
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
- David Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, USA
- Department of Neurological Sciences, Rush University Medical Center, Chicago, USA
- Laura Reinholdt
- The Jackson Laboratory, Bar Harbor, USA
- Mats Ljungman
- Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA
- Departments of Radiation Oncology and Environmental Health Sciences, University of Michigan, Ann Arbor, USA
- Michael A Beer
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA
- Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, USA
- Department of Statistics and Data Science, Yale University, New Haven, USA
- Department of Computer Science, Yale University, New Haven, USA
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA
- Roderic Guigó
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Barbara J Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Ali Mortazavi
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
6
Murray A, Vollmers C, Schmitz RJ. Smar2C2: A Simple and Efficient Protocol for the Identification of Transcription Start Sites. Curr Protoc 2023; 3:e705. [PMID: 36947693 DOI: 10.1002/cpz1.705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/24/2023]
Abstract
Promoters and the noncoding sequences that drive their function are fundamental aspects of genes that are critical to their regulation. The transcription preinitiation complex binds and assembles on promoters where it facilitates transcription. The transcription start site (TSS) is located downstream of the promoter sequence and is defined as the location in the genome where polymerase begins transcribing DNA into RNA. Knowing the location of TSSs is useful for annotation of genes, identification of non-coding sequences important to gene regulation, detection of alternative TSSs, and understanding of 5' UTR content. Several existing techniques make it possible to accurately identify TSSs, but are often difficult to perform experimentally, require large amounts of input RNA, or are unable to identify a large number of TSSs from a single sample. Many of these protocols take advantage of template switching reverse transcriptases (TSRTs), which reliably place an adaptor at the 5' end of a first strand synthesis of cDNA. Here, we introduce a protocol that exploits TSRT activity combined with rolling circle amplification to identify TSSs with several unique advantages over existing methods. Sequence adaptors are placed on the 5' and 3' end of the full-length cDNA copy of a transcript. A splint compatible with those adaptors is then used to circularize the full-length cDNA. Linear DNA containing concatemers of the cDNA are generated using rolling circle amplification, and a sequencing library is formed by fragmenting the concatemers. This protocol is straightforward to execute, requiring limited bench time with relatively stable reagents. Using extremely low amounts of RNA input, this protocol produces large numbers of accurate, deduplicated TSSs genome wide. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. 
- Basic Protocol 1: Splint generation
- Basic Protocol 2: RNA extraction
- Basic Protocol 3: cDNA synthesis
- Basic Protocol 4: cDNA circularization and amplification
- Basic Protocol 5: Library generation
Affiliation(s)
- Andrew Murray
- Department of Plant Biology, University of Georgia, Athens, Georgia
- Christopher Vollmers
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California
7
Gavgani HN, Grotewold E, Gray J. Methodology for Constructing a Knowledgebase for Plant Gene Regulation Information. Methods Mol Biol 2023; 2698:277-300. [PMID: 37682481 DOI: 10.1007/978-1-0716-3354-0_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
The amount of biological data is growing at a rapid pace as many high-throughput omics technologies and data pipelines are developed. This is resulting in the growth of databases for DNA and protein sequences, gene expression, protein accumulation, structural, and localization information. The diversity and multi-omics nature of such bioinformatic data requires well-designed databases for flexible organization and presentation. Besides general-purpose online bioinformatic databases, users need narrowly focused online databases to quickly access a meaningful collection of related data for their research. Here, we describe the methodology used to implement a plant gene regulatory knowledgebase, with data, query, and tool features, as well as the ability to expand to accommodate future datasets. We exemplify this methodology for the GRASSIUS knowledgebase, but it is applicable to developing and updating similar plant gene regulatory knowledgebases. GRASSIUS organizes and presents gene regulatory data from grass species with a central focus on maize (Zea mays). The main class of data presented include not only the families of transcription factors (TFs) and co-regulators (CRs) but also protein-DNA interaction data, where available.
Affiliation(s)
- Hadi Nayebi Gavgani
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Dandelions Therapeutics Inc., San Francisco, CA, USA
- Erich Grotewold
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- John Gray
- Department of Biological Sciences, University of Toledo, Toledo, OH, USA
8
Xu J, Pratt HE, Moore JE, Gerstein MB, Weng Z. Building integrative functional maps of gene regulation. Hum Mol Genet 2022; 31:R114-R122. [PMID: 36083269 PMCID: PMC9585680 DOI: 10.1093/hmg/ddac195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2022] [Revised: 08/03/2022] [Accepted: 08/09/2022] [Indexed: 11/13/2022] Open
Abstract
Every cell in the human body inherits a copy of the same genetic information. The three billion base pairs of DNA in the human genome, and the roughly 50 000 coding and non-coding genes they contain, must thus encode all the complexity of human development and cell and tissue type diversity. Differences in gene regulation, or the modulation of gene expression, enable individual cells to interpret the genome differently to carry out their specific functions. Here we discuss recent and ongoing efforts to build gene regulatory maps, which aim to characterize the regulatory roles of all sequences in a genome. Many researchers and consortia have identified such regulatory elements using functional assays and evolutionary analyses; we discuss the results, strengths and shortcomings of their approaches. We also discuss new techniques the field can leverage and emerging challenges it will face while striving to build gene regulatory maps of ever-increasing resolution and comprehensiveness.
Affiliation(s)
- Jinrui Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Henry E Pratt
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Jill E Moore
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
9
Grayeski PJ, Weidmann CA, Kumar J, Lackey L, Mustoe A, Busan S, Laederach A, Weeks KM. Global 5'-UTR RNA structure regulates translation of a SERPINA1 mRNA. Nucleic Acids Res 2022; 50:9689-9704. [PMID: 36107773 PMCID: PMC9508835 DOI: 10.1093/nar/gkac739] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/11/2022] [Revised: 08/11/2022] [Accepted: 09/12/2022] [Indexed: 11/13/2022] Open
Abstract
SERPINA1 mRNAs encode the protease inhibitor α-1-antitrypsin and are regulated through post-transcriptional mechanisms. α-1-antitrypsin deficiency leads to chronic obstructive pulmonary disease (COPD) and liver cirrhosis, and specific variants in the 5'-untranslated region (5'-UTR) are associated with COPD. The NM_000295.4 transcript is well expressed and translated in lung and blood and features an extended 5'-UTR that does not contain a competing upstream open reading frame (uORF). We show that the 5'-UTR of NM_000295.4 folds into a well-defined multi-helix structural domain. We systematically destabilized mRNA structure across the NM_000295.4 5'-UTR, and measured changes in (SHAPE quantified) RNA structure and cap-dependent translation relative to a native-sequence reporter. Surprisingly, despite destabilizing local RNA structure, most mutations either had no effect on or decreased translation. Most structure-destabilizing mutations retained native, global 5'-UTR structure. However, those mutations that disrupted the helix that anchors the 5'-UTR domain yielded three groups of non-native structures. Two of these non-native structure groups refolded to create a stable helix near the translation initiation site that decreases translation. Thus, in contrast to the conventional model that RNA structure in 5'-UTRs primarily inhibits translation, complex folding of the NM_000295.4 5'-UTR creates a translation-optimized message by promoting accessibility at the translation initiation site.
Affiliation(s)
- Philip J Grayeski
- Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599-3290, USA
- Chase A Weidmann
- Department of Biological Chemistry, Center for RNA Biomedicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Jayashree Kumar
- Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
- Lela Lackey
- Department of Genetics and Biochemistry, Center for Human Genetics, Clemson University, Greenwood, SC 29646, USA
- Anthony M Mustoe
- Verna and Marrs McClean Department of Biochemistry and Molecular Biology, Department of Molecular and Human Genetics, and Therapeutic Innovation Center (THINC), Baylor College of Medicine, Houston, TX 77030, USA
| | - Steven Busan
- Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599-3290, USA
| | - Alain Laederach
- Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Kevin M Weeks
- Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599-3290, USA
| |
Collapse
|
10
|
Baar T, Dümcke S, Gressel S, Schwalb B, Dilthey A, Cramer P, Tresch A. RNA transcription and degradation of Alu retrotransposons depends on sequence features and evolutionary history. G3 GENES|GENOMES|GENETICS 2022; 12:6543614. [PMID: 35253846 PMCID: PMC9073682 DOI: 10.1093/g3journal/jkac054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 02/25/2022] [Indexed: 11/16/2022]
Abstract
Alu elements are one of the most successful groups of RNA retrotransposons and make up 11% of the human genome with over 1 million individual loci. They are linked to genetic defects and increases in sequence diversity, and they influence transcriptional activity. Still, their RNA metabolism is poorly understood. It is even unclear whether Alu elements are mostly transcribed by RNA Polymerase II or III. We have conducted a transcription shutoff experiment by α-amanitin and metabolic RNA labeling by 4-thiouridine combined with RNA fragmentation (TT-seq) and RNA-seq to shed further light on the origin and life cycle of Alu transcripts. We find that Alu RNAs are more stable than previously thought and seem to originate in part from RNA Polymerase II activity, as previous reports suggest. Their expression, however, seems to be independent of the transcriptional activity of adjacent genes. Furthermore, we have developed a novel statistical test for detecting the expression of quantitative trait loci in Alu elements that relies on the de Bruijn graph representation of all Alu sequences. It controls for both statistical significance and biological relevance using a tuned k-mer representation, discovering influential sequence features missed by regular motif search. In addition, we discover several point mutations using a generalized linear model, and motifs of interest, which also match transcription factor-binding motifs.
Affiliation(s)
- Till Baar
  - Institute of Medical Statistics and Computational Biology, Faculty of Medicine, University of Cologne, Cologne 50937, Germany
- Saskia Gressel
  - Department of Molecular Biology, Max Planck Institute for Biophysical Chemistry, Göttingen 37077, Germany
- Björn Schwalb
  - Department of Molecular Biology, Max Planck Institute for Biophysical Chemistry, Göttingen 37077, Germany
- Alexander Dilthey
  - Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf 40225, Germany
- Patrick Cramer
  - Department of Molecular Biology, Max Planck Institute for Biophysical Chemistry, Göttingen 37077, Germany
- Achim Tresch
  - Institute of Medical Statistics and Computational Biology, Faculty of Medicine, University of Cologne, Cologne 50937, Germany
  - CECAD, University of Cologne, Cologne 50931, Germany
  - Center for Data and Simulation Science, University of Cologne, Cologne 50923, Germany
|
11
|
Bajar BT, Phi NT, Isaacman-Beck J, Reichl J, Randhawa H, Akin O. A discrete neuronal population coordinates brain-wide developmental activity. Nature 2022; 602:639-646. [PMID: 35140397 DOI: 10.1038/s41586-022-04406-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/27/2021] [Accepted: 01/05/2022] [Indexed: 11/09/2022]
Abstract
In vertebrates, stimulus-independent activity accompanies neural circuit maturation throughout the developing brain [1,2]. The recent discovery of similar activity in the developing Drosophila central nervous system suggests that developmental activity is fundamental to the assembly of complex brains [3]. How such activity is coordinated across disparate brain regions to influence synaptic development at the level of defined cell types is not well understood. Here we show that neurons expressing the cation channel transient receptor potential gamma (Trpγ) relay and pattern developmental activity throughout the Drosophila brain. In trpγ mutants, activity is attenuated globally, and both patterns of activity and synapse structure are altered in a cell-type-specific manner. Less than 2% of the neurons in the brain express Trpγ. These neurons arborize throughout the brain, and silencing or activating them leads to loss or gain of brain-wide activity. Together, these results indicate that this small population of neurons coordinates brain-wide developmental activity. We propose that stereotyped patterns of developmental activity are driven by a discrete, genetically specified network to instruct neural circuit assembly at the level of individual cells and synapses. This work establishes the fly brain as an experimentally tractable system for studying how activity contributes to synapse and circuit formation.
Affiliation(s)
- Bryce T Bajar
  - Department of Biological Chemistry, Medical Scientist Training Program, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, USA
- Nguyen T Phi
  - Molecular, Cellular, and Integrative Physiology Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA, USA
- Jesse Isaacman-Beck
  - Department of Neurobiology, Stanford University School of Medicine, Stanford University, Stanford, CA, USA
- Jun Reichl
  - Department of Neurobiology, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, USA
- Harpreet Randhawa
  - Department of Neurobiology, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, USA
- Orkun Akin
  - Department of Neurobiology, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, USA
|
12
|
Zhang M, Jia C, Li F, Li C, Zhu Y, Akutsu T, Webb GI, Zou Q, Coin LJM, Song J. Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction. Brief Bioinform 2022; 23:6502561. [PMID: 35021193 PMCID: PMC8921625 DOI: 10.1093/bib/bbab551] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/09/2021] [Revised: 11/12/2021] [Accepted: 11/30/2021] [Indexed: 01/13/2023] Open
Abstract
Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, in this study, we curated 58 comprehensive, up-to-date, benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to assist the research community to assess the relative functionality of alternative approaches and support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoter, 61 for eukaryotic promoter, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. On the basis of these benchmark datasets, we benchmarked 19 predictors with functioning webservers/local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future.
Affiliation(s)
- Cangzhi Jia
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Geoffrey I Webb
  - Department of Data Science and Artificial Intelligence, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Lachlan J M Coin
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Corresponding authors: Jiangning Song, Lachlan J M Coin, Quan Zou, Cangzhi Jia
|
13
|
Lu Z, Berry K, Hu Z, Zhan Y, Ahn TH, Lin Z. TSSr: an R package for comprehensive analyses of TSS sequencing data. NAR Genom Bioinform 2021; 3:lqab108. [PMID: 34805991 PMCID: PMC8598296 DOI: 10.1093/nargab/lqab108] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/04/2021] [Revised: 10/05/2021] [Accepted: 10/27/2021] [Indexed: 12/13/2022] Open
Abstract
Transcription initiation is regulated in a highly organized fashion to ensure proper cellular functions. Accurate identification of transcription start sites (TSSs) and quantitative characterization of transcription initiation activities are fundamental steps for studies of regulated transcription and core promoter structures. Several high-throughput techniques have been developed to sequence the very 5' end of RNA transcripts (TSS sequencing) on the genome scale. Bioinformatics tools are essential for processing, analysis, and visualization of TSS sequencing data. Here, we present TSSr, an R package that provides rich functions for mapping TSSs and characterizing the structures and activities of core promoters based on all types of TSS sequencing data. Specifically, TSSr implements several newly developed algorithms for accurately identifying TSSs from mapped sequencing reads and inferring core promoters, which are a prerequisite for subsequent functional analyses of TSS data. Furthermore, TSSr also enables users to export various types of TSS data that can be visualized in a genome browser for inspection of promoter activities in association with other genomic features, and to generate publication-ready TSS graphs. These user-friendly features could greatly facilitate studies of transcription initiation based on TSS sequencing data. The source code and detailed documentation of TSSr can be freely accessed at https://github.com/Linlab-slu/TSSr.
Affiliation(s)
- Zhaolian Lu
  - Department of Biology, Saint Louis University, St. Louis, MO 63103, USA
- Keenan Berry
  - Program of Bioinformatics and Computational Biology, Saint Louis University, St. Louis, MO 63103, USA
- Zhenbin Hu
  - Department of Biology, Saint Louis University, St. Louis, MO 63103, USA
- Yu Zhan
  - Department of Biology, Saint Louis University, St. Louis, MO 63103, USA
- Tae-Hyuk Ahn
  - Program of Bioinformatics and Computational Biology, Saint Louis University, St. Louis, MO 63103, USA
- Zhenguo Lin
  - Department of Biology, Saint Louis University, St. Louis, MO 63103, USA
|
14
|
Application of pan genomics towards the druggability of Clostridium botulinum. APPLIED NANOSCIENCE 2021. [DOI: 10.1007/s13204-021-02005-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
|
15
|
Tissue-specific expression of p73 and p63 isoforms in human tissues. Cell Death Dis 2021; 12:745. [PMID: 34315849 PMCID: PMC8316356 DOI: 10.1038/s41419-021-04017-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/09/2021] [Revised: 06/25/2021] [Accepted: 06/30/2021] [Indexed: 12/13/2022]
Abstract
p73 and p63 are members of the p53 family that exhibit overlapping and distinct functions in development and homeostasis. The evaluation of p73 and p63 isoform expression across human tissue can provide greater insight into the functional interactions between family members. We determined the mRNA isoform expression patterns of TP73 and TP63 across a panel of 36 human tissues and protein expression within the highest-expressing tissues. TP73 and TP63 expression significantly correlated across tissues. In tissues with concurrent mRNA expression, nuclear co-expression of both proteins was observed in a majority of cells. Using GTEx data, we quantified p73 and p63 isoform expression in human tissue and identified that the α-isoforms of TP73 and TP63 were the predominant isoforms expressed in nearly all tissues. Further, we identified a previously unreported p73 mRNA product encoded by exons 4 to 14. In sum, these data provide the most comprehensive tissue-specific atlas of p73 and p63 protein and mRNA expression patterns in human and murine samples, indicating coordinate expression of these transcription factors in the majority of tissues in which they are expressed.
|
16
|
melRNA-seq for Expression Analysis of SINE RNAs and Other Medium-Length Non-Coding RNAs. Mob DNA 2021; 12:15. [PMID: 34134767 PMCID: PMC8210359 DOI: 10.1186/s13100-021-00245-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Accepted: 06/04/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Short interspersed elements (SINEs) are transcribed by RNA polymerase III (Pol III) to produce RNAs typically 100-500 nucleotides in length. Although their RNA abundance can be evaluated by Northern blotting and primer extension, the nature (sequence, exact length, and genomic origin) of these RNAs cannot be revealed by these methods. Moreover, mRNA sequencing (mRNA-seq) is not able to distinguish bona fide SINE RNAs from SINE sequences present in longer transcripts. RESULTS To elucidate the abundance, source loci, and sequence nature of SINE RNAs, we established a deep sequencing method, designated as melRNA-seq (medium-length RNA-seq), which can determine whole-length RNA sequences. Total RNA samples were treated with 5' pyrophosphohydrolase (RppH), which allowed ligation of an RNA adaptor to the 5' end of intact SINE RNAs. Similarly, another adaptor was ligated to the 3' end, followed by reverse transcription, PCR amplification, size selection, and single-end deep sequencing. The analysis of two biological replicates of RNAs from mouse spermatogonia showed high reproducibility of SINE expression data both at family and locus levels. CONCLUSIONS This new method can be used for quantification and detailed sequence analysis of medium-length non-coding RNAs, such as rRNA, snRNA, tRNAs, and SINE RNAs. Further, its dynamic range is much wider than that of Northern blotting and primer extension.
|
17
|
A novel piperazine derivative that targets hepatitis B surface antigen effectively inhibits tenofovir resistant hepatitis B virus. Sci Rep 2021; 11:11723. [PMID: 34083665 PMCID: PMC8175705 DOI: 10.1038/s41598-021-91196-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2021] [Accepted: 05/19/2021] [Indexed: 02/04/2023] Open
Abstract
Chronic hepatitis B virus (HBV) infection is a global problem. The loss of hepatitis B surface antigen (HBsAg) in serum is a therapeutic end point. Prolonged therapy with nucleoside/nucleotide analogues targeting the HBV-polymerase may lead to resistance and rarely results in the loss of HBsAg. Therefore, inhibitors targeting HBsAg may have potential therapeutic applications. Here, we used computational virtual screening, docking, and molecular dynamics simulations to identify potential small molecule inhibitors against HBsAg. After screening a million molecules from the ZINC database, we identified small molecules with potential anti-HBV activity. Subsequently, cytotoxicity profiles and anti-HBV activities of these small molecules were tested using a widely used cell culture model for HBV. We identified a small molecule (ZINC20451377) which binds to HBsAg with high affinity, with a KD of 65.3 nM, as determined by Surface Plasmon Resonance spectroscopy. Notably, at low micromolar concentrations (10 μM), the small molecule inhibited HBsAg production and hepatitis B virion secretion and was also efficacious against an HBV quadruple mutant (CYEI mutant) resistant to tenofovir. We conclude that this small molecule exhibits strong anti-HBV properties and merits further testing.
|
18
|
Scientists on a RAMPAGE to find apicomplexan transcription start sites. Nat Rev Microbiol 2021; 19:483. [PMID: 34083794 DOI: 10.1038/s41579-021-00587-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/26/2021] [Indexed: 11/09/2022]
|
19
|
Sun W, Modica S, Dong H, Wolfrum C. Plasticity and heterogeneity of thermogenic adipose tissue. Nat Metab 2021; 3:751-761. [PMID: 34158657 DOI: 10.1038/s42255-021-00417-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/10/2021] [Accepted: 05/19/2021] [Indexed: 12/13/2022]
Abstract
The perception of adipose tissue, both in the scientific community and in the general population, has changed dramatically in the past 20 years. While adipose tissue was thought for a long time to be a rather simple lipid storage entity, it is now recognized as a highly heterogeneous organ and a critical regulator of systemic metabolism, composed of many different subtypes of cells, with important endocrine functions. Additionally, adipose tissue is nowadays recognized to contribute to energy turnover, due to the presence of specialized thermogenic adipocytes, which can be found in many adipose depots. This review discusses the unprecedented insights that we have gained into the heterogeneity of thermogenic adipocytes and their respective precursors due to the technical developments in single-cell and nucleus technologies. These methodological advances have increased our understanding of how adipose tissue catabolic function is influenced by developmental and intercellular communication events.
Affiliation(s)
- Wenfei Sun
  - Institute of Food, Nutrition and Health, ETH Zurich, Schwerzenbach, Switzerland
- Salvatore Modica
  - Institute of Food, Nutrition and Health, ETH Zurich, Schwerzenbach, Switzerland
- Hua Dong
  - Institute of Food, Nutrition and Health, ETH Zurich, Schwerzenbach, Switzerland
- Christian Wolfrum
  - Institute of Food, Nutrition and Health, ETH Zurich, Schwerzenbach, Switzerland
|
20
|
Luo D, Huguet-Tapia JC, Raborn RT, White FF, Brendel VP, Yang B. The Xa7 resistance gene guards the rice susceptibility gene SWEET14 against exploitation by the bacterial blight pathogen. PLANT COMMUNICATIONS 2021; 2:100164. [PMID: 34027391 PMCID: PMC8132128 DOI: 10.1016/j.xplc.2021.100164] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/28/2020] [Revised: 01/14/2021] [Accepted: 01/15/2021] [Indexed: 05/03/2023]
Abstract
Many plant disease resistance (R) genes function specifically in reaction to the presence of cognate effectors from a pathogen. Xanthomonas oryzae pathovar oryzae (Xoo) uses transcription activator-like effectors (TALes) to target specific rice genes for expression, thereby promoting host susceptibility to bacterial blight. Here, we report the molecular characterization of Xa7, the cognate R gene to the TALes AvrXa7 and PthXo3, which target the rice major susceptibility gene SWEET14. Xa7 was mapped to a unique 74-kb region. Gene expression analysis of the region revealed a candidate gene that contained a putative AvrXa7 effector binding element (EBE) in its promoter and encoded a 113-amino-acid peptide of unknown function. Genome editing at the Xa7 locus rendered the plants susceptible to avrXa7-carrying Xoo strains. Both AvrXa7 and PthXo3 activated a GUS reporter gene fused with the EBE-containing Xa7 promoter in Nicotiana benthamiana. The EBE of Xa7 is a close mimic of the EBE of SWEET14 for TALe-induced disease susceptibility. Ectopic expression of Xa7 triggers cell death in N. benthamiana. Xa7 is prevalent in indica rice accessions from 3000 rice genomes. Xa7 appears to be an adaptation that protects against pathogen exploitation of SWEET14 and disease susceptibility.
Affiliation(s)
- Dangping Luo
  - Division of Plant Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Jose C. Huguet-Tapia
  - Department of Plant Pathology, University of Florida, Gainesville, FL 32611, USA
- R. Taylor Raborn
  - Department of Biology, Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
  - Current address: Biodesign Institute Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85281, USA
- Frank F. White
  - Department of Plant Pathology, University of Florida, Gainesville, FL 32611, USA
- Volker P. Brendel
  - Department of Biology, Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Bing Yang
  - Division of Plant Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
  - Corresponding author
|
21
|
Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM. FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences. BMC Bioinformatics 2021; 22:205. [PMID: 33879057 PMCID: PMC8056616 DOI: 10.1186/s12859-021-04120-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/05/2021] [Accepted: 04/07/2021] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of accumulated transcript data. Challenges include transcriptionally active regions of the genome that contain overlapping genes, genes that produce numerous transcripts, transposable elements and numerous diverse sequence repeats. Currently available gene annotation software applications depend on pre-constructed full-length gene sequence assemblies which are not guaranteed to be error-free. The origins of these sequences are often uncertain, making it difficult to identify and rectify errors in them. This hinders the creation of an accurate and holistic representation of the transcriptomic landscape across multiple tissue types and experimental conditions. Therefore, to gauge the extent of diversity in gene structures, a comprehensive analysis of genome-wide expression data is imperative. RESULTS We present FINDER, a fully automated computational tool that optimizes the entire process of annotating genes and transcript structures. Unlike current state-of-the-art pipelines, FINDER automates the RNA-Seq pre-processing step by working directly with raw sequence reads and optimizes gene prediction from BRAKER2 by supplementing these reads with associated proteins. The FINDER pipeline (1) reports transcripts and recognizes genes that are expressed under specific conditions, (2) generates all possible alternatively spliced transcripts from expressed RNA-Seq data, (3) analyzes read coverage patterns to modify existing transcript models and create new ones, and (4) scores genes as high- or low-confidence based on the available evidence across multiple datasets. We demonstrate the ability of FINDER to automatically annotate a diverse pool of genomes from eight species. CONCLUSIONS FINDER takes a completely automated approach to annotate genes directly from raw expression data. It is capable of processing eukaryotic genomes of all sizes and requires no manual supervision, making it ideal for bench researchers with limited experience in handling computational tools.
Affiliation(s)
- Sagnik Banerjee
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
  - Department of Statistics, Iowa State University, Ames, IA, 50011, USA
- Priyanka Bhandary
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
  - Department of Genetics, Developmental and Cell Biology, Iowa State University, Ames, IA, 50011, USA
- Margaret Woodhouse
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
- Taner Z Sen
  - Crop Improvement and Genetics Research Unit, USDA-Agricultural Research Service, Albany, CA, 94710, USA
- Roger P Wise
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, 50011, USA
- Carson M Andorf
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
  - Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
|
22
|
Zhang XO, Pratt H, Weng Z. Investigating the Potential Roles of SINEs in the Human Genome. Annu Rev Genomics Hum Genet 2021; 22:199-218. [PMID: 33792357 DOI: 10.1146/annurev-genom-111620-100736] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
Short interspersed nuclear elements (SINEs) are nonautonomous retrotransposons that occupy approximately 13% of the human genome. They are transcribed by RNA polymerase III and can be retrotranscribed and inserted back into the genome with the help of other autonomous retroelements. Because they are preferentially located close to or within gene-rich regions, they can regulate gene expression by various mechanisms that act at both the DNA and the RNA levels. In this review, we summarize recent findings on the involvement of SINEs in different types of gene regulation and discuss the potential regulatory functions of SINEs that are in close proximity to genes, Pol III-transcribed SINE RNAs, and embedded SINE sequences within Pol II-transcribed genes in the human genome. These discoveries illustrate how the human genome has exapted some SINEs into functional regulatory elements.
Affiliation(s)
- Xiao-Ou Zhang
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
  - Current affiliation: School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Henry Pratt
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Zhiping Weng
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
|
23
|
Goszczynski DE, Halstead MM, Islas-Trejo AD, Zhou H, Ross PJ. Transcription initiation mapping in 31 bovine tissues reveals complex promoter activity, pervasive transcription, and tissue-specific promoter usage. Genome Res 2021; 31:732-744. [PMID: 33722934 PMCID: PMC8015843 DOI: 10.1101/gr.267336.120] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/12/2020] [Accepted: 02/01/2021] [Indexed: 01/04/2023]
Abstract
Characterizing transcription start sites is essential for understanding the regulatory mechanisms that control gene expression. Recently, a new bovine genome assembly (ARS-UCD1.2) with high continuity, accuracy, and completeness was released; however, the functional annotation of the bovine genome lacks precise transcription start sites and contains a low number of transcripts in comparison to human and mouse. By using the RAMPAGE approach, this study identified transcription start sites at high resolution in a large collection of bovine tissues. We found several known and novel transcription start sites attributed to promoters of protein-coding and lncRNA genes that were validated through experimental and in silico evidence. With these findings, the annotation of transcription start sites in cattle reached a level comparable to the mouse and human genome annotations. In addition, we identified and characterized transcription start sites for antisense transcripts derived from bidirectional promoters, potential lncRNAs, mRNAs, and pre-miRNAs. We also analyzed the quantitative aspects of RAMPAGE to produce a promoter activity atlas, reaching highly reproducible results comparable to traditional RNA-seq. Coexpression networks revealed considerable use of tissue-specific promoters, especially between brain and testicle, which expressed several genes in common from alternate loci. Furthermore, regions surrounding coexpressed modules were enriched in binding factor motifs representative of each tissue. The comprehensive annotation of promoters in such a large collection of tissues will substantially contribute to our understanding of gene expression in cattle and other mammalian species, shortening the gap between genotypes and phenotypes.
Affiliation(s)
- Daniel E Goszczynski: Department of Animal Science, University of California, Davis, California 95616, USA
- Michelle M Halstead: Department of Animal Science, University of California, Davis, California 95616, USA
- Alma D Islas-Trejo: Department of Animal Science, University of California, Davis, California 95616, USA
- Huaijun Zhou: Department of Animal Science, University of California, Davis, California 95616, USA
- Pablo J Ross: Department of Animal Science, University of California, Davis, California 95616, USA
|
24
|
Qureshi NA, Bakhtiar SM, Faheem M, Shah M, Bari A, Mahmood HM, Sohaib M, Mothana RA, Ullah R, Jamal SB. Genome-Based Drug Target Identification in Human Pathogen Streptococcus gallolyticus. Front Genet 2021; 12:564056. [PMID: 33841489 PMCID: PMC8027347 DOI: 10.3389/fgene.2021.564056] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/20/2020] [Accepted: 02/16/2021] [Indexed: 12/21/2022] Open
Abstract
Streptococcus gallolyticus (Sg) is an opportunistic Gram-positive, non-motile bacterium that causes infective endocarditis, an inflammation of the inner lining of the heart. As Sg has acquired resistance to the available antibiotics, there is a dire need to find new therapeutic targets and potent drugs to prevent and treat this disease. In the current study, an in silico approach is utilized to link genomic data of Sg species with its proteome to identify putative therapeutic targets. A total of 1,138 core proteins were identified using a pan-genomic approach. Further, using subtractive proteomic analysis, a set of 18 proteins, essential for the bacterium and non-homologous to the host (human), was identified. Out of these 18 proteins, 12 cytoplasmic proteins were selected as potential drug targets. These selected proteins were subjected to molecular docking against drug-like compounds retrieved from the ZINC database. Furthermore, the top docked compounds with lower binding energy were identified. In this work, we have identified novel drug and vaccine targets against Sg, of which some have already been reported and validated in other species. Owing to the experimental validation, we believe our methodology and results are a significant contribution to drug/vaccine target identification against Sg-caused infective endocarditis.
Affiliation(s)
- Nosheen Afzal Qureshi: Department of Bioinformatics and Biosciences, Capital University of Science and Technology, Islamabad, Pakistan
- Syeda Marriam Bakhtiar: Department of Bioinformatics and Biosciences, Capital University of Science and Technology, Islamabad, Pakistan
- Muhammad Faheem: Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
- Mohibullah Shah: Department of Biochemistry, Bahauddin Zakariya University, Multan, Pakistan
- Ahmed Bari: Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Hafiz M Mahmood: Department of Pharmacology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Muhammad Sohaib: Department of Soil Science, College of Food and Agriculture Sciences, King Saud University, Riyadh, Saudi Arabia
- Ramzi A Mothana: Department of Pharmacognosy (MAPPRC), College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Riaz Ullah: Department of Pharmacognosy (MAPPRC), College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Syed Babar Jamal: Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
|
25
|
Markus BM, Waldman BS, Lorenzi HA, Lourido S. High-Resolution Mapping of Transcription Initiation in the Asexual Stages of Toxoplasma gondii. Front Cell Infect Microbiol 2021; 10:617998. [PMID: 33553008 PMCID: PMC7854901 DOI: 10.3389/fcimb.2020.617998] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/15/2020] [Accepted: 12/03/2020] [Indexed: 12/13/2022] Open
Abstract
Toxoplasma gondii is a common parasite of humans and animals, causing life-threatening disease in the immunocompromised, fetal abnormalities when contracted during gestation, and recurrent ocular lesions in some patients. Central to the prevalence and pathogenicity of this protozoan is its ability to adapt to a broad range of environments, and to differentiate between acute and chronic stages. These processes are underpinned by a major rewiring of gene expression, yet the mechanisms that regulate transcription in this parasite are only partially characterized. Deciphering these mechanisms requires a precise and comprehensive map of transcription start sites (TSSs); however, Toxoplasma TSSs have remained incompletely defined. To address this challenge, we used 5'-end RNA sequencing to genomically assess transcription initiation in both acute and chronic stages of Toxoplasma. Here, we report an in-depth analysis of transcription initiation at promoters, and provide empirically-defined TSSs for 7603 (91%) protein-coding genes, of which only 1840 concur with existing gene models. Comparing data from acute and chronic stages, we identified instances of stage-specific alternative TSSs that putatively generate mRNA isoforms with distinct 5' termini. Analysis of the nucleotide content and nucleosome occupancy around TSSs allowed us to examine the determinants of TSS choice, and outline features of Toxoplasma promoter architecture. We also found pervasive divergent transcription at Toxoplasma promoters, clustered within the nucleosomes of highly-symmetrical phased arrays, underscoring chromatin contributions to transcription initiation. Corroborating previous observations, we asserted that Toxoplasma 5' leaders are among the longest of any eukaryote studied thus far, displaying a median length of approximately 800 nucleotides. Further highlighting the utility of a precise TSS map, we pinpointed motifs associated with transcription initiation, including the binding sites of the master regulator of chronic-stage differentiation, BFD1, and a novel motif with a similar positional arrangement present at 44% of Toxoplasma promoters. This work provides a critical resource for functional genomics in Toxoplasma, and lays down a foundation to study the interactions between genomic sequences and the regulatory factors that control transcription in this parasite.
Affiliation(s)
- Benedikt M. Markus: Whitehead Institute for Biomedical Research, Cambridge, MA, United States; Faculty of Biology, University of Freiburg, Freiburg, Germany
- Benjamin S. Waldman: Whitehead Institute for Biomedical Research, Cambridge, MA, United States; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
- Sebastian Lourido: Whitehead Institute for Biomedical Research, Cambridge, MA, United States; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
|
26
|
Breschi A, Muñoz-Aguirre M, Wucher V, Davis CA, Garrido-Martín D, Djebali S, Gillis J, Pervouchine DD, Vlasova A, Dobin A, Zaleski C, Drenkow J, Danyko C, Scavelli A, Reverter F, Snyder MP, Gingeras TR, Guigó R. A limited set of transcriptional programs define major cell types. Genome Res 2020; 30:1047-1059. [PMID: 32759341 PMCID: PMC7397875 DOI: 10.1101/gr.263186.120] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/10/2020] [Accepted: 04/29/2020] [Indexed: 12/12/2022]
Abstract
We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.
Affiliation(s)
- Alessandra Breschi: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, E-08003 Barcelona, Catalonia, Spain; Universitat Pompeu Fabra (UPF), E-08003 Barcelona, Catalonia, Spain; Department of Genetics, Stanford University, Stanford, California 94305, USA
- Manuel Muñoz-Aguirre: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, E-08003 Barcelona, Catalonia, Spain; Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa, 08034 Barcelona, Catalonia, Spain
- Valentin Wucher: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, E-08003 Barcelona, Catalonia, Spain
- Carrie A Davis: Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11742, USA
- Diego Garrido-Martín: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, E-08003 Barcelona, Catalonia, Spain; Universitat Pompeu Fabra (UPF), E-08003 Barcelona, Catalonia, Spain
- Sarah Djebali: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, E-08003 Barcelona, Catalonia, Spain; Universitat Pompeu Fabra (UPF), E-08003 Barcelona, Catalonia, Spain; Institut National de Recherche en Santé Digestive (IRSD), Université de Toulouse, Institut National de la Santé et de la Recherche Médicale (INSERM), Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRAE), École Nationale Vétérinaire de Toulouse (ENVT), Université Paul Sabatier (UPS), 31024 Toulouse, France
- Jesse Gillis: Department of Genetics, Stanford University, Stanford, California 94305, USA
- Dmitri D Pervouchine: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, E-08003 Barcelona, Catalonia, Spain; Skolkovo Institute for Science and Technology, Moscow, Russia 143025
- Anna Vlasova: Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), 1030 Vienna, Austria
- Alexander Dobin: Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11742, USA
- Chris Zaleski: Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11742, USA
- Jorg Drenkow: Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11742, USA
- Cassidy Danyko: Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11742, USA
- Ferran Reverter: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, E-08003 Barcelona, Catalonia, Spain; Universitat Pompeu Fabra (UPF), E-08003 Barcelona, Catalonia, Spain
- Michael P Snyder: Department of Genetics, Stanford University, Stanford, California 94305, USA
- Thomas R Gingeras: Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11742, USA
- Roderic Guigó: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, E-08003 Barcelona, Catalonia, Spain; Universitat Pompeu Fabra (UPF), E-08003 Barcelona, Catalonia, Spain
|
27
|
Multi-strategic RNA-seq analysis reveals a high-resolution transcriptional landscape in cotton. Nat Commun 2019; 10:4714. [PMID: 31624240 PMCID: PMC6797763 DOI: 10.1038/s41467-019-12575-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 02/05/2019] [Accepted: 09/18/2019] [Indexed: 11/09/2022] Open
Abstract
Cotton is an important natural fiber crop; however, a comprehensive and high-resolution gene map is lacking. Here we integrate four complementary high-throughput techniques, including PacBio long-read Iso-seq, strand-specific RNA-seq, CAGE-seq, and PolyA-seq, to systematically explore the transcription landscape across 16 tissues or different organ types in Gossypium arboreum. We devise a computational pipeline, named IGIA, to reconstruct accurate gene structures from the integrated data. Our results reveal a dynamic and diverse transcriptional map in cotton: tissue-specific gene expression, alternative usage of TSSs and polyadenylation sites, hotspots of alternative splicing, and transcriptional read-through. These regulated events affect many genes in various aspects such as gain or loss of functional RNA motifs and protein domains, fine-tuning of DNA binding activity, and co-regulation of genes in the same complex or pathway. The methods and findings provide valuable resources for further functional genomic studies, such as understanding natural SNP variations, for the plant community.
|
28
|
Thodberg M, Thieffry A, Vitting-Seerup K, Andersson R, Sandelin A. CAGEfightR: analysis of 5'-end data using R/Bioconductor. BMC Bioinformatics 2019; 20:487. [PMID: 31585526 PMCID: PMC6778389 DOI: 10.1186/s12859-019-3029-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/15/2019] [Accepted: 08/15/2019] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND 5'-end sequencing assays, and Cap Analysis of Gene Expression (CAGE) in particular, have been instrumental in studying transcriptional regulation. 5'-end methods provide genome-wide maps of transcription start sites (TSSs) with base pair resolution. Because active enhancers often feature bidirectional TSSs, such data can also be used to predict enhancer candidates. The current availability of mature and comprehensive computational tools for the analysis of 5'-end data is limited, preventing efficient analysis of new and existing 5'-end data. RESULTS We present CAGEfightR, a framework for analysis of CAGE and other 5'-end data implemented as an R/Bioconductor-package. CAGEfightR can import data from BigWig files and allows for fast and memory efficient prediction and analysis of TSSs and enhancers. Downstream analyses include quantification, normalization, annotation with transcript and gene models, TSS shape statistics, linking TSSs to enhancers via co-expression, identification of enhancer clusters, and genome-browser style visualization. While built to analyze CAGE data, we demonstrate the utility of CAGEfightR in analyzing nascent RNA 5'-data (PRO-Cap). CAGEfightR is implemented using standard Bioconductor classes, making it easy to learn, use and combine with other Bioconductor packages, for example popular differential expression tools such as limma, DESeq2 and edgeR. CONCLUSIONS CAGEfightR provides a single, scalable and easy-to-use framework for comprehensive downstream analysis of 5'-end data. CAGEfightR is designed to be interoperable with other Bioconductor packages, thereby unlocking hundreds of mature transcriptomic analysis tools for 5'-end data. CAGEfightR is freely available via Bioconductor: bioconductor.org/packages/CAGEfightR .
Affiliation(s)
- Malte Thodberg: Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark; Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Axel Thieffry: Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark; Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Kristoffer Vitting-Seerup: Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark; Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark; Danish Cancer Society, Strandboulevarden 49 DK2100, Copenhagen Ø, Denmark
- Robin Andersson: Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Albin Sandelin: Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark; Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
|
29
|
Zhang XO, Gingeras TR, Weng Z. Genome-wide analysis of polymerase III-transcribed Alu elements suggests cell-type-specific enhancer function. Genome Res 2019; 29:1402-1414. [PMID: 31413151 PMCID: PMC6724667 DOI: 10.1101/gr.249789.119] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 02/22/2019] [Accepted: 07/24/2019] [Indexed: 01/09/2023]
Abstract
Alu elements are one of the most successful families of transposons in the human genome. A portion of Alu elements is transcribed by RNA Pol III, whereas the remaining ones are part of Pol II transcripts. Because Alu elements are highly repetitive, it has been difficult to identify the Pol III–transcribed elements and quantify their expression levels. In this study, we generated high-resolution, long-genomic-span RAMPAGE data in 155 biosamples all with matching RNA-seq data and built an atlas of 17,249 Pol III–transcribed Alu elements. We further performed an integrative analysis on the ChIP-seq data of 10 histone marks and hundreds of transcription factors, whole-genome bisulfite sequencing data, ChIA-PET data, and functional data in several biosamples, and our results revealed that although the human-specific Alu elements are transcriptionally repressed, the older, expressed Alu elements may be exapted by the human host to function as cell-type–specific enhancers for their nearby protein-coding genes.
Affiliation(s)
- Xiao-Ou Zhang: Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Thomas R Gingeras: Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Zhiping Weng: Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA; Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
|
30
|
Bhardwaj V, Semplicio G, Erdogdu NU, Manke T, Akhtar A. MAPCap allows high-resolution detection and differential expression analysis of transcription start sites. Nat Commun 2019; 10:3219. [PMID: 31363093 PMCID: PMC6667505 DOI: 10.1038/s41467-019-11115-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/13/2019] [Accepted: 06/20/2019] [Indexed: 01/06/2023] Open
Abstract
The position, shape and number of transcription start sites (TSS) are critical determinants of gene regulation. Most methods developed to detect TSSs and study promoter usage are, however, of limited use in studies that demand quantification of expression changes between two or more groups. In this study, we combine high-resolution detection of transcription start sites and differential expression analysis using a simplified TSS quantification protocol, MAPCap (Multiplexed Affinity Purification of Capped RNA), along with the software icetea. Applying MAPCap to developing Drosophila melanogaster embryos and larvae, we detect stage- and sex-specific promoter and enhancer activity and quantify the effect of mutants of maleless (MLE) helicase at X-chromosomal promoters. We observe that MLE mutation leads to a median 1.9-fold drop in expression of X-chromosome promoters and affects the expression of several TSSs with a sexually dimorphic expression on autosomes. Our results provide quantitative insights into promoter activity during dosage compensation. The position, shape and number of transcription start sites (TSS) regulate gene expression. Here the authors present MAPCap, a method for high-resolution detection and differential expression analysis of TSS, and apply MAPCap to early fly development, detecting stage- and sex-specific promoter and enhancer activity.
Affiliation(s)
- Vivek Bhardwaj: Max Planck Institute for Immunobiology and Epigenetics, 79108, Freiburg, Germany; Faculty of Biology, University of Freiburg, 79104, Freiburg, Germany
- Giuseppe Semplicio: Max Planck Institute for Immunobiology and Epigenetics, 79108, Freiburg, Germany
- Niyazi Umut Erdogdu: Max Planck Institute for Immunobiology and Epigenetics, 79108, Freiburg, Germany; Faculty of Biology, University of Freiburg, 79104, Freiburg, Germany
- Thomas Manke: Max Planck Institute for Immunobiology and Epigenetics, 79108, Freiburg, Germany
- Asifa Akhtar: Max Planck Institute for Immunobiology and Epigenetics, 79108, Freiburg, Germany
|
31
|
In silico prediction of prolactin molecules as a tool for equine genomics reproduction. Mol Divers 2019; 23:1019-1028. [PMID: 30740642 DOI: 10.1007/s11030-018-09914-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2018] [Accepted: 12/31/2018] [Indexed: 10/27/2022]
Abstract
The prolactin hormone is involved in several biological functions, although its main role lies in reproduction. Because it affects fertility, studies focused on human health have linked this hormone to fertility losses. In animal research, there is still a lack of information about the structure of prolactin. In horse breeding, prolactin has a particular influence, since there is an individualization of these animals and equines are known to present several reproductive disorders. As there is no molecular structure available for the prolactin hormone and receptor, we performed several bioinformatics analyses using prediction and refinement software, as well as manual modifications. Aiming to elucidate the first computational structure of both molecules and analyse structural and functional aspects related to these proteins, here we provide the first known equine model for prolactin and prolactin receptor, which obtained high global quality scores in diverse quality-assessment software. The QMEAN overall score obtained for ePrl was -4.09 and the QMEANbrane score for ePrlr was -8.45, which proves the structures' reliability. This study provides another tool for equine genomics to shed light on the interactions of these molecules and on structural and functional alterations, and therefore to help diagnose fertility problems, contributing to the selection of a genetically superior herd.
|
32
|
Giuffra E, Tuggle CK. Functional Annotation of Animal Genomes (FAANG): Current Achievements and Roadmap. Annu Rev Anim Biosci 2018; 7:65-88. [PMID: 30427726 DOI: 10.1146/annurev-animal-020518-114913] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Indexed: 11/09/2022]
Abstract
Functional annotation of genomes is a prerequisite for contemporary basic and applied genomic research, yet farmed animal genomics is deficient in such annotation. To address this, the FAANG (Functional Annotation of Animal Genomes) Consortium is producing genome-wide data sets on RNA expression, DNA methylation, and chromatin modification, as well as chromatin accessibility and interactions. In addition to informing our understanding of genome function, including comparative approaches to elucidate constrained sequence or epigenetic elements, these annotation maps will improve the precision and sensitivity of genomic selection for animal improvement. A scientific community-driven effort has already created a coordinated data collection and analysis enterprise crucial for the success of this global effort. Although it is early in this continuing process, functional data have already been produced and application to genetic improvement reported. The functional annotation delivered by the FAANG initiative will add value and utility to the greatly improved genome sequences being established for domesticated animal species.
Affiliation(s)
- Elisabetta Giuffra: Génétique Animale et Biologie Intégrative (GABI), Institut National de la Recherche Agronomique (INRA), AgroParisTech, Université Paris Saclay, 78350 Jouy-en-Josas, France
|
33
|
Kern C, Wang Y, Chitwood J, Korf I, Delany M, Cheng H, Medrano JF, Van Eenennaam AL, Ernst C, Ross P, Zhou H. Genome-wide identification of tissue-specific long non-coding RNA in three farm animal species. BMC Genomics 2018; 19:684. [PMID: 30227846 PMCID: PMC6145346 DOI: 10.1186/s12864-018-5037-7] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 12/19/2017] [Accepted: 08/27/2018] [Indexed: 03/08/2023] Open
Abstract
Background Numerous long non-coding RNAs (lncRNAs) have been identified and their roles in gene regulation in humans, mice, and other model organisms studied; however, far less research has been focused on lncRNAs in farm animal species. While previous studies in chickens, cattle, and pigs identified lncRNAs in specific developmental stages or differentially expressed under specific conditions in a limited number of tissues, more comprehensive identification of lncRNAs in these species is needed. The goal of the FAANG Consortium (Functional Annotation of Animal Genomes) is to functionally annotate animal genomes, including the annotation of lncRNAs. As one of the FAANG pilot projects, lncRNAs were identified across eight tissues in two adult male biological replicates from chickens, cattle, and pigs. Results Comprehensive lncRNA annotations for the chicken, cattle, and pig genomes were generated by utilizing RNA-seq from eight tissue types from two biological replicates per species at the adult developmental stage. A total of 9393 lncRNAs in chickens, 7235 lncRNAs in cattle, and 14,429 lncRNAs in pigs were identified. Including novel isoforms and lncRNAs from novel loci, 5288 novel lncRNAs were identified in chickens, 3732 in cattle, and 4870 in pigs. These transcripts match previously known patterns of lncRNAs, such as generally lower expression levels than mRNAs and higher tissue specificity. An analysis of lncRNA conservation across species identified a set of conserved lncRNAs with potential functions associated with chromatin structure and gene regulation. Tissue-specific lncRNAs were identified. Genes proximal to tissue-specific lncRNAs were enriched for GO terms associated with the tissue of origin, such as leukocyte activation in spleen. Conclusions LncRNAs were identified in three important farm animal species using eight tissues from adult individuals. About half of the identified lncRNAs were not previously reported in the NCBI annotations for these species. While lncRNAs are less conserved than protein-coding genes, a set of positionally conserved lncRNAs were identified among chickens, cattle, and pigs with potential functions related to chromatin structure and gene regulation. Tissue-specific lncRNAs have potential regulatory functions on genes enriched for tissue-specific GO terms. Future work will include epigenetic data from ChIP-seq experiments to further refine these annotations.
Affiliation(s)
- Colin Kern: Department of Animal Science, University of California, Davis, Davis, CA, USA
- Ying Wang: Department of Animal Science, University of California, Davis, Davis, CA, USA
- James Chitwood: Department of Animal Science, University of California, Davis, Davis, CA, USA
- Ian Korf: Genome Center, University of California, Davis, Davis, CA, USA
- Mary Delany: Department of Animal Science, University of California, Davis, Davis, CA, USA
- Hans Cheng: USDA-ARS, Avian Disease and Oncology Laboratory, East Lansing, MI, USA
- Juan F Medrano: Department of Animal Science, University of California, Davis, Davis, CA, USA
- Catherine Ernst: Department of Animal Science, Michigan State University, East Lansing, MI, USA
- Pablo Ross: Department of Animal Science, University of California, Davis, Davis, CA, USA
- Huaijun Zhou: Department of Animal Science, University of California, Davis, Davis, CA, USA
|
34
|
Abstract
Single-cell RNAseq and alternative splicing studies have recently become two of the most prominent applications of RNAseq. However, the combination of both is still challenging, and few research efforts have been dedicated to the intersection between them. Cell-level insight on isoform expression is required to fully understand the biology of alternative splicing, but it is still an open question to what extent isoform expression analysis at the single-cell level is actually feasible. Here, we establish a set of four conditions that are required for a successful single-cell-level isoform study and evaluate how these conditions are met by these technologies in published research.
Affiliation(s)
- Ángeles Arzalluz-Luque: Genomics of Gene Expression Laboratory, Centro de Investigación Principe Felipe (CIPF), 46012, Valencia, Spain
- Ana Conesa: Genomics of Gene Expression Laboratory, Centro de Investigación Principe Felipe (CIPF), 46012, Valencia, Spain; Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida, 32611, USA
|
35
|
Genome-Wide TSS Identification in Maize. Methods Mol Biol 2018. [PMID: 30043374 DOI: 10.1007/978-1-4939-8657-6_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Regulation of gene expression is a fundamental biological process that relies on transcription factors (TFs) recognizing specific cis motifs in the regulatory regions of the genes that they control. In most eukaryotic organisms, cis-regulatory elements are significantly enriched around the transcription start site (TSS). However, unlike other genic features, TSSs need to be experimentally determined, which makes them important components of genome annotations. One of the methods for experimentally determining TSSs at the genome-wide level is CAGE (cap analysis of gene expression). This chapter describes how to prepare a CAGE library for sequencing, starting with RNA extraction, library construction, and quality controls before proceeding to sequencing on the Illumina platform. We then describe how to use a computational pipeline to determine, from the alignment of CAGE tags, the genome-wide location of TSSs, followed by the statistical approaches required to cluster TSSs that operate as transcriptional units and to determine core promoter properties such as shape. The analyses described here focus on maize, since its large and still poorly annotated genome creates some unique challenges, but with some modifications they can be easily adapted for other organisms as well.
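The abstract does not reproduce the chapter's own pipeline. As a rough illustration of the two computational steps it mentions (grouping nearby CAGE-defined TSSs into clusters and summarising promoter shape), a minimal Python sketch follows. The function names, the 20-bp merging distance, the 10th to 90th percentile interquantile width, and the 10-bp sharp/broad cutoff are all assumptions chosen for illustration, not values taken from the chapter.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A CTSS (CAGE tag start site) is keyed by (chromosome, strand, position)
# and maps to the number of CAGE tags whose 5' end falls at that position.
CtssCounts = Dict[Tuple[str, str, int], int]


@dataclass
class TagCluster:
    chrom: str
    strand: str
    start: int          # first CTSS position in the cluster
    end: int            # last CTSS position in the cluster
    total_tags: int     # summed tag count over the cluster
    dominant_pos: int   # position with the highest tag count
    iq_width: int       # width holding the central share of tags
    shape: str          # "sharp" or "broad" (illustrative cutoff)


def interquantile_width(sites: List[Tuple[int, int]],
                        lo: float = 0.1, hi: float = 0.9) -> int:
    """Width between the positions where cumulative tags reach lo and hi."""
    total = sum(count for _, count in sites)
    cumulative = 0
    lo_pos = hi_pos = None
    for pos, count in sites:  # sites are sorted by position
        cumulative += count
        if lo_pos is None and cumulative >= lo * total:
            lo_pos = pos
        if hi_pos is None and cumulative >= hi * total:
            hi_pos = pos
    return hi_pos - lo_pos + 1


def summarise(chrom: str, strand: str,
              sites: List[Tuple[int, int]],
              sharp_max_width: int) -> TagCluster:
    """Collapse one run of neighbouring CTSSs into a TagCluster record."""
    width = interquantile_width(sites)
    dominant = max(sites, key=lambda s: s[1])[0]
    return TagCluster(chrom, strand, sites[0][0], sites[-1][0],
                      sum(c for _, c in sites), dominant, width,
                      "sharp" if width <= sharp_max_width else "broad")


def cluster_ctss(ctss: CtssCounts, max_gap: int = 20, min_tags: int = 2,
                 sharp_max_width: int = 10) -> List[TagCluster]:
    """Distance-based clustering: same chrom/strand, gaps of at most max_gap."""
    clusters: List[TagCluster] = []
    current: List[Tuple[int, int]] = []
    prev_key = None
    for (chrom, strand, pos), count in sorted(ctss.items()):
        key = (chrom, strand)
        # Close the running cluster on a new chrom/strand or a large gap.
        if current and (key != prev_key or pos - current[-1][0] > max_gap):
            clusters.append(summarise(*prev_key, current, sharp_max_width))
            current = []
        current.append((pos, count))
        prev_key = key
    if current:
        clusters.append(summarise(*prev_key, current, sharp_max_width))
    # Drop weakly supported clusters (fewer than min_tags tags in total).
    return [c for c in clusters if c.total_tags >= min_tags]


if __name__ == "__main__":
    toy = {("chr1", "+", 100): 5, ("chr1", "+", 102): 20,
           ("chr1", "+", 105): 5, ("chr1", "+", 500): 1,
           ("chr1", "-", 101): 3, ("chr1", "-", 160): 3}
    for cluster in cluster_ctss(toy):
        print(cluster)
```

Established tools use more careful normalisation and clustering rules than this fixed-distance approach; the sketch is only meant to make the clustering and shape ideas concrete.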
36
|
Comprehensive comparative analysis of 5'-end RNA-sequencing methods. Nat Methods 2018; 15:505-511. [PMID: 29867192 PMCID: PMC6075671 DOI: 10.1038/s41592-018-0014-2] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Received: 09/18/2017] [Accepted: 04/10/2018] [Indexed: 12/20/2022]
Abstract
Specialized RNA-seq methods are required to identify the 5' ends of transcripts, which are critical for studies of gene regulation, but these methods have not been systematically benchmarked. We directly compared six such methods, including the performance of five methods on a single human cellular RNA sample and a new spike-in RNA assay that helps circumvent challenges resulting from uncertainties in annotation and RNA processing. We found that the 'cap analysis of gene expression' (CAGE) method performed best for mRNA and that most of its unannotated peaks were supported by evidence from other genomic methods. We applied CAGE to eight brain-related samples and determined sample-specific transcription start site (TSS) usage, as well as a transcriptome-wide shift in TSS usage between fetal and adult brain.
37
|
Lim CS, Wardell SJT, Kleffmann T, Brown CM. The exon-intron gene structure upstream of the initiation codon predicts translation efficiency. Nucleic Acids Res 2018; 46:4575-4591. [PMID: 29684192 PMCID: PMC5961209 DOI: 10.1093/nar/gky282] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/21/2017] [Revised: 03/28/2018] [Accepted: 04/06/2018] [Indexed: 12/16/2022] Open
Abstract
Introns in mRNA leaders are common in complex eukaryotes, but often overlooked. These introns are spliced out before translation, leaving exon-exon junctions in the mRNA leaders (leader EEJs). Our multi-omic approach shows that the number of leader EEJs inversely correlates with the main protein translation, as does the number of upstream open reading frames (uORFs). Across the five species studied, the lowest levels of translation were observed for mRNAs with both leader EEJs and uORFs (29%). This class of mRNAs also has ribosome footprints on uORFs, with strong triplet periodicity indicating uORF translation. Furthermore, the positions of both leader EEJ and uORF are conserved between human and mouse. Thus, the uORF, in combination with the leader EEJ, predicts lower expression for nearly one-third of eukaryotic proteins.
Affiliation(s)
- Chun Shen Lim
  - Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Samuel J. T. Wardell
  - Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Torsten Kleffmann
  - Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Chris M Brown
  - Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
38
|
Loftus SK. The next generation of melanocyte data: Genetic, epigenetic, and transcriptional resource datasets and analysis tools. Pigment Cell Melanoma Res 2018; 31:442-447. [DOI: 10.1111/pcmr.12687] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/28/2017] [Accepted: 01/09/2018] [Indexed: 11/28/2022]
Affiliation(s)
- Stacie K. Loftus
  - Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
39
|
Batut PJ, Gingeras TR. Conserved noncoding transcription and core promoter regulatory code in early Drosophila development. eLife 2017; 6:29005. [PMID: 29260710 PMCID: PMC5754203 DOI: 10.7554/elife.29005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/30/2017] [Accepted: 12/19/2017] [Indexed: 01/30/2023] Open
Abstract
Multicellular development is driven by regulatory programs that orchestrate the transcription of protein-coding and noncoding genes. To decipher this genomic regulatory code, and to investigate the developmental relevance of noncoding transcription, we compared genome-wide promoter activity throughout embryogenesis in 5 Drosophila species. Core promoters, generally not thought to play a significant regulatory role, in fact impart restrictions on the developmental timing of gene expression on a global scale. We propose a hierarchical regulatory model in which core promoters define broad windows of opportunity for expression, by defining a range of transcription factors from which they can receive regulatory inputs. This two-tiered mechanism globally orchestrates developmental gene expression, including extremely widespread noncoding transcription. The sequence and expression specificity of noncoding RNA promoters are evolutionarily conserved, implying biological relevance. Overall, this work introduces a hierarchical model for developmental gene regulation, and reveals a major role for noncoding transcription in animal development.
Affiliation(s)
- Philippe J Batut
  - Watson School of Biological Sciences, Cold Spring Harbor Laboratory, New York, United States
- Thomas R Gingeras
  - Watson School of Biological Sciences, Cold Spring Harbor Laboratory, New York, United States
40
|
Triska M, Solovyev V, Baranova A, Kel A, Tatarinova TV. Nucleotide patterns aiding in prediction of eukaryotic promoters. PLoS One 2017; 12:e0187243. [PMID: 29141011 PMCID: PMC5687710 DOI: 10.1371/journal.pone.0187243] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/02/2017] [Accepted: 09/05/2017] [Indexed: 01/09/2023] Open
Abstract
Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that probabilistic integrative algorithms-driven models allow accurate classification of DNA sequence into “promoters” and “non-promoters” even in absence of the full-length cDNA sequences. These models may be built upon the maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, transcription factor binding sites, as well as relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors: those that bind preferentially to the [-500,0] region (188 “promoter-specific” transcription factors), those that bind preferentially to the [0,500] region (282 “5′ UTR-specific” TFs), and 207 of the “promiscuous” transcription factors with little or no location preference with respect to TSS. For the most informative motifs, their positional preferences are conserved between dicots and monocots.
Affiliation(s)
- Martin Triska
  - Children’s Hospital Los Angeles, University of Southern California, Los Angeles, CA, United States of America
  - Faculty of Advanced Technology, University of South Wales, Pontypridd, Wales, United Kingdom
- Ancha Baranova
  - School of Systems Biology, George Mason University, Fairfax, VA, United States of America
  - Research Centre for Medical Genetics, Moscow, Russia
- Alexander Kel
  - geneXplain GmbH, Wolfenbuettel, Germany
  - Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, Russia
- Tatiana V. Tatarinova
  - School of Systems Biology, George Mason University, Fairfax, VA, United States of America
  - Department of Biology, Division of Natural Sciences, University of La Verne, La Verne, CA, United States of America
  - Bioinformatics Center, AA Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
  - Vavilov’s Institute for General Genetics, Moscow, Russia
41
|
Chakravorty S, Hegde M. Gene and Variant Annotation for Mendelian Disorders in the Era of Advanced Sequencing Technologies. Annu Rev Genomics Hum Genet 2017; 18:229-256. [PMID: 28415856 DOI: 10.1146/annurev-genom-083115-022545] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
Abstract
Comprehensive annotations of genetic and noncoding regions and corresponding accurate variant classification for Mendelian diseases are the next big challenge in the new genomic era of personalized medicine. Progress in the development of faster and more accurate pipelines for genome annotation and variant classification will lead to the discovery of more novel disease associations and candidate therapeutic targets. This ultimately will facilitate better patient recruitment in clinical trials. In this review, we describe the trends in research at the intersection of basic and clinical genomics that aims to increase understanding of overall genomic complexity, complex inheritance patterns of disease, and patient-phenotype-specific genomic associations. We describe the emerging field of translational functional genomics, which integrates other functional "-omics" approaches that support next-generation sequencing genomic data in order to facilitate personalized diagnostics, disease management, biomarker discovery, and medicine. We also discuss the utility of this integrated approach for diagnostic clinics and medical databases and its role in the future of personalized medicine.
Affiliation(s)
- Samya Chakravorty
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia 30322
- Madhuri Hegde
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia 30322
42
|
Raborn RT, Spitze K, Brendel VP, Lynch M. Promoter Architecture and Sex-Specific Gene Expression in Daphnia pulex. Genetics 2016; 204:593-612. [PMID: 27585846 PMCID: PMC5068849 DOI: 10.1534/genetics.116.193334] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/02/2016] [Accepted: 07/29/2016] [Indexed: 11/18/2022] Open
Abstract
Large-scale transcription start site (TSS) profiling produces a high-resolution, quantitative picture of transcription initiation and core promoter locations within a genome. However, application of TSS profiling to date has largely been restricted to a small set of prominent model systems. We sought to characterize the cis-regulatory landscape of the water flea Daphnia pulex, an emerging model arthropod that reproduces both asexually (via parthenogenesis) and sexually (via meiosis). We performed Cap Analysis of Gene Expression (CAGE) with RNA isolated from D. pulex within three developmental states: sexual females, asexual females, and males. Identified TSSs were utilized to generate a "Daphnia Promoter Atlas," i.e., a catalog of active promoters across the surveyed states. Analysis of the distribution of promoters revealed evidence for widespread alternative promoter usage in D. pulex, in addition to a prominent fraction of compactly-arranged promoters in divergent orientations. We carried out de novo motif discovery using CAGE-defined TSSs and identified eight candidate core promoter motifs; this collection includes canonical promoter elements (e.g., TATA and Initiator) in addition to others lacking obvious orthologs. A comparison of promoter activities found evidence for considerable state-specific differential gene expression between states. Our work represents the first global definition of transcription initiation and promoter architecture in crustaceans. The Daphnia Promoter Atlas presented here provides a valuable resource for comparative study of cis-regulatory regions in metazoans, as well as for investigations into the circuitries that underpin meiosis and parthenogenesis.
Affiliation(s)
- R Taylor Raborn
  - Department of Biology, Indiana University, Bloomington, Indiana 47405
  - School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405
- Ken Spitze
  - Department of Biology, Indiana University, Bloomington, Indiana 47405
- Volker P Brendel
  - Department of Biology, Indiana University, Bloomington, Indiana 47405
  - School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405
- Michael Lynch
  - Department of Biology, Indiana University, Bloomington, Indiana 47405
43
|
|
44
|
Caillet-Boudin ML, Buée L, Sergeant N, Lefebvre B. Regulation of human MAPT gene expression. Mol Neurodegener 2015; 10:28. [PMID: 26170022 PMCID: PMC4499907 DOI: 10.1186/s13024-015-0025-8] [Citation(s) in RCA: 118] [Impact Index Per Article: 13.1] [Received: 04/21/2015] [Accepted: 06/30/2015] [Indexed: 12/12/2022] Open
Abstract
The number of known pathologies involving deregulated Tau expression/metabolism is increasing. Indeed, in addition to tauopathies, which comprise approximately 30 diseases characterized by neuronal aggregation of hyperphosphorylated Tau in brain neurons, this protein has also been associated with various other pathologies such as cancer, inclusion body myositis, and microdeletion/microduplication syndromes, suggesting its possible function in peripheral tissues. In addition to Tau aggregation, Tau deregulation can occur at the expression and/or splicing levels, as has been clearly demonstrated in some of these pathologies. Here, we aim to review current knowledge regarding the regulation of human MAPT gene expression at the DNA and RNA levels to provide a better understanding of its possible deregulation. Several aspects, including repeated motifs, CpG island/methylation, and haplotypes at the DNA level, as well as the key regions involved in mRNA expression and stability and the splicing patterns of different mRNA isoforms at the RNA level, will be discussed.
Affiliation(s)
- Luc Buée
  - Univ. Lille, UMR-S 1172, Inserm, CHU, 59000, Lille, France
- Bruno Lefebvre
  - Univ. Lille, UMR-S 1172, Inserm, CHU, 59000, Lille, France