1. Lindner J, Dassa B, Wigoda N, Stelzer G, Feldmesser E, Prilusky J, Leshkowitz D. UTAP2: an enhanced user-friendly transcriptome and epigenome analysis pipeline. BMC Bioinformatics 2025; 26:79. [PMID: 40055635 PMCID: PMC11889741 DOI: 10.1186/s12859-025-06090-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2024] [Accepted: 02/19/2025] [Indexed: 05/13/2025] Open
Abstract
BACKGROUND The emergence of next-generation sequencing (NGS) marked a revolution in biological research, enabling comprehensive characterization of the transcriptome and detailed analysis of the epigenome landscape. This technology has made it possible to detect differences across cell types, genotypes, and conditions. Advances in short-read sequencing platforms have produced user-friendly machines that offer high throughput at a reduced cost per base. However, leveraging this data still requires bioinformatics expertise to develop and execute tailored solutions for each specific application. Democratizing access to sequence analysis tools is crucial to empower researchers from diverse fields to harness the full potential of NGS data.
RESULTS UTAP2, our enhanced version of UTAP, which was published in 2019 (Kohen et al. in BMC Bioinform 20(1):154, 2019), empowers researchers to unlock the mysteries of gene expression and epigenetic modifications with ease. This user-friendly, open-source pipeline, built by unit programmers and deep sequencing analysts, streamlines transcriptome and epigenome data analysis, handling everything from sequences to gene or peak counts and the annotation of differentially expressed genes or genomic regions. Results are delivered in organized folders and rich reports packed with plots, tables, and links for effortless interpretation. Since its debut, UTAP has been embraced by many researchers at the Weizmann Institute and has received over 100 citations, highlighting its scientific contribution.
CONCLUSION Our User-friendly Transcriptome and Epigenome Analysis Pipeline UTAP2 is available to the broader biomedical research community as an open-source installation. With a single image, it can be installed on both local servers and cloud platforms, allowing users to leverage parallel cluster resources. Once installed, UTAP2 enables researchers, even those with limited bioinformatics skills, to efficiently, accurately, and reliably analyse transcriptome and epigenome sequence data.
Affiliation(s)
- Jordana Lindner
  - Bioinformatics Unit, Department of Life Sciences Core Facilities, Weizmann Institute of Science, 76100, Rehovot, Israel
- Bareket Dassa
  - Bioinformatics Unit, Department of Life Sciences Core Facilities, Weizmann Institute of Science, 76100, Rehovot, Israel
- Noa Wigoda
  - Bioinformatics Unit, Department of Life Sciences Core Facilities, Weizmann Institute of Science, 76100, Rehovot, Israel
- Gil Stelzer
  - Bioinformatics Unit, Department of Life Sciences Core Facilities, Weizmann Institute of Science, 76100, Rehovot, Israel
- Ester Feldmesser
  - Bioinformatics Unit, Department of Life Sciences Core Facilities, Weizmann Institute of Science, 76100, Rehovot, Israel
- Jaime Prilusky
  - Bioinformatics Unit, Department of Life Sciences Core Facilities, Weizmann Institute of Science, 76100, Rehovot, Israel
- Dena Leshkowitz
  - Bioinformatics Unit, Department of Life Sciences Core Facilities, Weizmann Institute of Science, 76100, Rehovot, Israel
2. Spealman P, de Santana C, De T, Gresham D. Multilevel Gene Expression Changes in Lineages Containing Adaptive Copy Number Variants. Mol Biol Evol 2025; 42:msaf005. [PMID: 39847535 PMCID: PMC11789944 DOI: 10.1093/molbev/msaf005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 10/28/2024] [Accepted: 12/02/2024] [Indexed: 01/25/2025] Open
Abstract
Copy number variants (CNVs) are an important class of genetic variation that can mediate rapid adaptive evolution. Although CNVs can increase the relative fitness of the organism, they can also incur a cost due to the associated increased gene expression and repetitive DNA. We previously evolved populations of Saccharomyces cerevisiae over hundreds of generations in glutamine-limited (Gln-) chemostats and observed the recurrent evolution of CNVs at the GAP1 locus. To understand the role that gene expression plays in adaptation, both in relation to the adaptation of the organism to the selective condition and as a consequence of the CNV, we measured the transcriptome, translatome, and proteome of 4 strains of evolved yeast, each with a unique CNV, and their ancestor in Gln- chemostats. We find CNV-amplified genes correlate with higher mRNA abundance; however, this effect is reduced at the level of the proteome, consistent with post-transcriptional dosage compensation. By normalizing each level of gene expression by the abundance of the preceding step, we were able to identify widespread differences in the efficiency of each level of gene expression. Genes with significantly different translational efficiency were enriched for potential regulatory mechanisms including either upstream open reading frames, RNA-binding sites for Ssd1, or both. Genes with lower protein expression efficiency were enriched for genes encoding proteins in protein complexes. Taken together, our study reveals widespread changes in gene expression at multiple regulatory levels in lineages containing adaptive CNVs, highlighting the diverse ways in which genome evolution shapes gene expression.
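The normalization described in this abstract (each expression level divided by the abundance of the preceding step) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the log2 scale, and the pseudocount are assumptions made for illustration, and the abstract does not state how the study handled replicates or library-size normalization.

```python
import math


def log2_ratio(numerator, denominator, pseudocount=1.0):
    """Log2 ratio of two abundances; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((numerator + pseudocount) / (denominator + pseudocount))


def expression_efficiencies(mrna, ribo, protein, pseudocount=1.0):
    """Normalize each expression level by the abundance of the preceding step.

    mrna    -- mRNA abundance (transcriptome)
    ribo    -- ribosome-associated abundance (translatome)
    protein -- protein abundance (proteome)
    All three are hypothetical, already-normalized abundances for one gene.
    """
    return {
        # translatome relative to transcriptome
        "translational_efficiency": log2_ratio(ribo, mrna, pseudocount),
        # proteome relative to translatome
        "protein_expression_efficiency": log2_ratio(protein, ribo, pseudocount),
    }


# Hypothetical gene: twice as much ribosome signal as mRNA, a quarter as much protein.
example = expression_efficiencies(mrna=199.0, ribo=399.0, protein=99.0)
print(example)
```

Comparing these per-gene values between an evolved strain and its ancestor would then show whether a change at one level is carried through to, or buffered at, the next level.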
Affiliation(s)
- Pieter Spealman
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Carolina de Santana
  - Laboratório de Microbiologia Ambiental e Saúde Pública, Universidade Estadual de Feira de Santana (UEFS), Bahia, Brazil
- Titir De
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- David Gresham
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
3. Spealman P, de Santana C, De T, Gresham D. Multilevel gene expression changes in lineages containing adaptive copy number variants. bioRxiv 2024:2023.10.20.563336. [PMID: 37961325 PMCID: PMC10634702 DOI: 10.1101/2023.10.20.563336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Copy-number variants (CNVs) are an important class of recurrent variants that mediate adaptive evolution. While CNVs can increase the relative fitness of the organism, they can also incur a cost. We previously evolved populations of Saccharomyces cerevisiae over hundreds of generations in glutamine-limited (Gln-) chemostats and observed the recurrent evolution of CNVs at the GAP1 locus. To understand the role that expression plays in adaptation, both in relation to the adaptation of the organism to the selective condition and as a consequence of the CNV, we measured the transcriptome, translatome, and proteome of 4 strains of evolved yeast, each with a unique CNV, and their ancestor in Gln- conditions. We find CNV-amplified genes correlate with higher RNA abundance; however, this effect is reduced at the level of the proteome, consistent with post-transcriptional dosage compensation. By normalizing each level of expression by the abundance of the preceding step, we were able to identify widespread divergence in the efficiency of each step in gene expression. Genes with significantly different translational efficiency were enriched for potential regulatory mechanisms including either upstream open reading frames, RNA binding sites for SSD1, or both. Genes with lower protein expression efficiency were enriched for genes encoding proteins in protein complexes. Taken together, our study reveals widespread changes in gene expression at multiple regulatory levels in lineages containing adaptive CNVs, highlighting the diverse ways in which adaptive evolution shapes gene expression.
Affiliation(s)
- Pieter Spealman
  - Center for Genomics and Systems Biology, Department of Biology, New York University
- Carolina de Santana
  - Laboratório de Microbiologia Ambiental e Saúde Pública - Universidade Estadual de Feira de Santana (UEFS), Bahia
- Titir De
  - Center for Genomics and Systems Biology, Department of Biology, New York University
- David Gresham
  - Center for Genomics and Systems Biology, Department of Biology, New York University
4. Salussolia CL, Winden KD, Sahin M. Translating Ribosome Affinity Purification (TRAP) of Cell Type-specific mRNA from Mouse Brain Lysates. Bio Protoc 2022; 12:e4407. [PMID: 35800463 PMCID: PMC9090583 DOI: 10.21769/bioprotoc.4407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 03/28/2022] [Indexed: 01/11/2023] Open
Abstract
Mammalian tissues are highly heterogeneous and complex, posing a challenge in understanding the molecular mechanisms regulating protein expression within various tissues. Recent studies have shown that translation at the level of the ribosome is highly regulated and can vary independently of gene expression observed at the transcriptome level, as well as between cell populations, contributing to the diversity of mammalian tissues. Earlier methods that analyzed gene expression at the level of translation, such as polysomal or ribosomal profiling, required large amounts of starting material to isolate enough RNA for analysis by microarray or RNA-sequencing. Thus, rare or less abundant cell types within tissues could not be properly studied with these methods. Translating ribosome affinity purification (TRAP) utilizes the incorporation of an eGFP-affinity tag on the large ribosome subunit, driven by expression of cell-type specific Cre-lox promoters, to allow for identification and capture of transcripts from actively translating ribosomes in a cell-specific manner. As a result, TRAP offers a unique opportunity to evaluate the entire mRNA translation profile within a specific cell type and to increase our understanding of the cellular complexity of mammalian tissues. Graphical abstract: Schematic demonstrating TRAP protocol for identifying ribosome-bound transcripts specifically within cerebellar Purkinje cells.
Affiliation(s)
- Catherine L. Salussolia
  - F.M. Kirby Neurobiology Center, Rosamund Stone Zander Translational Neuroscience Center, Department of Neurology, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA
- Kellen D. Winden
  - F.M. Kirby Neurobiology Center, Rosamund Stone Zander Translational Neuroscience Center, Department of Neurology, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA
- Mustafa Sahin
  - F.M. Kirby Neurobiology Center, Rosamund Stone Zander Translational Neuroscience Center, Department of Neurology, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA
5. Chen H, Alonso JM, Stepanova AN. A Ribo-Seq Method to Study Genome-Wide Translational Regulation in Plants. Methods Mol Biol 2022; 2494:61-98. [PMID: 35467201 DOI: 10.1007/978-1-0716-2297-1_6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]
Abstract
Protein production from mRNA is one of the fundamental molecular processes in a cell. Accurate genome-wide information on the levels of translation and ribosome distribution on mRNA can be gathered by carrying out ribosome footprinting, also known as Ribo-seq. Herein, we present a detailed protocol describing the construction of parallel Ribo-seq and RNA-seq libraries from Arabidopsis seedlings treated with the plant hormone auxin. The improved protocol for ribosome footprint library generation can be easily adapted to analyze the effects of genetic perturbations and of various abiotic and biotic factors on translation, shedding much-needed light on translational regulation in plants.
Affiliation(s)
- Hao Chen
  - Department of Plant and Microbial Biology, Program in Genetics, North Carolina State University, Raleigh, NC, USA
- Jose M Alonso
  - Department of Plant and Microbial Biology, Program in Genetics, North Carolina State University, Raleigh, NC, USA
- Anna N Stepanova
  - Department of Plant and Microbial Biology, Program in Genetics, North Carolina State University, Raleigh, NC, USA
6. Mundodi V, Choudhary S, Smith AD, Kadosh D. Global translational landscape of the Candida albicans morphological transition. G3 Genes Genomes Genetics 2021; 11:6046988. [PMID: 33585865 PMCID: PMC7849906 DOI: 10.1093/g3journal/jkaa043] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/31/2020] [Accepted: 12/01/2020] [Indexed: 12/14/2022]
Abstract
Candida albicans, a major human fungal pathogen associated with high mortality and/or morbidity rates in a wide variety of immunocompromised individuals, undergoes a reversible morphological transition from yeast to filamentous cells that is required for virulence. While previous studies have identified and characterized global transcriptional mechanisms important for driving this transition, as well as other virulence properties, in C. albicans and other pathogens, comparatively little is known about the role of genome-wide translational mechanisms. Using ribosome profiling, we report the first global translational profile associated with C. albicans morphogenesis. Strikingly, many genes involved in pathogenesis, filamentation, and the response to stress show reduced translational efficiency (TE). Several of these genes are known to be strongly induced at the transcriptional level, suggesting that a translational fine-tuning mechanism is in place. We also identify potential upstream open reading frames (uORFs) associated with genes involved in pathogenesis, as well as novel ORFs, several of which show altered TE during filamentation. Using a novel bioinformatics method for global analysis of ribosome pausing that will be applicable to a wide variety of genetic systems, we demonstrate an enrichment of ribosome pausing sites in C. albicans genes associated with protein synthesis and cell wall functions. Altogether, our results suggest that the C. albicans morphological transition, and most likely additional virulence processes in fungal pathogens, is associated with widespread global alterations in TE that do not simply reflect changes in transcript levels. These alterations affect the expression of many genes associated with processes essential for virulence and pathogenesis.
Affiliation(s)
- Vasanthakrishna Mundodi
  - Department of Microbiology, Immunology and Molecular Genetics, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Saket Choudhary
  - Department of Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA
- Andrew D Smith
  - Department of Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA
- David Kadosh
  - Department of Microbiology, Immunology and Molecular Genetics, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
7. uORF-seqr: A Machine Learning-Based Approach to the Identification of Upstream Open Reading Frames in Yeast. Methods Mol Biol 2021. [PMID: 33765283 DOI: 10.1007/978-1-0716-1150-0_15] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 07/29/2023]
Abstract
The identification of upstream open reading frames (uORFs) using ribosome profiling data is complicated by several factors, such as the noise inherent to the procedure, the substantial increase in potential translation initiation sites (and false positives) when one includes non-canonical start codons, and the paucity of molecularly validated uORFs. Here we present uORF-seqr, a novel machine learning algorithm that uses ribosome profiling data, in conjunction with RNA-seq data and transcript-aware genome annotation files, to identify statistically significant AUG and near-cognate codon uORFs.
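To make concrete why including non-canonical start codons enlarges the search space, the following Python sketch enumerates candidate uORFs in a transcript leader from sequence alone. It is not the uORF-seqr algorithm and uses no ribosome profiling data; the function names are hypothetical, near-cognate codons are assumed to be those differing from AUG at exactly one position, and only candidates with an in-frame stop inside the supplied leader are reported.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
CANONICAL_START = "ATG"  # AUG written in DNA letters


def near_cognate_codons(start=CANONICAL_START):
    """Codons that differ from the canonical start at exactly one position."""
    codons = set()
    for i in range(3):
        for base in "ACGT":
            if base != start[i]:
                codons.add(start[:i] + base + start[i + 1:])
    return codons


def candidate_uorfs(leader, include_near_cognate=True):
    """List (start, end, start_codon) for each candidate uORF in a leader sequence.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Candidates without an in-frame stop inside the leader are skipped,
    which is a simplification made for this sketch.
    """
    leader = leader.upper().replace("U", "T")
    starts = {CANONICAL_START}
    if include_near_cognate:
        starts |= near_cognate_codons()
    found = []
    for i in range(len(leader) - 2):
        codon = leader[i:i + 3]
        if codon not in starts:
            continue
        # Walk downstream codon by codon until the first in-frame stop.
        for j in range(i + 3, len(leader) - 2, 3):
            if leader[j:j + 3] in STOP_CODONS:
                found.append((i, j + 3, codon))
                break
    return found


leader = "CCATGAAATAGCC"  # hypothetical transcript leader
aug_only = candidate_uorfs(leader, include_near_cognate=False)
with_near_cognate = candidate_uorfs(leader)
print(len(aug_only), len(with_near_cognate))
```

Because every added start codon can only add candidates, the near-cognate list is never shorter than the AUG-only list, which is one reason a statistical filter on the profiling signal is needed downstream.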
8. XPRESSyourself: Enhancing, standardizing, and automating ribosome profiling computational analyses yields improved insight into data. PLoS Comput Biol 2020; 16:e1007625. [PMID: 32004313 PMCID: PMC7015430 DOI: 10.1371/journal.pcbi.1007625] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/11/2019] [Revised: 02/12/2020] [Accepted: 12/20/2019] [Indexed: 11/19/2022] Open
Abstract
Ribosome profiling, an application of nucleic acid sequencing for monitoring ribosome activity, has revolutionized our understanding of protein translation dynamics. This technique has been available for a decade, yet the current state and standardization of publicly available computational tools for these data is bleak. We introduce XPRESSyourself, an analytical toolkit that eliminates barriers and bottlenecks associated with this specialized data type by filling gaps in the computational toolset for both experts and non-experts of ribosome profiling. XPRESSyourself automates and standardizes analysis procedures, decreasing time-to-discovery and increasing reproducibility. This toolkit acts as a reference implementation of current best practices in ribosome profiling analysis. We demonstrate this toolkit’s performance on publicly available ribosome profiling data by rapidly identifying hypothetical mechanisms related to neurodegenerative phenotypes and neuroprotective mechanisms of the small-molecule ISRIB during acute cellular stress. XPRESSyourself brings robust, rapid analysis of ribosome-profiling data to a broad and ever-expanding audience and will lead to more reproducible and accessible measurements of translation regulation. XPRESSyourself software is perpetually open-source under the GPL-3.0 license and is hosted at https://github.com/XPRESSyourself, where users can access additional documentation and report software issues.
9. Spealman P, Naik AW, May GE, Kuersten S, Freeberg L, Murphy RF, McManus J. Conserved non-AUG uORFs revealed by a novel regression analysis of ribosome profiling data. Genome Res 2017; 28:214-222. [PMID: 29254944 PMCID: PMC5793785 DOI: 10.1101/gr.221507.117] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 02/07/2017] [Accepted: 12/11/2017] [Indexed: 12/14/2022]
Abstract
Upstream open reading frames (uORFs), located in transcript leaders (5' UTRs), are potent cis-acting regulators of translation and mRNA turnover. Recent genome-wide ribosome profiling studies suggest that thousands of uORFs initiate with non-AUG start codons. Although intriguing, these non-AUG uORF predictions have been made without statistical control or validation; thus, the importance of these elements remains to be demonstrated. To address this, we took a comparative genomics approach to study AUG and non-AUG uORFs. We mapped transcription leaders in multiple Saccharomyces yeast species and applied a novel machine learning algorithm (uORF-seqr) to ribosome profiling data to identify statistically significant uORFs. We found that AUG and non-AUG uORFs are both frequently found in Saccharomyces yeasts. Although most non-AUG uORFs are found in only one species, hundreds have either conserved sequence or position within Saccharomyces. uORFs initiating with UUG are particularly common and are shared between species at rates similar to that of AUG uORFs. However, non-AUG uORFs are translated less efficiently than AUG uORFs and are less subject to removal via alternative transcription initiation under normal growth conditions. These results suggest that a subset of non-AUG uORFs may play important roles in regulating gene expression.
Affiliation(s)
- Pieter Spealman
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Armaghan W Naik
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Gemma E May
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Robert F Murphy
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Joel McManus
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
10. Translation complex profile sequencing to study the in vivo dynamics of mRNA–ribosome interactions during translation initiation, elongation and termination. Nat Protoc 2017; 12:697-731. [DOI: 10.1038/nprot.2016.189] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Indexed: 01/26/2023]