1
Elsworth B, Ye S, Dass S, Tennessen JA, Sultana Q, Thommen BT, Paul AS, Kanjee U, Grüring C, Ferreira MU, Gubbels MJ, Zarringhalam K, Duraisingh MT. The essential genome of Plasmodium knowlesi reveals determinants of antimalarial susceptibility. Science 2025; 387:eadq6241. [PMID: 39913579 PMCID: PMC12104972 DOI: 10.1126/science.adq6241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Accepted: 12/05/2024] [Indexed: 02/09/2025]
Abstract
Measures to combat the parasites that cause malaria have become compromised because of reliance on a small arsenal of drugs and emerging drug resistance. We conducted a transposon mutagenesis screen in the primate malaria parasite Plasmodium knowlesi, producing the most complete classification of gene essentiality in any Plasmodium spp. to date, with the resolution to define truncatable genes. We found conservation in the druggable genome between Plasmodium spp. and divergences in mitochondrial metabolism. Perturbation analyses with the frontline antimalarial artemisinin revealed modulators that both increase and decrease drug susceptibility. Our findings aid prioritization of drug and vaccine targets for the Plasmodium vivax clade and reveal mechanisms of resistance that can inform therapeutic development.
Affiliation(s)
- Brendan Elsworth
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
  - Laboratory of Emerging Pathogens, Division of Emerging and Transfusion Transmitted Diseases, Office of Blood Research and Review, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
- Sida Ye
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA, USA
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA, USA
- Sheena Dass
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
- Jacob A. Tennessen
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
- Qudseen Sultana
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA, USA
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA, USA
- Basil T. Thommen
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
- Aditya S. Paul
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
- Usheer Kanjee
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
- Christof Grüring
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
- Marcelo U. Ferreira
  - Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
  - Global Health and Tropical Medicine, Institute of Hygiene and Tropical Medicine, Nova University of Lisbon, Lisbon, Portugal
- Kourosh Zarringhalam
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA, USA
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA, USA
- Manoj T. Duraisingh
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA
2
Hsu FM, Horton P. MethylSeqLogo: DNA methylation smart sequence logos. BMC Bioinformatics 2024; 25:326. [PMID: 39385066 PMCID: PMC11462690 DOI: 10.1186/s12859-024-05896-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 08/08/2024] [Indexed: 10/11/2024] Open
Abstract
BACKGROUND: Some transcription factors, MYC for example, bind sites of potentially methylated DNA. This may increase binding specificity, as such sites (1) are highly under-represented in the genome and (2) offer additional, tissue-specific information in the form of hypo- or hyper-methylation. Fortunately, bisulfite sequencing data can be used to investigate this phenomenon. METHOD: We developed MethylSeqLogo, an extension of sequence logos that includes new elements to indicate DNA methylation and under-represented dimers at each position of a set of binding sites. Our method displays information from both DNA strands and takes into account the sequence context (CpG or other) and genome region (promoter versus whole genome) needed to properly assess the expected background dimer frequency and level of methylation. MethylSeqLogo preserves sequence logo semantics: the relative height of nucleotides within a column represents their proportion in the binding sites, the absolute height of each column represents information (relative entropy), and the height of all columns added together represents total information. RESULTS: We present figures illustrating the utility of MethylSeqLogo for summarizing data from several CpG-binding transcription factors. The logos show that unmethylated CpG binding sites are a feature of transcription factors such as MYC and ZBTB33, while some other CpG-binding transcription factors, such as CEBPB, appear methylation neutral. CONCLUSIONS: Our software enables users to explore bisulfite and ChIP sequencing data sets and, in the process, obtain publication-quality figures.
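The sequence logo semantics described in this abstract (letter height proportional to base frequency, column height equal to relative entropy) can be illustrated with a minimal sketch. This is not the MethylSeqLogo implementation: it assumes a uniform background by default, ignores methylation and dimer elements, and applies no small-sample correction. The function names and the toy binding sites are hypothetical.

```python
import math
from collections import Counter

def column_information(column, background=None):
    """Relative entropy (in bits) of one alignment column versus a background."""
    if background is None:
        # Assumption: uniform background; a real tool would estimate this
        # from the genome or promoter regions.
        background = {base: 0.25 for base in "ACGT"}
    counts = Counter(column)
    total = sum(counts.values())
    return sum(
        (n / total) * math.log2((n / total) / background[base])
        for base, n in counts.items()
    )

def logo_heights(sites, background=None):
    """Per-position letter heights: base proportion times column information."""
    heights = []
    for column in zip(*sites):
        info = column_information(column, background)
        counts = Counter(column)
        heights.append({base: (n / len(column)) * info for base, n in counts.items()})
    return heights

# Hypothetical aligned binding sites (E-box-like, for illustration only).
sites = ["CACGTG", "CACGTG", "CACGTT", "CACGTG"]
heights = logo_heights(sites)
# Total information is the sum of all column heights.
total_information = sum(sum(column.values()) for column in heights)
```

Within each column the letter heights sum to that column's information, so a fully conserved position reaches 2 bits under the uniform background assumed here.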
Affiliation(s)
- Fei-Man Hsu
  - Department of Molecular Cell and Developmental Biology, University of California, Los Angeles, USA
- Paul Horton
  - Department of Computer Science and Information Engineering, National Cheng Kung University, 1 University Road, Tainan, 70101, Taiwan
3
Raditsa V, Tsukanov A, Bogomolov A, Levitsky V. Genomic background sequences systematically outperform synthetic ones in de novo motif discovery for ChIP-seq data. NAR Genom Bioinform 2024; 6:lqae090. [PMID: 39071850 PMCID: PMC11282361 DOI: 10.1093/nargab/lqae090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 06/03/2024] [Accepted: 07/19/2024] [Indexed: 07/30/2024] Open
Abstract
Efficient de novo motif discovery from the results of genome-wide mapping of transcription factor binding sites (ChIP-seq) depends on the choice of background nucleotide sequences. The foreground sequences (ChIP-seq peaks) contain not only the specific motifs of target transcription factors but also motifs overrepresented throughout the genome, such as simple sequence repeats. We performed a large-scale comparison of the 'synthetic' and 'genomic' approaches to generating background sequences for de novo motif discovery. The 'synthetic' approach shuffled nucleotides in peaks, while the 'genomic' approach selected sequences from the reference genome, either randomly or only from gene promoters, according to the fraction of A/T nucleotides in each sequence. We compiled benchmark collections of ChIP-seq datasets for mouse, human and Arabidopsis, and performed de novo motif discovery. We showed that the genomic approach offers both more robust detection of the known motifs of target transcription factors and more stringent exclusion of simple sequence repeats as possible non-specific motifs. The advantage of the genomic approach over the synthetic approach was greater in plants than in mammals. We developed the AntiNoise web service (https://denovosea.icgbio.ru/antinoise/), which implements a genomic approach to extract background sequences for twelve eukaryotic genomes.
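The two background strategies contrasted in this abstract can be sketched as follows. This is an illustrative simplification, not the AntiNoise implementation: the genome is a plain string, windows are drawn uniformly at random, and the `tolerance` and `max_tries` parameters are assumptions introduced here.

```python
import random

def at_fraction(seq):
    """Fraction of A/T nucleotides in a sequence."""
    return sum(base in "AT" for base in seq) / len(seq)

def synthetic_background(peaks, rng):
    """'Synthetic' approach: shuffle the nucleotides within each peak."""
    shuffled = []
    for peak in peaks:
        bases = list(peak)
        rng.shuffle(bases)
        shuffled.append("".join(bases))
    return shuffled

def genomic_background(peaks, genome, rng, tolerance=0.05, max_tries=10000):
    """'Genomic' approach: for each peak, draw a genome window of the same
    length whose A/T fraction is within `tolerance` of the peak's."""
    background = []
    for peak in peaks:
        target = at_fraction(peak)
        for _ in range(max_tries):
            start = rng.randrange(len(genome) - len(peak) + 1)
            window = genome[start:start + len(peak)]
            if abs(at_fraction(window) - target) <= tolerance:
                background.append(window)
                break
        else:
            background.append(None)  # no matching window found
    return background

# Toy data: a random "genome" and two hypothetical peaks.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
peaks = ["ACGTACGTAA", "TTTTACGCGC"]
synthetic = synthetic_background(peaks, rng)
genomic = genomic_background(peaks, genome, rng)
```

Shuffling preserves each peak's base composition exactly but destroys any higher-order sequence structure, whereas sampled genome windows keep natural features such as repeats, which is the property the abstract credits for excluding non-specific motifs.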
Affiliation(s)
- Vladimir V Raditsa
  - Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Anton V Tsukanov
  - Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Anton G Bogomolov
  - Department of Cell Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Victor G Levitsky
  - Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
  - Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
4
Lou J, Rezvani Y, Arriojas A, Wu Y, Shankar N, Degras D, Keroack CD, Duraisingh MT, Zarringhalam K, Gubbels MJ. Single cell expression and chromatin accessibility of the Toxoplasma gondii lytic cycle identifies AP2XII-8 as an essential ribosome regulon driver. Nat Commun 2024; 15:7419. [PMID: 39198388 PMCID: PMC11358496 DOI: 10.1038/s41467-024-51011-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 07/22/2024] [Indexed: 09/01/2024] Open
Abstract
Sequential lytic cycles driven by cascading transcriptional waves underlie pathogenesis in the apicomplexan parasite Toxoplasma gondii. This parasite's unique division by internal budding, short cell cycle, and jumbled-up, classically defined cell cycle stages have restrained in-depth analysis of its transcriptional program. Here, unbiased transcriptome and chromatin accessibility maps throughout the lytic cell cycle are established at the single-cell level. Correlated pseudo-timeline assemblies of expression and chromatin profiles map transcriptional versus chromatin-level transition points promoting the cell division cycle. Sequential clustering analysis identifies functionally related gene groups promoting cell cycle progression. Promoter DNA motif mapping reveals patterns of combinatorial regulation. Pseudo-time trajectory analysis reveals transcriptional bursts at different cell cycle points. The dominant burst in G1 is driven largely by transcription factor AP2XII-8, which engages a conserved DNA motif and promotes the expression of a regulon encoding 44 ribosomal proteins. Overall, the study provides integrated, multi-level insights into apicomplexan transcriptional regulation.
Affiliation(s)
- Jingjing Lou
  - Department of Biology, Boston College, Chestnut Hill, MA, USA
- Yasaman Rezvani
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA, USA
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA, USA
- Argenis Arriojas
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA, USA
- Yihan Wu
  - Department of Biology, Boston College, Chestnut Hill, MA, USA
- Nachiket Shankar
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA, USA
- David Degras
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA, USA
- Caroline D Keroack
  - Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA, USA
  - Department of Molecular Microbiology and Immunology, Brown University, Providence, RI, USA
- Manoj T Duraisingh
  - Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Kourosh Zarringhalam
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA, USA
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA, USA
5
Li Y, Wang Y, Wang C, Ma A, Ma Q, Liu B. A weighted two-stage sequence alignment framework to identify motifs from ChIP-exo data. Patterns (New York, N.Y.) 2024; 5:100927. [PMID: 38487805 PMCID: PMC10935504 DOI: 10.1016/j.patter.2024.100927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/18/2023] [Accepted: 01/10/2024] [Indexed: 03/17/2024]
Abstract
In this study, we introduce TESA (weighted two-stage alignment), an innovative motif prediction tool that refines the identification of DNA-binding protein motifs, essential for deciphering transcriptional regulatory mechanisms. Unlike traditional algorithms that rely solely on sequence data, TESA integrates the high-resolution chromatin immunoprecipitation (ChIP) signal, specifically from ChIP-exonuclease (ChIP-exo), by assigning weights to sequence positions, thereby enhancing motif discovery. TESA employs a nuanced approach combining a binomial distribution model with a graph model, further supported by a "bookend" model, to improve the accuracy of predicting motifs of varying lengths. Our evaluation, utilizing an extensive compilation of 90 prokaryotic ChIP-exo datasets from proChIPdb and 167 H. sapiens datasets, compared TESA's performance against seven established tools. The results indicate TESA's improved precision in motif identification, suggesting its valuable contribution to the field of genomic research.
Affiliation(s)
- Yang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Yizhong Wang
  - School of Mathematics, Shandong University, Jinan, Shandong 250100, China
- Cankun Wang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Anjun Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan, Shandong 250100, China
6
Xu J, Gao J, Ni P, Gerstein M. Less-is-more: selecting transcription factor binding regions informative for motif inference. Nucleic Acids Res 2024; 52:e20. [PMID: 38214231 PMCID: PMC10899791 DOI: 10.1093/nar/gkad1240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 12/06/2023] [Accepted: 12/17/2023] [Indexed: 01/13/2024] Open
Abstract
Numerous statistical methods have emerged for inferring DNA motifs for transcription factors (TFs) from genomic regions. However, the process of selecting informative regions for motif inference remains understudied. Current approaches select regions with strong ChIP-seq signal for a given TF, assuming that such strong signal primarily results from specific interactions between the TF and its motif. Additionally, these selection approaches do not account for non-target motifs, i.e. motifs of other TFs; they presume that these non-target motifs occur infrequently compared with the target motif and thus interfere minimally with its identification. Leveraging extensive ChIP-seq datasets, we introduced the concept of TF signal 'crowdedness', referred to as C-score, for each genomic region. The C-score helps in highlighting TF signals arising from non-specific interactions. Moreover, by considering the C-score (and adjusting for the length of genomic regions), we can effectively mitigate interference from non-target motifs. Using these tools, we find that in many instances, strong ChIP-seq signal stems mainly from non-specific interactions, and the occurrence of non-target motifs significantly impacts the accurate inference of the target motif. Prioritizing genomic regions with reduced crowdedness and short length markedly improves motif inference. This 'less-is-more' effect suggests that ChIP-seq region selection warrants more attention.
Affiliation(s)
- Jinrui Xu
  - Department of Biology, Howard University, Washington, DC 20059, USA
  - Center for Applied Data Science and Analytics, Howard University, Washington, DC 20059, USA
- Jiahao Gao
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Pengyu Ni
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Computer Science, Yale University, New Haven, CT 06520, USA
  - Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
7
Murgo E, De Santis E, Sansico F, Melocchi V, Colangelo T, Padovano C, Colucci M, Carbone A, Totti B, Basti A, Gottschlich L, Relogio A, Capitanio N, Bianchi F, Mazzoccoli G, Giambra V. The circadian clock circuitry modulates leukemia initiating cell activity in T-cell acute lymphoblastic leukemia. J Exp Clin Cancer Res 2023; 42:218. [PMID: 37620852 PMCID: PMC10464343 DOI: 10.1186/s13046-023-02799-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Accepted: 08/14/2023] [Indexed: 08/26/2023] Open
Abstract
BACKGROUND: T-cell acute lymphoblastic leukemia (T-ALL) is an aggressive hematological malignancy, characterized by restricted cellular subsets with asymmetrically enriched leukemia initiating cell (LIC) activity. Nonetheless, it is still unclear which signaling programs promote LIC maintenance and progression. METHODS: Here, we evaluated the role of the biological clock in the regulation of the molecular mechanisms and signaling pathways impacting the cellular dynamics in T-ALL through an integrated experimental approach including gene expression profiling of shRNA-modified T-ALL cell lines and Chromatin Immunoprecipitation Sequencing (ChIP-Seq) of leukemic cells. Patient-derived xenograft (PDX) cell subsets were also genetically manipulated in order to assess the LIC activity modulated by the loss of the biological clock in human T-ALL. RESULTS: We report that disruption of the circadian clock circuitry through shRNA-mediated knockdown of the CLOCK and BMAL1 genes negatively impacted the in vitro growth as well as the in vivo activity of LIC derived from PDXs after transplantation into immunodeficient recipient mice. Additionally, gene expression data integrated with ChIP-Seq profiles of leukemic cells revealed that the circadian clock directly promotes the expression of genes, such as IL20RB, crucially involved in JAK/STAT signaling, making the T-ALL cells more responsive to Interleukin 20 (IL20). CONCLUSION: Taken together, our data support the concept that the biological clock drives the expression of IL20R, prompting JAK/STAT signaling and promoting LIC activity in T-ALL, and suggest that the selective targeting of circadian components could be therapeutically relevant for the treatment of T-ALL patients.
Affiliation(s)
- Emanuele Murgo
  - Department of Medical Sciences, Division of Internal Medicine and Chronobiology Laboratory, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Elisabetta De Santis
  - Hematopathology Unit, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Francesca Sansico
  - Hematopathology Unit, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Valentina Melocchi
  - Cancer Biomarkers Unit, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Tommaso Colangelo
  - Cancer Biomarkers Unit, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Costanzo Padovano
  - Hematopathology Unit, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Mattia Colucci
  - Hematopathology Unit, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Annalucia Carbone
  - Department of Medical Sciences, Division of Internal Medicine and Chronobiology Laboratory, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Beatrice Totti
  - Hematopathology Unit, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Alireza Basti
  - Institute for Systems Medicine, Faculty of Human Medicine, MSH Medical School Hamburg, Hamburg, 20457, Germany
  - Present Address: Ivana Türbachova Laboratory for Epigenetics, Epiontis, Precision for Medicine GmbH, Berlin, Germany
- Lisa Gottschlich
  - Institute for Systems Medicine, Faculty of Human Medicine, MSH Medical School Hamburg, Hamburg, 20457, Germany
- Angela Relogio
  - Institute for Systems Medicine, Faculty of Human Medicine, MSH Medical School Hamburg, Hamburg, 20457, Germany
  - Molekulares Krebsforschungszentrum (MKFZ), Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Institute for Theoretical Biology (ITB), Charité-Universitätsmedizin Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Nazzareno Capitanio
  - Department of Clinical and Experimental Medicine, University of Foggia, Foggia, Italy
- Fabrizio Bianchi
  - Cancer Biomarkers Unit, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Gianluigi Mazzoccoli
  - Department of Medical Sciences, Division of Internal Medicine and Chronobiology Laboratory, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
- Vincenzo Giambra
  - Hematopathology Unit, Fondazione IRCCS "Casa Sollievo Della Sofferenza", San Giovanni Rotondo, FG, 71013, Italy
8
Alatawneh R, Salomon Y, Eshel R, Orenstein Y, Birnbaum RY. Deciphering transcription factors and their corresponding regulatory elements during inhibitory interneuron differentiation using deep neural networks. Front Cell Dev Biol 2023; 11:1034604. [PMID: 36891511 PMCID: PMC9986276 DOI: 10.3389/fcell.2023.1034604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 01/23/2023] [Indexed: 02/22/2023] Open
Abstract
During neurogenesis, the generation and differentiation of neuronal progenitors into inhibitory gamma-aminobutyric acid-containing interneurons depends on the combinatorial activity of transcription factors (TFs) and their corresponding regulatory elements (REs). However, the roles of neuronal TFs and their target REs in inhibitory interneuron progenitors are not fully elucidated. Here, we developed a deep-learning-based framework to identify enriched TF motifs in gene REs (eMotif-RE), such as poised/repressed enhancers and putative silencers. Using epigenetic datasets (e.g., ATAC-seq and H3K27ac/me3 ChIP-seq) from cultured interneuron-like progenitors, we distinguished between active enhancer sequences (open chromatin with H3K27ac) and non-active enhancer sequences (open chromatin without H3K27ac). Using our eMotif-RE framework, we discovered enriched motifs of TFs such as ASCL1, SOX4, and SOX11 in the active enhancer set, suggesting a cooperative function for ASCL1 and SOX4/11 in active enhancers of neuronal progenitors. In addition, we found enriched ZEB1 and CTCF motifs in the non-active set. Using an in vivo enhancer assay, we showed that most of the tested putative REs from the non-active enhancer set have no enhancer activity. Two of the eight REs (25%) functioned as poised enhancers in the neuronal system. Moreover, REs with mutated ZEB1 and CTCF motifs showed increased in vivo enhancer activity, indicating a repressive effect of ZEB1 and CTCF on these REs, which likely function as repressed enhancers or silencers. Overall, our work integrates a novel framework based on deep learning with a functional assay that elucidated novel functions of TFs and their corresponding REs. Our approach can be applied to better understand gene regulation not only in inhibitory interneuron differentiation but also in other tissues and cell types.
Affiliation(s)
- Rawan Alatawneh
  - Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
  - The Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Yahel Salomon
  - School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Reut Eshel
  - Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
  - The Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Yaron Orenstein
  - School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
  - Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
  - The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
- Ramon Y Birnbaum
  - Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
  - The Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel
9
Motif and conserved module analysis in DNA (promoters, enhancers) and RNA (lncRNA, mRNA) using AIModules. Sci Rep 2022; 12:17588. [PMID: 36266399 PMCID: PMC9584888 DOI: 10.1038/s41598-022-21732-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 09/30/2022] [Indexed: 01/13/2023] Open
Abstract
Nucleic acid motifs consist of conserved and variable nucleotide regions. To act functionally, several motifs are combined into modules. The tool AIModules identifies such motifs, their combinations, and their conservation across several nucleic acid stretches. Recognizing conserved motifs and combinations of motifs (modules) enables a number of interesting biological applications, such as analysis of promoters and transcription factor binding sites (TFBSs); identification of conserved modules shared between several gene families, e.g. in promoter regions; and analysis of other shared and conserved DNA motifs, such as enhancers and silencers, as well as motifs in mRNAs (regulatory elements, e.g. for polyadenylation) and lncRNAs. AIModules is an integrated solution for motif analysis, offered as a Web service as well as downloadable software. Several nucleotide sequences are queried for TFBSs using predefined matrices from the JASPAR DB or one's own matrices for diverse types of DNA or RNA motif discovery. Furthermore, AIModules can find TFBSs common to two or more sequences. Whether high or low conservation is demanded, AIModules outperforms other solutions in speed and finds more modules (specific combinations of TFBSs) than alternative available software. The application also searches for RNA motifs such as polyadenylation sites or RNA-protein binding motifs, DNA motifs such as enhancers, and user-specified motif combinations (https://bioinfo-wuerz.de/aimodules/; alternative entry pages: https://aimodules.heinzelab.de or https://www.biozentrum.uni-wuerzburg.de/bioinfo/computing/aimodules). The application is free and open source whether used online, on-site, or locally.
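Querying sequences with a position matrix and checking for hits shared by several sequences, as this abstract describes, can be sketched in general terms as below. This is a generic log-odds scan, not the AIModules algorithm. The count matrix is a made-up toy rather than a JASPAR entry, and the pseudocount, uniform background, and threshold are assumptions chosen for illustration.

```python
import math

def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, score) for each window scoring at or above the threshold.
    Assumes the sequence contains only A, C, G and T."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

def shared_motif(sequences, matrix, threshold):
    """True if every sequence contains at least one hit for the matrix."""
    return all(scan(seq, matrix, threshold) for seq in sequences)

# Hypothetical 4-bp count matrix favouring TATA (illustration only).
counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
matrix = log_odds_matrix(counts)
hits = scan("GGTATACC", matrix, threshold=5.0)
common = shared_motif(["GGTATACC", "TATAGG"], matrix, threshold=5.0)
```

A module search in the sense of the abstract would go one step further and look for specific combinations of such hits across sequences; that step is omitted here.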
10
Pelinski Y, Hidaoui D, Stolz A, Hermetet F, Chelbi R, Diop MK, Chioukh AM, Porteu F, Elvira-Matelot E. NF-κB signaling controls H3K9me3 levels at intronic LINE-1 and hematopoietic stem cell genes in cis. J Exp Med 2022; 219:213343. [PMID: 35802137 PMCID: PMC9274146 DOI: 10.1084/jem.20211356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2021] [Revised: 12/23/2021] [Accepted: 06/21/2022] [Indexed: 01/11/2023] Open
Abstract
Ionizing radiation (IR) alters hematopoietic stem cell (HSC) function in the long term, but the mechanisms underlying these effects are still poorly understood. We recently showed that IR induces the derepression of L1Md, the young mouse subfamilies of LINE-1/L1 retroelements. L1 contributes to gene regulatory networks. However, how L1Md are derepressed and how they impact HSC gene expression are not known. Here, we show that IR triggers a genome-wide H3K9me3 decrease that occurs mainly at L1Md. Loss of H3K9me3 at intronic L1Md harboring NF-κB binding site motifs, but not at promoters, is associated with the repression of HSC-specific genes. This is correlated with reduced NFKB1 repressor expression. TNF-α treatment rescued all these effects and prevented IR-induced HSC loss of function in vivo. This TNF-α/NF-κB/H3K9me3/L1Md axis might be important to maintain HSCs while allowing expression of immune genes during myeloid regeneration or damage-induced bone marrow ablation.
Affiliation(s)
- Yanis Pelinski
  - INSERM UMR1287, Gustave Roussy, Villejuif, France
  - Université Paris-Saclay, Gif-sur-Yvette, France
- Donia Hidaoui
  - INSERM UMR1287, Gustave Roussy, Villejuif, France
  - Université Paris-Saclay, Gif-sur-Yvette, France
- Anne Stolz
  - INSERM UMR1287, Gustave Roussy, Villejuif, France
  - Université Paris-Saclay, Gif-sur-Yvette, France
- François Hermetet
  - INSERM UMR1287, Gustave Roussy, Villejuif, France
  - Université Paris-Saclay, Gif-sur-Yvette, France
- Rabie Chelbi
  - INSERM UMR1287, Gustave Roussy, Villejuif, France
  - Université Paris-Saclay, Gif-sur-Yvette, France
- M’boyba Khadija Diop
  - Université Paris-Saclay, Gif-sur-Yvette, France
  - Bioinformatics Platform UMS AMMICa INSERM US23/CNRS 3655, Gustave Roussy, Villejuif, France
- Amir M. Chioukh
  - INSERM UMR1287, Gustave Roussy, Villejuif, France
  - Université Paris-Saclay, Gif-sur-Yvette, France
- Françoise Porteu
  - INSERM UMR1287, Gustave Roussy, Villejuif, France
  - Université Paris-Saclay, Gif-sur-Yvette, France
- Emilie Elvira-Matelot
  - INSERM UMR1287, Gustave Roussy, Villejuif, France
  - Université Paris-Saclay, Gif-sur-Yvette, France
11
Meng Y, Wang G, He H, Lau KH, Hurt A, Bixler BJ, Parham A, Jin SG, Xu X, Vasquez KM, Pfeifer GP, Szabó PE. Z-DNA is remodelled by ZBTB43 in prospermatogonia to safeguard the germline genome and epigenome. Nat Cell Biol 2022; 24:1141-1153. [PMID: 35787683 PMCID: PMC9276527 DOI: 10.1038/s41556-022-00941-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 04/24/2021] [Accepted: 05/17/2022] [Indexed: 12/12/2022]
Abstract
Mutagenic purine–pyrimidine repeats can adopt the left-handed Z-DNA conformation. DNA breaks at potential Z-DNA sites can lead to somatic mutations in cancer or to germline mutations that are transmitted to the next generation. It is not known whether any mechanism exists in the germ line to control Z-DNA structure and DNA breaks at purine–pyrimidine repeats. Here we provide genetic, epigenomic and biochemical evidence for the existence of a biological process that erases Z-DNA specifically in germ cells of the mouse male foetus. We show that a previously uncharacterized zinc finger protein, ZBTB43, binds to and removes Z-DNA, preventing the formation of DNA double-strand breaks. By removing Z-DNA, ZBTB43 also promotes de novo DNA methylation at CG-containing purine–pyrimidine repeats in prospermatogonia. Therefore, the genomic and epigenomic integrity of the species is safeguarded by remodelling DNA structure in the mammalian germ line during a critical window of germline epigenome reprogramming. Meng et al. show that ZBTB43 alters Z-DNA structures to prevent deleterious double-strand breaks and promote DNA methylation at purine–pyrimidine repeats in the mouse germ line.
Affiliation(s)
- Yingying Meng
- Capital Normal University College of Life Science, Beijing, China
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
| | - Guliang Wang
- Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Austin, TX, USA
| | - Hongjuan He
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Kin H Lau
- Bioinformatics and Biostatistics Core, Van Andel Institute, Grand Rapids, MI, USA
| | - Allison Hurt
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
| | - Brianna J Bixler
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
| | - Andrea Parham
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
- Van Andel Institute Graduate School, Grand Rapids, MI, USA
| | - Seung-Gi Jin
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
| | - Xingzhi Xu
- Capital Normal University College of Life Science, Beijing, China
- Guangdong Key Laboratory for Genome Stability & Disease Prevention and Carson International Cancer Center, Marshall Laboratory of Biomedical Engineering, Shenzhen University School of Medicine, Shenzhen, China
| | - Karen M Vasquez
- Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Austin, TX, USA
| | - Gerd P Pfeifer
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
| | - Piroska E Szabó
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA.
|
12
|
Redmon IC, Ardizzone M, Hekimoğlu H, Hatfield BM, Waldern JM, Dey A, Montgomery SA, Laederach A, Ramos SBV. Sequence and tissue targeting specificity of ZFP36L2 reveals Elavl2 as a novel target with co-regulation potential. Nucleic Acids Res 2022; 50:4068-4082. [PMID: 35380695 PMCID: PMC9023260 DOI: 10.1093/nar/gkac209] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/30/2021] [Revised: 03/05/2022] [Accepted: 03/18/2022] [Indexed: 11/12/2022]
Abstract
Zinc finger protein 36 like 2 (ZFP36L2) is an RNA-binding protein that destabilizes transcripts containing adenine-uridine rich elements (AREs). The overlap between ZFP36L2 targets in different tissues is minimal, suggesting that ZFP36L2-targeting is highly tissue specific. We developed a novel Zfp36l2-lacking mouse model (L2-fKO) to identify factors governing this tissue specificity. We found 549 upregulated genes in the L2-fKO spleen by RNA-seq. These upregulated genes were enriched in ARE motifs in the 3′UTRs, which suggests that they are ZFP36L2 targets; however, the precise sequence requirement for targeting was not evident from motif analysis alone. We therefore used gel-shift mobility assays on 12 novel putative targets and established that ZFP36L2 requires a 7-mer (UAUUUAU) motif to bind. We observed a statistically significant enrichment of 7-mer ARE motifs in upregulated genes and determined that ZFP36L2 targets are enriched for multiple 7-mer motifs. Elavl2 mRNA, which has three 7-mer (UAUUUAU) motifs, was also upregulated in L2-fKO spleens. Overexpression of ZFP36L2, but not a ZFP36L2(C176S) mutant, reduced Elavl2 mRNA expression, suggesting a direct negative effect. Additionally, a reporter assay demonstrated that the ZFP36L2 effect on Elavl2 decay is dependent on the Elavl2-3′UTR and requires the 7-mer AREs. Our data indicate that Elavl2 mRNA is a novel target of ZFP36L2, specific to the spleen. Likely, ZFP36L2 combined with other RNA binding proteins, such as ELAVL2, governs tissue specificity.
Affiliation(s)
- Ian C Redmon
- Biochemistry and Biophysics Department, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Matthew Ardizzone
- Biochemistry and Biophysics Department, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Hilal Hekimoğlu
- Biochemistry and Biophysics Department, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Breanne M Hatfield
- Chemistry Department, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Justin M Waldern
- Biology Department, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Abhishek Dey
- Biology Department, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Stephanie A Montgomery
- Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Alain Laederach
- Biology Department, University of North Carolina, Chapel Hill, NC 27599, USA
- Bioinformatics and Computational Biology Program, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Silvia B V Ramos
- Biochemistry and Biophysics Department, University of North Carolina, Chapel Hill, NC 27599, USA
|
13
|
Tsukanov AV, Levitsky VG, Merkulova TI. Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites. Vavilovskii Zhurnal Genet Selektsii 2021; 25:7. [PMID: 34547062 PMCID: PMC8408018 DOI: 10.18699/vj21.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/10/2020] [Revised: 01/10/2021] [Accepted: 01/12/2021] [Indexed: 11/24/2022]
Abstract
The most popular model for the search of ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences in different site positions. Two recently proposed models, BaMM and InMoDe, can account for such dependencies. However, application of these models has usually been limited to comparing their recognition accuracies with that of PWMs, while no analysis of the co-prediction and relative positioning of hits of different models in peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. This pipeline includes stages of model training, assessing their recognition accuracy, scanning ChIP-seq peaks and their classification based on scan results. We applied our pipeline to 22 ChIP-seq datasets of TF FOXA2 and considered PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. The combination of these four models allowed a significant increase in the fraction of recognized peaks compared to that for the sole PWM model: the increase was 26.3%. The BaMM model provided the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models with coinciding positions, the medians of the fraction of peaks containing the predictions of sole models were 1.08, 0.49, 4.15 and 1.73% for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 BSs were not fully described by only a sole model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in the ChIP-seq datasets under study.
Affiliation(s)
- A V Tsukanov
- Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - V G Levitsky
- Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
| | - T I Merkulova
- Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
|
14
|
Prosperi M, Marini S, Boucher C. Fast and exact quantification of motif occurrences in biological sequences. BMC Bioinformatics 2021; 22:445. [PMID: 34537012 PMCID: PMC8449872 DOI: 10.1186/s12859-021-04355-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/04/2021] [Accepted: 09/06/2021] [Indexed: 12/03/2022]
Abstract
BACKGROUND Identification of motifs and quantification of their occurrences are important for the study of genetic diseases, gene evolution, transcription sites, and other biological mechanisms. Exact formulae for estimating count distributions of motifs under Markovian assumptions have high computational complexity and are impractical to use on large motif sets. Approximated formulae, e.g. based on compound Poisson, are faster, but reliable p value calculation remains challenging. Here, we introduce 'motif_prob', a fast implementation of an exact formula for motif count distribution through progressive approximation with arbitrary precision. Our implementation speeds up the exact calculation, usually impractical, making it feasible and a possible substitute for currently employed heuristics. RESULTS We implement motif_prob in both Perl and C++, using an efficient error-bound iterative process for the exact formula, and provide a comparison with state-of-the-art tools (e.g. MoSDi) in terms of precision and run time benchmarks, along with a real-world use case on bacterial motif characterization. Our software is able to process a million motifs (13-31 bases) over genome lengths of 5 million bases within a minute on a regular laptop, and the run times for both the Perl and C++ code are several orders of magnitude smaller (50-1000× faster) than MoSDi, even when using their fast compound Poisson approximation (60-120× faster). In the real-world use cases, we first show the consistency of motif_prob with MoSDi, and then how the p-value quantification is crucial for enrichment quantification when bacteria have different GC content, using motifs found in antimicrobial resistance genes. The software and the code sources are available under the MIT license at https://github.com/DataIntellSystLab/motif_prob . CONCLUSIONS The motif_prob software is a multi-platform and efficient open source solution for calculating exact frequency distributions of motifs. It can be integrated with motif discovery/characterization tools for quantifying enrichment and deviation from expected frequency ranges with exact p values, without loss in data processing efficiency.
Affiliation(s)
- Mattia Prosperi
- Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA.
| | - Simone Marini
- Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
| | - Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
|
15
|
Castellana S, Biagini T, Parca L, Petrizzelli F, Bianco SD, Vescovi AL, Carella M, Mazza T. A comparative benchmark of classic DNA motif discovery tools on synthetic data. Brief Bioinform 2021; 22:6341664. [PMID: 34351399 DOI: 10.1093/bib/bbab303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2021] [Revised: 07/08/2021] [Accepted: 07/15/2021] [Indexed: 01/01/2023]
Abstract
Hundreds of human proteins were found to establish transient interactions with rather degenerate consensus DNA sequences or motifs. Identifying these motifs and the genomic sites where interactions occur represents one of the most challenging research goals in modern molecular biology and bioinformatics. The last twenty years witnessed an explosion of computational tools designed to perform this task, whose performance was last compared fifteen years ago. Here, we survey sixteen of them, benchmark their ability to identify known motifs nested in twenty-nine simulated sequence datasets, and finally report their strengths, weaknesses, and complementarity.
Affiliation(s)
- Stefano Castellana
- Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
| | - Tommaso Biagini
- Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
| | - Luca Parca
- Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
| | - Francesco Petrizzelli
- Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
- Department of Experimental Medicine, Sapienza University of Rome, Rome 00161, Italy
| | | | - Angelo Luigi Vescovi
- ISBReMIT Institute for Stem Cell Biology, Regenerative Medicine and Innovative Therapies, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| | - Massimo Carella
- Medical Genetics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
| | - Tommaso Mazza
- Bioinformatics Unit, IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo 71013, Italy
|
16
|
Ge W, Meier M, Roth C, Söding J. Bayesian Markov models improve the prediction of binding motifs beyond first order. NAR Genom Bioinform 2021; 3:lqab026. [PMID: 33928244 PMCID: PMC8057495 DOI: 10.1093/nargab/lqab026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/02/2020] [Revised: 03/11/2021] [Accepted: 03/30/2021] [Indexed: 12/13/2022]
Abstract
Transcription factors (TFs) regulate gene expression by binding to specific DNA motifs. Accurate models for predicting binding affinities are crucial for a quantitative understanding of transcriptional regulation. Motifs are commonly described by position weight matrices, which assume that each position contributes independently to the binding energy. Models that can learn dependencies between positions, for instance, induced by DNA structure preferences, have yielded markedly improved predictions for most TFs on in vivo data. However, they are more prone to overfit the data and to learn patterns merely correlated with rather than directly involved in TF binding. We present an improved, faster version of our Bayesian Markov model software, BaMMmotif2. We tested it with state-of-the-art motif discovery tools on a large collection of ChIP-seq and HT-SELEX datasets. BaMMmotif2 models of fifth-order achieved a median false-discovery-rate-averaged recall 13.6% and 12.2% higher than the next best tool on 427 ChIP-seq datasets and 164 HT-SELEX datasets, respectively, while being 8 to 1000 times faster. BaMMmotif2 models showed no signs of overtraining in cross-cell line and cross-platform tests, with similar improvements on the next-best tool. These results demonstrate that dependencies beyond first order clearly improve binding models for most TFs.
Affiliation(s)
- Wanwan Ge
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
| | - Markus Meier
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
| | - Christian Roth
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
| | - Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
|
17
|
Koulouras G, Frith MC. Significant non-existence of sequences in genomes and proteomes. Nucleic Acids Res 2021; 49:3139-3155. [PMID: 33693858 PMCID: PMC8034619 DOI: 10.1093/nar/gkab139] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/04/2021] [Revised: 02/11/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022]
Abstract
Minimal absent words (MAWs) are minimal-length oligomers absent from a genome or proteome. Although some artificially synthesized MAWs have deleterious effects, there is still a lack of a strategy for the classification of non-occurring sequences as potentially malicious or benign. In this work, by using Markovian models with multiple-testing correction, we reveal significant absent oligomers, which are statistically expected to exist. This suggests that their absence is due to negative selection. We survey genomes and proteomes covering the diversity of life and find thousands of significant absent sequences. Common significant MAWs are often mono- or dinucleotide tracts, or palindromic. Significant viral MAWs are often restriction sites and may indicate unknown restriction motifs. Surprisingly, significant mammal genome MAWs are often present, but rare, in other mammals, suggesting that they are suppressed but not completely forbidden. Significant human MAWs are frequently present in prokaryotes, suggesting immune function, but rarely present in human viruses, indicating viral mimicry of the host. More than one-fourth of human proteins are one substitution away from containing a significant MAW, with the majority of replacements being predicted harmful. We provide a web-based, interactive database of significant MAWs across genomes and proteomes.
Affiliation(s)
- Grigorios Koulouras
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Shinjuku-ku, Tokyo, Japan
|
18
|
Jenjaroenpun P, Wongsurawat T, Wadley TD, Wassenaar TM, Liu J, Dai Q, Wanchai V, Akel NS, Jamshidi-Parsian A, Franco AT, Boysen G, Jennings ML, Ussery DW, He C, Nookaew I. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res 2021; 49:e7. [PMID: 32710622 PMCID: PMC7826254 DOI: 10.1093/nar/gkaa620] [Citation(s) in RCA: 171] [Impact Index Per Article: 42.8] [Received: 03/31/2020] [Revised: 06/13/2020] [Accepted: 07/13/2020] [Indexed: 11/14/2022]
Abstract
Traditional epitranscriptomics relies on capturing a single RNA modification by antibody or chemical treatment, combined with short-read sequencing to identify its transcriptomic location. This approach is labor-intensive and may introduce experimental artifacts. Direct sequencing of native RNA using Oxford Nanopore Technologies (ONT) can allow for directly detecting the RNA base modifications, although these modifications might appear as sequencing errors. The percent Error of Specific Bases (%ESB) was higher for native RNA than unmodified RNA, which enabled the detection of ribonucleotide modification sites. Based on the %ESB differences, we developed a bioinformatic tool, epitranscriptional landscape inferring from glitches of ONT signals (ELIGOS), that is based on various types of synthetic modified RNA and applied to rRNA and mRNA. ELIGOS is able to accurately predict known classes of RNA methylation sites (AUC > 0.93) in rRNAs from Escherichia coli, yeast, and human cells, using either unmodified in vitro transcription RNA or a background error model, which mimics the systematic error of direct RNA sequencing as the reference. The well-known DRACH/RRACH motif was localized and identified, consistent with previous studies, using differential analysis of ELIGOS to study the impact of RNA m6A methyltransferase by comparing wild type and knockouts in yeast and mouse cells. Lastly, the DRACH motif could also be identified in the mRNA of three human cell lines. The mRNA modification identified by ELIGOS is at the level of individual base resolution. In summary, we have developed a bioinformatic software package to uncover native RNA modifications.
Affiliation(s)
- Piroon Jenjaroenpun
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
| | - Thidathip Wongsurawat
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
| | - Taylor D Wadley
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
| | - Trudy M Wassenaar
- Molecular Microbiology and Genomics Consultants, Zotzenheim, Germany
| | - Jun Liu
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Qing Dai
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Visanu Wanchai
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
| | - Nisreen S Akel
- Department of Physiology and Biophysics, College of Medicine, The University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
| | - Azemat Jamshidi-Parsian
- Department of Radiation Oncology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
| | - Aime T Franco
- Department of Physiology and Biophysics, College of Medicine, The University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
| | - Gunnar Boysen
- Department of Environmental and Occupational Health, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
| | - Michael L Jennings
- Department of Physiology and Biophysics, College of Medicine, The University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
| | - David W Ussery
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
| | - Chuan He
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Howard Hughes Medical Institute, The University of Chicago, Chicago, IL 60637, USA
| | - Intawat Nookaew
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Department of Physiology and Biophysics, College of Medicine, The University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
|
19
|
Jenjaroenpun P, Wongsurawat T, Wadley TD, Wassenaar TM, Liu J, Dai Q, Wanchai V, Akel NS, Jamshidi-Parsian A, Franco AT, Boysen G, Jennings ML, Ussery DW, He C, Nookaew I. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res 2021; 49:e7. [PMID: 32710622 DOI: 10.1101/487819] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/31/2020] [Revised: 06/13/2020] [Accepted: 07/13/2020] [Indexed: 05/25/2023]
|
20
|
Kolmykov S, Yevshin I, Kulyashov M, Sharipov R, Kondrakhin Y, Makeev VJ, Kulakovskiy IV, Kel A, Kolpakov F. GTRD: an integrated view of transcription regulation. Nucleic Acids Res 2021; 49:D104-D111. [PMID: 33231677 PMCID: PMC7778956 DOI: 10.1093/nar/gkaa1057] [Citation(s) in RCA: 168] [Impact Index Per Article: 42.0] [Received: 09/15/2020] [Revised: 10/18/2020] [Accepted: 11/03/2020] [Indexed: 12/24/2022]
Abstract
The Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org/) contains uniformly annotated and processed NGS data related to gene transcription regulation: ChIP-seq, ChIP-exo, DNase-seq, MNase-seq, ATAC-seq and RNA-seq. With the latest release, the database has reached a new level of data integration. All cell types (cell lines and tissues) presented in the GTRD were arranged into a dictionary and linked with different ontologies (BRENDA, Cell Ontology, Uberon, Cellosaurus and Experimental Factor Ontology) and with related experiments in specialized databases on transcription regulation (FANTOM5, ENCODE and GTEx). The updated version of the GTRD provides an integrated view of transcription regulation through a dedicated web interface with advanced browsing and search capabilities, an integrated genome browser, and table reports by cell types, transcription factors, and genes of interest.
Affiliation(s)
- Semyon Kolmykov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Federal Research Center Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russian Federation
| | - Ivan Yevshin
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
| | - Mikhail Kulyashov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Novosibirsk State University, Novosibirsk 630090, Russian Federation
| | - Ruslan Sharipov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Novosibirsk State University, Novosibirsk 630090, Russian Federation
| | - Yury Kondrakhin
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
| | - Vsevolod J Makeev
- Vavilov Institute of General Genetics RAS, Moscow 119991, Russian Federation
- Moscow Institute of Physics and Technology (State University), Dolgoprudny 141700, Russian Federation
- NRC «Kurchatov Institute» - GOSNIIGENETIKA, Kurchatov Genomic Center, Moscow 123182, Russian Federation
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russian Federation
| | - Ivan V Kulakovskiy
- Vavilov Institute of General Genetics RAS, Moscow 119991, Russian Federation
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russian Federation
- Institute of Protein Research, Russian Academy of Sciences, Pushchino 142290, Russian Federation
| | - Alexander Kel
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- geneXplain GmbH, 38302 Wolfenbüttel, Germany
- Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk 630090, Russian Federation
| | - Fedor Kolpakov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
|
21
|
Web-Based Bioinformatics Approach Towards Analysis of Regulatory Sequences. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
22
|
Zhang J, Chen Q, Liu B. iDRBP_MMC: Identifying DNA-Binding Proteins and RNA-Binding Proteins Based on Multi-Label Learning Model and Motif-Based Convolutional Neural Network. J Mol Biol 2020; 432:5860-5875. [DOI: 10.1016/j.jmb.2020.09.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/04/2020] [Revised: 08/12/2020] [Accepted: 09/04/2020] [Indexed: 11/28/2022]
|
23
|
Levitsky V, Zemlyanskaya E, Oshchepkov D, Podkolodnaya O, Ignatieva E, Grosse I, Mironova V, Merkulova T. A single ChIP-seq dataset is sufficient for comprehensive analysis of motifs co-occurrence with MCOT package. Nucleic Acids Res 2019; 47:e139. [PMID: 31750523 PMCID: PMC6868382 DOI: 10.1093/nar/gkz800] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/30/2019] [Revised: 08/12/2019] [Accepted: 09/09/2019] [Indexed: 01/20/2023] Open
Abstract
Recognition of composite elements consisting of two transcription factor binding sites underpins studies of tissue-, stage- and condition-specific transcription. Genome-wide data on transcription factor binding generated with the ChIP-seq method facilitate the identification of composite elements, but the existing bioinformatics tools either require ChIP-seq datasets for both partner transcription factors or omit composite elements with overlapping motifs. Here we present a universal Motifs Co-Occurrence Tool (MCOT) that retrieves maximum information about overrepresented composite elements from a single ChIP-seq dataset. This includes homo- and heterotypic composite elements of four mutual orientations of motifs, separated by a spacer or overlapping, even if recognition of motifs within a composite element requires various stringencies. Analysis of 52 ChIP-seq datasets for 18 human transcription factors confirmed that for over 60% of analyzed datasets and transcription factors, the predicted co-occurrence of motifs implied an experimentally proven protein-protein interaction of the respective transcription factors. Analysis of 164 ChIP-seq datasets for 57 mammalian transcription factors showed that predicted composite elements with an overlap of motifs were more than twice as abundant as those with a spacer, and that they had a 1.5-fold increase in asymmetrical pairs of motifs with one more conservative 'leading' motif and another 'guided' one.
Affiliation(s)
- Victor Levitsky
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Elena Zemlyanskaya
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Dmitry Oshchepkov
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
| | - Olga Podkolodnaya
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
| | - Elena Ignatieva
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Ivo Grosse
- Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Leipzig, Germany
| | - Victoria Mironova
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Tatyana Merkulova
- Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
- Department of Molecular Genetics, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
|
24
|
Yevshin I, Sharipov R, Kolmykov S, Kondrakhin Y, Kolpakov F. GTRD: a database on gene transcription regulation-2019 update. Nucleic Acids Res 2019; 47:D100-D105. [PMID: 30445619 PMCID: PMC6323985 DOI: 10.1093/nar/gky1128] [Citation(s) in RCA: 156] [Impact Index Per Article: 31.2] [Received: 09/15/2018] [Accepted: 10/26/2018] [Indexed: 01/16/2023] Open
Abstract
The current version of the Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org) contains information about: (i) transcription factor binding sites (TFBSs) and transcription coactivators identified by ChIP-seq experiments for Homo sapiens, Mus musculus, Rattus norvegicus, Danio rerio, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Arabidopsis thaliana; (ii) regions of open chromatin and TFBSs (DNase footprints) identified by DNase-seq; (iii) unmappable regions where TFBSs cannot be identified due to repeats; (iv) potential TFBSs for both human and mouse using position weight matrices from the HOCOMOCO database. Raw ChIP-seq and DNase-seq data were obtained from ENCODE and SRA, and uniformly processed. ChIP-seq peaks were called using four different methods: MACS, SISSRs, GEM and PICS. Peaks for the same factor and peak calling method, albeit obtained under different experimental conditions (cell line, treatment, etc.), were merged into clusters. To reduce noise, such clusters for different peak calling methods were merged into meta-clusters; these were considered to be non-redundant TFBS sets. In addition, extended quality control was applied to all ChIP-seq data. A web interface to access GTRD was developed using the BioUML platform. It provides browsing and display of information, advanced search options and an integrated genome browser.
Affiliation(s)
- Ivan Yevshin
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
| | - Ruslan Sharipov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Institute of Computational Technologies SB RAS, Novosibirsk 630090, Russian Federation
- Novosibirsk State University, Novosibirsk 630090, Russian Federation
| | - Semyon Kolmykov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russian Federation
| | - Yury Kondrakhin
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Institute of Computational Technologies SB RAS, Novosibirsk 630090, Russian Federation
| | - Fedor Kolpakov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Institute of Computational Technologies SB RAS, Novosibirsk 630090, Russian Federation
|
25
|
Rodrigues RAL, Louazani AC, Picorelli A, Oliveira GP, Lobo FP, Colson P, La Scola B, Abrahão JS. Analysis of a Marseillevirus Transcriptome Reveals Temporal Gene Expression Profile and Host Transcriptional Shift. Front Microbiol 2020; 11:651. [PMID: 32390970 PMCID: PMC7192143 DOI: 10.3389/fmicb.2020.00651] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/04/2019] [Accepted: 03/22/2020] [Indexed: 12/17/2022] Open
Abstract
Marseilleviruses comprise a family of large double-stranded DNA viruses belonging to the proposed order "Megavirales." These viruses have a circular genome of ∼370 kbp, coding for hundreds of genes. Over half of their genes are associated with AT-rich putative promoter motifs, which have been demonstrated to be important for gene regulation. However, the transcriptional profile of Marseilleviruses is currently unknown. Here we used RNA sequencing technology to obtain a general transcriptional profile of Marseilleviruses. Eight million 75-bp-long nucleotide sequences were robustly mapped to all 457 genes initially predicted for Marseillevirus isolate T19, the prototype strain of the family, and we were able to assemble 359 viral contigs using a genome-guided approach with stringent parameters. These reads were differentially mapped to the genes according to the replicative cycle time point from which they were obtained. Cluster analysis indicated the existence of three main temporal categories of gene expression (early, intermediate and late), which were validated by quantitative reverse transcription polymerase chain reaction assays targeting several genes. Genes belonging to different functional groups exhibited distinct expression levels throughout the infection cycle. We observed that the previously predicted promoter motif, AAATATTT, as well as newly predicted motifs, were not specifically related to any of the temporal or functional classes of genes, suggesting that other components are involved in temporally regulating virus transcription. Moreover, the host transcription machinery is heavily altered, and many genes are downregulated, including those related to the translation process. This study provides an overview of the transcriptional landscape of Marseilleviruses.
Affiliation(s)
- Rodrigo Araújo Lima Rodrigues
- Laboratório de Vírus, Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Microbes, Evolution, Phylogeny and Infection (MEΦI), IRD 198, Assistance Publique-Hopitaux de Marseille (AP-HM), Aix-Marseille Université UM63, Marseille, France
| | - Amina Cherif Louazani
- Microbes, Evolution, Phylogeny and Infection (MEΦI), IRD 198, Assistance Publique-Hopitaux de Marseille (AP-HM), Aix-Marseille Université UM63, Marseille, France
| | - Agnello Picorelli
- Laboratório de Algoritmos em Biologia, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Graziele Pereira Oliveira
- Laboratório de Vírus, Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Microbes, Evolution, Phylogeny and Infection (MEΦI), IRD 198, Assistance Publique-Hopitaux de Marseille (AP-HM), Aix-Marseille Université UM63, Marseille, France
| | - Francisco Pereira Lobo
- Laboratório de Algoritmos em Biologia, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Philippe Colson
- Microbes, Evolution, Phylogeny and Infection (MEΦI), IRD 198, Assistance Publique-Hopitaux de Marseille (AP-HM), Aix-Marseille Université UM63, Marseille, France
- Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
| | - Bernard La Scola
- Microbes, Evolution, Phylogeny and Infection (MEΦI), IRD 198, Assistance Publique-Hopitaux de Marseille (AP-HM), Aix-Marseille Université UM63, Marseille, France
- Institut Hospitalo-Universitaire (IHU) - Méditerranée Infection, Marseille, France
| | - Jônatas Santos Abrahão
- Laboratório de Vírus, Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
|
26
|
Fornes O, Castro-Mondragon JA, Khan A, van der Lee R, Zhang X, Richmond PA, Modi BP, Correard S, Gheorghe M, Baranašić D, Santana-Garcia W, Tan G, Chèneby J, Ballester B, Parcy F, Sandelin A, Lenhard B, Wasserman WW, Mathelier A. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2020; 48:D87-D92. [PMID: 31701148 PMCID: PMC7145627 DOI: 10.1093/nar/gkz1001] [Citation(s) in RCA: 856] [Impact Index Per Article: 171.2] [Received: 09/15/2019] [Revised: 10/15/2019] [Accepted: 10/16/2019] [Indexed: 02/07/2023] Open
Abstract
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.
Affiliation(s)
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Jaime A Castro-Mondragon
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Aziz Khan
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Robin van der Lee
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Xi Zhang
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Phillip A Richmond
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Bhavi P Modi
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Solenne Correard
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Marius Gheorghe
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Damir Baranašić
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, UK
- Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London W12 0NN, UK
| | - Walter Santana-Garcia
- Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Ge Tan
- Functional Genomics Centre Zurich, ETH Zurich, Zurich, Switzerland
| | - François Parcy
- CNRS, Univ. Grenoble Alpes, CEA, INRA, IRIG-LPCV, 38000 Grenoble, France
| | - Albin Sandelin
- The Bioinformatics Centre, Department of Biology and Biotech Research & Innovation Centre, University of Copenhagen, DK2200 Copenhagen N, Denmark
| | - Boris Lenhard
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, UK
- Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London W12 0NN, UK
- Sars International Centre for Marine Molecular Biology, University of Bergen, N-5008 Bergen, Norway
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
|
27
|
Lai X, Stigliani A, Vachon G, Carles C, Smaczniak C, Zubieta C, Kaufmann K, Parcy F. Building Transcription Factor Binding Site Models to Understand Gene Regulation in Plants. Mol Plant 2019; 12:743-763. [PMID: 30447332 DOI: 10.1016/j.molp.2018.10.010] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 06/29/2018] [Revised: 09/20/2018] [Accepted: 10/30/2018] [Indexed: 06/09/2023]
Abstract
Transcription factors (TFs) are key cellular components that control gene expression. They recognize specific DNA sequences, the TF binding sites (TFBSs), and thus are targeted to specific regions of the genome where they can recruit transcriptional co-factors and/or chromatin regulators to fine-tune spatiotemporal gene regulation. Therefore, the identification of TFBSs in genomic sequences and their subsequent quantitative modeling is of crucial importance for understanding and predicting gene expression. Here, we review how TFBSs can be determined experimentally, how the TFBS models can be constructed in silico, and how they can be optimized by taking into account features such as position interdependence within TFBSs, DNA shape, and/or by introducing state-of-the-art computational algorithms such as deep learning methods. In addition, we discuss the integration of context variables into the TFBS modeling, including nucleosome positioning, chromatin states, methylation patterns, 3D genome architectures, and TF cooperative binding, in order to better predict TF binding under cellular contexts. Finally, we explore the possibilities of combining the optimized TFBS model with technological advances, such as targeted TFBS perturbation by CRISPR, to better understand gene regulation, evolution, and plant diversity.
Affiliation(s)
- Xuelei Lai
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
| | - Arnaud Stigliani
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
| | - Gilles Vachon
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
| | - Cristel Carles
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
| | - Cezary Smaczniak
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Chloe Zubieta
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
| | - Kerstin Kaufmann
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - François Parcy
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
|