1
Deyneko IV. Guidelines on the performance evaluation of motif recognition methods in bioinformatics. Front Genet 2023; 14:1135320. [PMID: 36824436 PMCID: PMC9941176 DOI: 10.3389/fgene.2023.1135320] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2022] [Accepted: 01/19/2023] [Indexed: 02/09/2023] Open
2
Identification and Characterization of Cis-Regulatory Elements for Photoreceptor-Type-Specific Transcription in ZebraFish. Methods Mol Biol 2020; 2092:123-145. [PMID: 31786786 DOI: 10.1007/978-1-0716-0175-4_10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/06/2022]
Abstract
Tissue-specific or cell-type-specific transcription of protein-coding genes is controlled by both trans-regulatory elements (TREs) and cis-regulatory elements (CREs). However, it is challenging to identify TREs and CREs, which are unknown for most genes. Here, we describe a protocol for identifying two types of transcription-activating CREs, core promoters and enhancers, of zebrafish photoreceptor type-specific genes. This protocol is composed of three phases: bioinformatic prediction, experimental validation, and characterization of the CREs. To better illustrate the principles and logic of this protocol, we exemplify it with the discovery of the core promoter and enhancer of the mpp5b apical polarity gene (also known as ponli), whose red, green, and blue (RGB) cone-specific transcription requires its enhancer, a member of the rainbow enhancer family. While exemplified with an RGB-cone-specific gene, this protocol is general and can be used to identify the core promoters and enhancers of other protein-coding genes.
3
A New Algorithm for Identifying Cis-Regulatory Modules Based on Hidden Markov Model. Biomed Res Int 2018; 2017:6274513. [PMID: 28497059 PMCID: PMC5405574 DOI: 10.1155/2017/6274513] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/23/2016] [Revised: 03/06/2017] [Accepted: 03/23/2017] [Indexed: 11/24/2022]
Abstract
The discovery of cis-regulatory modules (CRMs) is the key to understanding mechanisms of transcription regulation. Since CRMs have specific regulatory structures that are the basis for the regulation of gene expression, how to model the regulatory structure of CRMs has a considerable impact on the performance of CRM identification. The paper proposes a CRM discovery algorithm called ComSPS. ComSPS builds a regulatory structure model of CRMs based on HMM by exploring the rules of CRM transcriptional grammar that governs the internal motif site arrangement of CRMs. We test ComSPS on three benchmark datasets and compare it with five existing methods. Experimental results show that ComSPS performs better than them.
4
Perspectives on Gene Regulatory Network Evolution. Trends Genet 2017; 33:436-447. [PMID: 28528721 DOI: 10.1016/j.tig.2017.04.005] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 03/07/2017] [Revised: 04/24/2017] [Accepted: 04/25/2017] [Indexed: 11/23/2022]
Abstract
Animal development proceeds through the activity of genes and their cis-regulatory modules (CRMs) working together in sets of gene regulatory networks (GRNs). The emergence of species-specific traits and novel structures results from evolutionary changes in GRNs. Recent work in a wide variety of animal models, and particularly in insects, has started to reveal the modes and mechanisms of GRN evolution. I discuss here various aspects of GRN evolution and argue that developmental system drift (DSD), in which conserved phenotype is nevertheless a result of changed genetic interactions, should regularly be viewed from the perspective of GRN evolution. Advances in methods to discover related CRMs in diverse insect species, a critical requirement for detailed GRN characterization, are also described.
5
Rainbow Enhancers Regulate Restrictive Transcription in Teleost Green, Red, and Blue Cones. J Neurosci 2017; 37:2834-2848. [PMID: 28193687 DOI: 10.1523/jneurosci.3421-16.2017] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/04/2016] [Revised: 12/31/2016] [Accepted: 01/27/2017] [Indexed: 01/24/2023] Open
Abstract
Photoreceptor-specific transcription of individual genes collectively constitutes the transcriptional profile that orchestrates the structural and functional characteristics of each photoreceptor type. It is challenging, however, to study the transcriptional specificity of individual photoreceptor genes because each gene's distinct spatiotemporal transcription patterns are determined by the unique interactions between a specific set of transcription factors and the gene's own cis-regulatory elements (CREs), which remain unknown for most of the genes. For example, it is unknown what CREs underlie the zebrafish mpp5b^ponli (ponli) and crumbs2b (crb2b) apical polarity genes' restrictive transcription in the red, green, and blue (RGB) cones in the retina, but not in other retinal cell types. Here we show that the intronic enhancers of both the ponli and crb2b genes are conserved among teleost species and that they share sequence motifs that are critical for RGB cone-specific transcription. Given their similarities in sequences and functions, we name the ponli and crb2b enhancers collectively rainbow enhancers. Rainbow enhancers may represent a cis-regulatory mechanism to turn on a group of genes that are commonly and restrictively expressed in RGB cones, which largely define the beginning of the color vision pathway.
Significance Statement: Dim-light achromatic vision and bright-light color vision are initiated in rod and several types of cone photoreceptors, respectively; these photoreceptors are structurally distinct from each other. In zebrafish, although quite different from rods and UV cones, RGB cones (red, green, and blue cones) are structurally similar and unite into mirror-symmetric pentamers (G-R-B-R-G) by adhesion. This structural commonality and unity suggest that a set of genes is commonly expressed only in RGB cones but not in other cells. Here, we report that the rainbow enhancers activate RGB cone-specific transcription of the ponli and crb2b genes. This study provides a starting point to study how RGB cone-specific transcription defines RGB cones' distinct functions for color vision.
6
Guo H, Huo H, Yu Q. SMCis: An Effective Algorithm for Discovery of Cis-Regulatory Modules. PLoS One 2016; 11:e0162968. [PMID: 27637070 PMCID: PMC5026350 DOI: 10.1371/journal.pone.0162968] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/17/2016] [Accepted: 08/31/2016] [Indexed: 12/02/2022] Open
Abstract
The discovery of cis-regulatory modules (CRMs) is a challenging problem in computational biology. Limited by the difficulty of using an HMM to model dependent features in transcriptional regulatory sequences (TRSs), the probabilistic modeling methods based on HMMs cannot accurately represent the distance between regulatory elements in TRSs and are cumbersome to model the prevailing dependencies between motifs within CRMs. We propose a probabilistic modeling algorithm called SMCis, which builds a more powerful CRM discovery model based on a hidden semi-Markov model. Our model characterizes the regulatory structure of CRMs and effectively models dependencies between motifs at a higher level of abstraction based on segments rather than nucleotides. Experimental results on three benchmark datasets indicate that our method performs better than the compared algorithms.
Affiliation(s)
- Haitao Guo
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, China
- Hongwei Huo
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, China
  - * E-mail:
- Qiang Yu
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, China
7
Santolini M, Sakakibara I, Gauthier M, Ribas-Aulinas F, Takahashi H, Sawasaki T, Mouly V, Concordet JP, Defossez PA, Hakim V, Maire P. MyoD reprogramming requires Six1 and Six4 homeoproteins: genome-wide cis-regulatory module analysis. Nucleic Acids Res 2016; 44:8621-8640. [PMID: 27302134 PMCID: PMC5062961 DOI: 10.1093/nar/gkw512] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 12/15/2015] [Accepted: 05/26/2016] [Indexed: 11/12/2022] Open
Abstract
Myogenic regulatory factors of the MyoD family have the ability to reprogram differentiated cells toward a myogenic fate. In this study, we demonstrate that Six1 or Six4 are required for the reprogramming by MyoD of mouse embryonic fibroblasts (MEFs). Using microarray experiments, we found 761 genes under the control of both Six and MyoD. Using MyoD ChIPseq data and a genome-wide search for Six1/4 MEF3 binding sites, we found significant co-localization of binding sites for MyoD and Six proteins on over a thousand mouse genomic DNA regions. The combination of both datasets yielded 82 genes which are synergistically activated by Six and MyoD, with 96 associated MyoD+MEF3 putative cis-regulatory modules (CRMs). Fourteen out of 19 of the CRMs that we tested demonstrated in Luciferase assays a synergistic action also observed for their cognate gene. We searched putative binding sites on these CRMs using available databases and de novo search of conserved motifs and demonstrated that the Six/MyoD synergistic activation takes place in a feedforward way. It involves the recruitment of these two families of transcription factors to their targets, together with partner transcription factors, encoded by genes that are themselves activated by Six and MyoD, including Mef2, Pbx-Meis and EBF.
Affiliation(s)
- Marc Santolini
  - Institut Cochin, Université Paris-Descartes, Centre National de la Recherche Scientifique (CNRS), UMR 8104, Paris, France
  - Institut National de la Santé et de la Recherche Médicale (INSERM) U1016, Paris, France
  - Ecole Normale Supérieure, CNRS, Laboratoire de Physique Statistique, PSL Research University, Université Pierre-et-Marie Curie, Paris, France
- Iori Sakakibara
  - Institut Cochin, Université Paris-Descartes, Centre National de la Recherche Scientifique (CNRS), UMR 8104, Paris, France
  - Institut National de la Santé et de la Recherche Médicale (INSERM) U1016, Paris, France
  - Division of Integrative Pathophysiology, Proteo-Science Center, Graduate School of Medicine, Ehime University, Ehime, Japan
- Morgane Gauthier
  - Institut Cochin, Université Paris-Descartes, Centre National de la Recherche Scientifique (CNRS), UMR 8104, Paris, France
  - Institut National de la Santé et de la Recherche Médicale (INSERM) U1016, Paris, France
- Francesc Ribas-Aulinas
  - Institut Cochin, Université Paris-Descartes, Centre National de la Recherche Scientifique (CNRS), UMR 8104, Paris, France
  - Institut National de la Santé et de la Recherche Médicale (INSERM) U1016, Paris, France
- Vincent Mouly
  - Sorbonne Universités, UPMC Univ Paris 06, INSERM UMRS974, CNRS FRE3617, Center for Research in Myology, 75013 Paris, France
- Jean-Paul Concordet
  - Institut Cochin, Université Paris-Descartes, Centre National de la Recherche Scientifique (CNRS), UMR 8104, Paris, France
  - Institut National de la Santé et de la Recherche Médicale (INSERM) U1016, Paris, France
- Vincent Hakim
  - Ecole Normale Supérieure, CNRS, Laboratoire de Physique Statistique, PSL Research University, Université Pierre-et-Marie Curie, Paris, France
- Pascal Maire
  - Institut Cochin, Université Paris-Descartes, Centre National de la Recherche Scientifique (CNRS), UMR 8104, Paris, France
  - Institut National de la Santé et de la Recherche Médicale (INSERM) U1016, Paris, France
8
cis-regulatory analysis of the Drosophila pdm locus reveals a diversity of neural enhancers. BMC Genomics 2015; 16:700. [PMID: 26377945 PMCID: PMC4574355 DOI: 10.1186/s12864-015-1897-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 06/22/2015] [Accepted: 09/02/2015] [Indexed: 11/15/2022] Open
Abstract
Background: One of the major challenges in developmental biology is to understand the regulatory events that generate neuronal diversity. During Drosophila embryonic neural lineage development, cellular temporal identity is established in part by a transcription factor (TF) regulatory network that mediates a cascade of cellular identity decisions. Two of the regulators essential to this network are the POU-domain TFs Nubbin and Pdm-2, encoded by adjacent genes collectively known as pdm. The focus of this study is the discovery and characterization of cis-regulatory DNA that governs their expression.
Results: Phylogenetic footprinting analysis of a 125 kb genomic region that spans the pdm locus identified 116 conserved sequence clusters. To determine which of these regions function as cis-regulatory enhancers that regulate the dynamics of pdm gene expression, we tested each for in vivo enhancer activity during embryonic development and postembryonic neurogenesis. Our screen revealed 77 unique enhancers positioned throughout the noncoding region of the pdm locus. Many of these activated neural-specific gene expression during different developmental stages and many drove expression in overlapping patterns. Sequence comparisons of functionally related enhancers that activate overlapping expression patterns revealed that they share conserved elements that can be predictive of enhancer behavior. To facilitate data accessibility, the results of our analysis are catalogued in cisPatterns, an online database of the structure and function of these and other Drosophila enhancers.
Conclusions: These studies reveal a diversity of modular enhancers that most likely regulate pdm gene expression during embryonic and adult development, highlighting a high level of temporal and spatial expression specificity. In addition, we discovered clusters of functionally related enhancers throughout the pdm locus. A subset of these enhancers share conserved elements including sequences that correspond to known TF DNA binding sites. Although comparative analysis of the nubbin and pdm-2 encoding sequences indicates that these two genes most likely arose from a duplication event, we found only partial evidence of sequence duplication between their enhancers, suggesting that after the putative duplication their cis-regulatory DNA diverged at a higher rate than their coding sequences.
9
Leoncini M, Montangero M, Pellegrini M, Tillan KP. CMStalker: A Combinatorial Tool for Composite Motif Discovery. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:1123-1136. [PMID: 26451824 DOI: 10.1109/tcbb.2014.2359444] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Controlling the differential expression of many thousands of different genes at any given time is a fundamental task of metazoan organisms, and this complex orchestration is controlled by the so-called regulatory genome encoding complex regulatory networks: several Transcription Factors bind to precise DNA regions, so as to perform in a cooperative manner a specific regulation task for nearby genes. The in silico prediction of these binding sites is still an open problem, notwithstanding continuous progress and activity in the last two decades. In this paper, we describe a new efficient combinatorial approach to the problem of detecting sets of cooperating binding sites in promoter sequences, given as input a database of Transcription Factor Binding Sites encoded as Position Weight Matrices. We present CMStalker, a software tool for composite motif discovery which embodies a new approach that combines a constraint satisfaction formulation with a parameter relaxation technique to explore efficiently the space of possible solutions. Extensive experiments with 12 data sets and 11 state-of-the-art tools are reported, showing an average value of the correlation coefficient of 0.54 (against a value of 0.41 for the closest competitor). This improvement in output quality due to CMStalker is statistically significant.
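For readers unfamiliar with the input format mentioned in this abstract, the following is a minimal sketch of how a single Position Weight Matrix can be built from base counts and used to scan a sequence. It is not CMStalker's algorithm; the count matrix, pseudocount, uniform background, threshold and function names are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical base counts for a 4 bp motif, one dict per motif position.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = 0.25   # assumed uniform background base frequency
PSEUDOCOUNT = 0.5   # assumed smoothing value so zero counts stay finite


def build_pwm(counts, pseudocount=PSEUDOCOUNT, background=BACKGROUND):
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (offset, score) for each window scoring at or above threshold.

    Assumes the sequence contains only the letters A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


pwm = build_pwm(COUNTS)
hits = scan("TTACGTGGACGTAA", pwm, threshold=5.0)
for offset, score in hits:
    print(offset, round(score, 2))
```

A composite-motif tool would go further and look for combinations of such hits from several matrices; this sketch covers only the single-matrix scoring step.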
10
Suryamohan K, Halfon MS. Identifying transcriptional cis-regulatory modules in animal genomes. Wiley Interdiscip Rev Dev Biol 2015; 4:59-84. [PMID: 25704908 PMCID: PMC4339228 DOI: 10.1002/wdev.168] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 09/24/2014] [Revised: 11/04/2014] [Accepted: 11/16/2014] [Indexed: 11/08/2022]
Abstract
Gene expression is regulated through the activity of transcription factors (TFs) and chromatin-modifying proteins acting on specific DNA sequences, referred to as cis-regulatory elements. These include promoters, located at the transcription initiation sites of genes, and a variety of distal cis-regulatory modules (CRMs), the most common of which are transcriptional enhancers. Because regulated gene expression is fundamental to cell differentiation and acquisition of new cell fates, identifying, characterizing, and understanding the mechanisms of action of CRMs is critical for understanding development. CRM discovery has historically been challenging, as CRMs can be located far from the genes they regulate, have few readily identifiable sequence characteristics, and for many years were not amenable to high-throughput discovery methods. However, the recent availability of complete genome sequences and the development of next-generation sequencing methods have led to an explosion of both computational and empirical methods for CRM discovery in model and nonmodel organisms alike. Experimentally, CRMs can be identified through chromatin immunoprecipitation directed against TFs or histone post-translational modifications, identification of nucleosome-depleted 'open' chromatin regions, or sequencing-based high-throughput functional screening. Computational methods include comparative genomics, clustering of known or predicted TF-binding sites, and supervised machine-learning approaches trained on known CRMs. All of these methods have proven effective for CRM discovery, but each has its own considerations and limitations, and each is subject to a greater or lesser number of false-positive identifications. Experimental confirmation of predictions is essential, although shortcomings in current methods suggest that additional means of validation need to be developed.
Conflict of interest: The authors have declared no conflicts of interest for this article.
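One of the computational strategies named in this abstract, clustering of predicted TF-binding sites, can be illustrated with a simple sliding-window count. The sketch below is only one possible reading of that idea, not a method taken from the review; the window size, the minimum site count, the example coordinates and the function name are hypothetical.

```python
def find_site_clusters(positions, window=200, min_sites=3):
    """Return merged (start, end) intervals holding at least min_sites
    predicted sites whose positions span no more than `window` bp."""
    sites = sorted(positions)
    clusters = []
    right = 0
    for left in range(len(sites)):
        right = max(right, left)
        # Extend the window as far right as the span limit allows.
        while right + 1 < len(sites) and sites[right + 1] - sites[left] <= window:
            right += 1
        if right - left + 1 >= min_sites:
            start, end = sites[left], sites[right]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous cluster, so merge the two intervals.
                clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
            else:
                clusters.append((start, end))
    return clusters


# Hypothetical coordinates of predicted binding sites along one sequence.
predicted_sites = [120, 180, 250, 900, 1500, 1540, 1610, 1690, 3000]
clusters = find_site_clusters(predicted_sites)
print(clusters)
```

Dense intervals found this way are only candidate CRMs; as the abstract stresses, such predictions carry false positives and need experimental confirmation.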
Affiliation(s)
- Kushal Suryamohan
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - NY State Center of Excellence in Bioinformatics and Life Sciences, Buffalo, NY 14203, USA
- Marc S. Halfon
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - NY State Center of Excellence in Bioinformatics and Life Sciences, Buffalo, NY 14203, USA
  - Molecular and Cellular Biology Department and Program in Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
11
Sakakibara I, Santolini M, Ferry A, Hakim V, Maire P. Six homeoproteins and a linc-RNA at the fast MYH locus lock fast myofiber terminal phenotype. PLoS Genet 2014; 10:e1004386. [PMID: 24852826 PMCID: PMC4031048 DOI: 10.1371/journal.pgen.1004386] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 10/07/2013] [Accepted: 04/02/2014] [Indexed: 12/18/2022] Open
Abstract
Thousands of long intergenic non-coding RNAs (lincRNAs) are encoded by the mammalian genome. However, the function of most of these lincRNAs has not been identified in vivo. Here, we demonstrate a role for a novel lincRNA, linc-MYH, in adult fast-type myofiber specialization. Fast myosin heavy chain (MYH) genes and linc-MYH share a common enhancer, located in the fast MYH gene locus and regulated by Six1 homeoproteins. linc-MYH in nuclei of fast-type myofibers prevents slow-type and enhances fast-type gene expression. Functional fast-sarcomeric unit formation is achieved by the coordinate expression of fast MYHs and linc-MYH, under the control of a common Six-bound enhancer.
Affiliation(s)
- Iori Sakakibara
  - INSERM U1016, Institut Cochin, Paris, France
  - CNRS UMR 8104, Paris, France
  - Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Marc Santolini
  - Laboratoire de Physique Statistique, CNRS, Université P. et M. Curie, Université D. Diderot, École Normale Supérieure, Paris, France
- Arnaud Ferry
  - CNRS UMR 8104, Paris, France
  - Université Pierre et Marie Curie-Paris 6, Sorbonne Universités, UMR S794, INSERM U974, CNRS UMR7215, Institut de Myologie, Paris, France
- Vincent Hakim
  - Laboratoire de Physique Statistique, CNRS, Université P. et M. Curie, Université D. Diderot, École Normale Supérieure, Paris, France
- Pascal Maire
  - INSERM U1016, Institut Cochin, Paris, France
  - CNRS UMR 8104, Paris, France
  - Université Paris Descartes, Sorbonne Paris Cité, Paris, France
  - * E-mail: