1
Theepalakshmi P, Reddy US. Freezing firefly algorithm for efficient planted (ℓ, d) motif search. Med Biol Eng Comput 2022; 60:511-530. [PMID: 35020123 DOI: 10.1007/s11517-021-02468-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2020] [Accepted: 11/06/2021] [Indexed: 10/19/2022]
Abstract
Detecting distinctive patterns (motifs) in a set of biological sequences can lead to new biological discoveries. Its application to recognizing transcription factors and their binding sites has shown how necessary it is for understanding gene function, human diseases, and drug design. The literature identifies the (ℓ, d) motif search as a widely studied problem in PMS (Planted Motif Search). This paper proposes an efficient optimization algorithm named "Freezing FireFly (FFF)" to solve the (ℓ, d) motif search problem. A new freezing strategy, applied both locally and globally, was added to improve the performance of the basic Firefly algorithm. It freezes the best positions found so far, even those of the less bright fireflies. The performance of the proposed algorithm is evaluated on simulated and real datasets. The experimental results show that the proposed algorithm solves the (50, 21) instance within 1.47 min on the simulated dataset. For real datasets (such as ChIP-seq, chromatin immunoprecipitation sequencing) and synthetic datasets, the proposed algorithm runs much faster than existing state-of-the-art optimization algorithms, including Samselect, TraverStringRef, PMS8, qPMS9, AlignACE, FMGA, and GSGA.
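To make the planted (ℓ, d) motif condition concrete: a string of length ℓ counts as a motif when every input sequence contains some ℓ-mer that differs from it in at most d positions. The Python sketch below checks only that condition. It is not the Freezing Firefly algorithm, and the function names are illustrative assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(motif: str, sequence: str) -> int:
    """Smallest Hamming distance between the motif and any l-mer of the sequence."""
    length = len(motif)
    if len(sequence) < length:
        raise ValueError("sequence is shorter than the motif")
    return min(
        hamming(motif, sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
    )


def is_planted_motif(motif: str, sequences: list[str], d: int) -> bool:
    """True if every sequence holds an l-mer within d mismatches of the motif."""
    return all(min_distance(motif, s) <= d for s in sequences)
```

A search method such as the one in the abstract would call a check like this (or a score derived from the same distances) on many candidate strings; how candidates are generated is where the algorithms differ.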
Affiliation(s)
- P Theepalakshmi
  - Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu, India
- U Srinivasulu Reddy
  - Machine Learning and Data Analytics Lab, Center of Excellence in Artificial Intelligence, Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamilnadu, India
2
Kunz T, Rieber L, Mahony S. Assessing relationships between chromatin interactions and regulatory genomic activities using the self-organizing map. Methods 2020; 189:12-21. [PMID: 32652235 DOI: 10.1016/j.ymeth.2020.07.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2020] [Revised: 06/09/2020] [Accepted: 07/03/2020] [Indexed: 11/24/2022] Open
Abstract
Few existing methods enable the visualization of relationships between regulatory genomic activities and genome organization as captured by Hi-C experimental data. Genome-wide Hi-C datasets are often displayed using "heatmap" matrices, but it is difficult to intuit from these heatmaps which biochemical activities are compartmentalized together. High-dimensional Hi-C data vectors can alternatively be projected onto three-dimensional space using dimensionality reduction techniques. The resulting three-dimensional structures can serve as scaffolds for projecting other forms of genomic information, thereby enabling the exploration of relationships between genome organization and various genome annotations. However, while three-dimensional models are contextually appropriate for chromatin interaction data, some analyses and visualizations may be more intuitively and conveniently performed in two-dimensional space. We present a novel approach to the visualization and analysis of chromatin organization based on the Self-Organizing Map (SOM). The SOM algorithm provides a two-dimensional manifold which adapts to represent the high dimensional chromatin interaction space. The resulting data structure can then be used to assess relationships between regulatory genomic activities and chromatin interactions. For example, given a set of genomic coordinates corresponding to a given biochemical activity, the degree to which this activity is segregated or compartmentalized in chromatin interaction space can be intuitively visualized on the 2D SOM grid and quantified using Lorenz curve analysis. We demonstrate our approach for exploratory analysis of genome compartmentalization in a high-resolution Hi-C dataset from the human GM12878 cell line. Our SOM-based approach provides an intuitive visualization of the large-scale structure of Hi-C data and serves as a platform for integrative analyses of the relationships between various genomic activities and genome organization.
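The abstract mentions quantifying compartmentalization with Lorenz curve analysis. As a rough illustration only, the sketch below builds a Lorenz curve and the associated Gini coefficient from per-node counts of one activity on a SOM grid. The input layout (one non-negative count per node) and the function names are assumptions. The paper's actual procedure is not described here.

```python
def lorenz_curve(counts):
    """Return (xs, ys): cumulative share of nodes vs. cumulative share of counts.

    `counts` holds one non-negative value per SOM node (assumed layout).
    """
    values = sorted(counts)
    if not values or values[0] < 0:
        raise ValueError("counts must be non-empty and non-negative")
    total = sum(values)
    if total == 0:
        raise ValueError("at least one count must be positive")
    xs, ys, running = [0.0], [0.0], 0.0
    for i, value in enumerate(values, start=1):
        running += value
        xs.append(i / len(values))
        ys.append(running / total)
    return xs, ys


def gini(counts):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    xs, ys = lorenz_curve(counts)
    area = sum(
        (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2  # trapezoid rule
        for i in range(1, len(xs))
    )
    return 1 - 2 * area
```

A value near 0 means the activity is spread evenly over the nodes; a value near 1 means it is concentrated in a few nodes.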
Affiliation(s)
- Timothy Kunz
  - Biochemistry & Molecular Biology Department, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
- Lila Rieber
  - Biochemistry & Molecular Biology Department, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
- Shaun Mahony
  - Biochemistry & Molecular Biology Department, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
3
Lee NK, Li X, Wang D. A comprehensive survey on genetic algorithms for DNA motif prediction. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.07.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]
4
Liu S, Zibetti C, Wan J, Wang G, Blackshaw S, Qian J. Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility. BMC Bioinformatics 2017; 18:355. [PMID: 28750606 PMCID: PMC5530957 DOI: 10.1186/s12859-017-1769-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/03/2017] [Accepted: 07/19/2017] [Indexed: 12/04/2022] Open
Abstract
Background Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technology development allows us to determine the genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility profiles provide useful information in prediction of TF binding events in various physiological conditions. Furthermore, ChIP-Seq analysis was used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integration of these two types of genomic information can improve the prediction of TF binding events. Results We assessed to what extent a model built upon other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features such as specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that the models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types. Conclusion Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on the universal “rules”. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1769-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sheng Liu
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Cristina Zibetti
  - Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Jun Wan
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Guohua Wang
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Seth Blackshaw
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA; Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA; Centre for Human Systems Biology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA; Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Jiang Qian
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
5
Fiannaca A, Rosa ML, Paglia LL, Rizzo R, Urso A. MiRNATIP: a SOM-based miRNA-target interactions predictor. BMC Bioinformatics 2016; 17:321. [PMID: 28185545 PMCID: PMC5046196 DOI: 10.1186/s12859-016-1171-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/17/2022] Open
Abstract
Background MicroRNAs (miRNAs) are small non-coding RNA sequences with regulatory functions at the post-transcriptional level for several biological processes, such as cell disease progression and metastasis. MiRNAs interact with target messenger RNA (mRNA) genes by base pairing. Experimental identification of miRNA targets is one of the major challenges in cancer biology because miRNAs can act as tumour suppressors or oncogenes by targeting different types of targets. The use of machine learning methods for the prediction of the target genes is considered a valid support to investigate miRNA functions and to guide related wet-lab experiments. In this paper we propose the miRNA Target Interaction Predictor (miRNATIP) algorithm, a Self-Organizing Map (SOM) based method for miRNA target prediction. The SOM is trained with the seed region of the miRNA sequences, and then the mRNA sequences are projected into the SOM lattice in order to find putative interactions with miRNAs. These interactions are then filtered by considering the remaining part of the miRNA sequences and estimating the free energy necessary for duplex stability. Results We tested the proposed method by predicting the miRNA target interactions of both the Homo sapiens and the Caenorhabditis elegans species; then, taking into account validated target (positive) and non-target (negative) interactions, we compared our results with other target predictors, namely miRanda, PITA, PicTar, mirSOM, TargetScan and DIANA-microT, in terms of the most used statistical measures. We demonstrate that our method produces the greatest number of predictions with respect to the other ones, exhibiting good results for both species and reaching, for example, the highest sensitivity of 31 % and 30.5 % for Homo sapiens and C. elegans, respectively. All the predicted interactions are freely available at the following url: http://tblab.pa.icar.cnr.it/public/miRNATIP/.
Conclusions Results state miRNATIP outperforms or is comparable to the other six state-of-the-art methods, in terms of validated target and non-target interactions, respectively.
Affiliation(s)
- Antonino Fiannaca
  - National Research Council of Italy, ICAR-CNR, via Ugo La Malfa 153, Palermo, 90146, Italy
- Massimo La Rosa
  - National Research Council of Italy, ICAR-CNR, via Ugo La Malfa 153, Palermo, 90146, Italy
- Laura La Paglia
  - National Research Council of Italy, ICAR-CNR, via Ugo La Malfa 153, Palermo, 90146, Italy
- Riccardo Rizzo
  - National Research Council of Italy, ICAR-CNR, via Ugo La Malfa 153, Palermo, 90146, Italy
- Alfonso Urso
  - National Research Council of Italy, ICAR-CNR, via Ugo La Malfa 153, Palermo, 90146, Italy
6
Tapan S, Wang D. A Further Study on Mining DNA Motifs Using Fuzzy Self-Organizing Maps. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:113-124. [PMID: 26068877 DOI: 10.1109/tnnls.2015.2435155] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/04/2023]
Abstract
Self-organizing map (SOM)-based motif mining, despite being a promising approach for problem solving, mostly fails to offer a consistent interpretation of clusters with respect to the mixed composition of signal and noise in the nodes. The main reason behind this shortcoming comes from the similarity metrics used in data assignment, specially designed with the biological interpretation for this domain, which are not meant to consider the inevitable noise mixture in the clusters. This limits the explicability of the majority of clusters that are supposedly noise dominated, degrading the overall system clarity in motif discovery. This paper aims to improve the explicability aspect of the learning process by introducing a composite similarity function (CSF) that is specially designed for the k-mer-to-cluster similarity measure with respect to the degree of motif properties and embedded noise in the cluster. Our proposed motif finding algorithm in this paper is built on our previous work, the robust elicitation algorithm for discovering DNA motifs (READ) [1], and is termed READ using CSFs (READ(csf)). It performs slightly better than READ and shows some remarkable improvements over the SOM-based SOMBRERO and SOMEA tools in terms of F-measure on the testing data sets. A real data set containing multiple motifs is used to explore the potential of READ(csf) for more challenging biological data mining tasks. Visual comparisons with the verified logos extracted from the JASPAR database demonstrate that our algorithm is promising for discovering multiple motifs simultaneously.
7
Harigua-Souiai E, Cortes-Ciriano I, Desdouits N, Malliavin TE, Guizani I, Nilges M, Blondel A, Bouvier G. Identification of binding sites and favorable ligand binding moieties by virtual screening and self-organizing map analysis. BMC Bioinformatics 2015; 16:93. [PMID: 25888251 PMCID: PMC4381396 DOI: 10.1186/s12859-015-0518-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/01/2014] [Accepted: 02/24/2015] [Indexed: 11/24/2022] Open
Abstract
Background Identifying druggable cavities on a protein surface is a crucial step in structure based drug design. The cavities have to present suitable size and shape, as well as appropriate chemical complementarity with ligands. Results We present a novel cavity prediction method that analyzes results of virtual screening of specific ligands or fragment libraries by means of Self-Organizing Maps. We demonstrate the method with two thoroughly studied proteins where it successfully identified their active sites (AS) and relevant secondary binding sites (BS). Moreover, known active ligands mapped the AS better than inactive ones. Interestingly, docking a naive fragment library brought even more insight. We then systematically applied the method to the 102 targets from the DUD-E database, where it showed a 90% identification rate of the AS among the first three consensual clusters of the SOM, and in 82% of the cases as the first one. Further analysis by chemical decomposition of the fragments improved BS prediction. Chemical substructures that are representative of the active ligands preferentially mapped in the AS. Conclusion The new approach provides valuable information both on relevant BSs and on chemical features promoting bioactivity. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0518-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Emna Harigua-Souiai
  - Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France; Laboratory of Molecular Epidemiology and Experimental Pathology - LR11IPT04, Institut Pasteur de Tunis, Université Tunis el Manar - Tunisia, 13, Place Pasteur, Tunis, 1002, Tunisia; University of Carthage, Faculty of sciences of Bizerte - Tunisia, Jarzouna, 7021, Tunisia
- Isidro Cortes-Ciriano
  - Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France
- Nathan Desdouits
  - Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France
- Thérèse E Malliavin
  - Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France
- Ikram Guizani
  - Laboratory of Molecular Epidemiology and Experimental Pathology - LR11IPT04, Institut Pasteur de Tunis, Université Tunis el Manar - Tunisia, 13, Place Pasteur, Tunis, 1002, Tunisia
- Michael Nilges
  - Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France
- Arnaud Blondel
  - Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France
- Guillaume Bouvier
  - Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France
8
Beadell AV, Haag ES. Evolutionary Dynamics of GLD-1-mRNA complexes in Caenorhabditis nematodes. Genome Biol Evol 2014; 7:314-35. [PMID: 25502909 PMCID: PMC4316625 DOI: 10.1093/gbe/evu272] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Accepted: 12/04/2014] [Indexed: 12/17/2022] Open
Abstract
Given the large number of RNA-binding proteins and regulatory RNAs within genomes, posttranscriptional regulation may be an underappreciated aspect of cis-regulatory evolution. Here, we focus on nematode germ cells, which are known to rely heavily upon translational control to regulate meiosis and gametogenesis. GLD-1 belongs to the STAR-domain family of RNA-binding proteins, conserved throughout eukaryotes, and functions in Caenorhabditis elegans as a germline-specific translational repressor. A phylogenetic analysis across opisthokonts shows that GLD-1 is most closely related to Drosophila How and deuterostome Quaking, both implicated in alternative splicing. We identify messenger RNAs associated with C. briggsae GLD-1 on a genome-wide scale and provide evidence that many participate in aspects of germline development. By comparing our results with published C. elegans GLD-1 targets, we detect nearly 100 that are conserved between the two species. We also detected several hundred Cbr-GLD-1 targets whose homologs have not been reported to be associated with C. elegans GLD-1 in either of two independent studies. Low expression in C. elegans may explain the failure to detect most of them, but a highly expressed subset are strong candidates for Cbr-GLD-1-specific targets. We examine GLD-1-binding motifs among targets conserved in C. elegans and C. briggsae and find that most, but not all, display evidence of shared ancestral binding sites. Our work illustrates both the conservative and the dynamic character of evolution at the posttranslational level of gene regulation, even between congeners.
Affiliation(s)
- Alana V Beadell
  - Program in Behavior, Evolution, Ecology, and Systematics, University of Maryland, College Park; present address: Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL
- Eric S Haag
  - Program in Behavior, Evolution, Ecology, and Systematics, University of Maryland, College Park; Department of Biology, University of Maryland, College Park
9
Tran NTL, Huang CH. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data. Biol Direct 2014; 9:4. [PMID: 24555784 PMCID: PMC4022013 DOI: 10.1186/1745-6150-9-4] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 11/07/2013] [Revised: 01/08/2014] [Accepted: 02/11/2014] [Indexed: 12/24/2022] Open
Abstract
ChIP-Seq (chromatin immunoprecipitation sequencing) provides an advantage for motif finding because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended that users apply multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong).
Affiliation(s)
- Ngoc Tam L Tran
  - Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269, USA
10
Wang D, Tapan S. A robust elicitation algorithm for discovering DNA motifs using fuzzy self-organizing maps. IEEE Transactions on Neural Networks and Learning Systems 2013; 24:1677-1688. [PMID: 24808603 DOI: 10.1109/tnnls.2013.2275733] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
It is important to identify DNA motifs in promoter regions to understand the mechanism of gene regulation. Computational approaches for finding DNA motifs are well recognized as useful tools to biologists, which greatly help in saving experimental time and cost in wet laboratories. Self-organizing maps (SOMs), as a powerful clustering tool, have demonstrated good potential for problem solving. However, the current SOM-based motif discovery algorithms unfairly treat data samples lying around the cluster boundaries by assigning them to one of the nodes, which may result in unreliable system performance. This paper aims to develop a robust framework for discovering DNA motifs, where fuzzy SOMs, with an integration of fuzzy c-means membership functions and a standard batch-learning scheme, are employed to extract putative motifs with varying length in a recursive manner. Experimental results on eight real datasets show that our proposed algorithm outperforms the other searching tools such as SOMBRERO, SOMEA, MEME, AlignACE, and WEEDER in terms of the F-measure and algorithm reliability. It is observed that a remarkable 24.6% improvement can be achieved compared to the state-of-the-art SOMBRERO. Furthermore, our algorithm can produce a 20% and 6.6% improvement over SOMBRERO and SOMEA, respectively, in finding multiple motifs on five artificial datasets.
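The abstract's point about samples near cluster boundaries can be illustrated with the standard fuzzy c-means membership function, which spreads one sample over several nodes instead of assigning it to a single one. The Python sketch below uses the textbook formula with fuzzifier m. It is an assumption that this matches the form used in the paper, and the function name is illustrative.

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy c-means memberships of one sample, given its distance to each node.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)); the memberships sum to 1.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    if not distances or any(d < 0 for d in distances):
        raise ValueError("distances must be non-empty and non-negative")
    # A sample sitting exactly on one or more nodes belongs fully to them.
    zeros = [j for j, d in enumerate(distances) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if j in zeros else 0.0
                for j in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    weights = [d ** -exponent for d in distances]
    total = sum(weights)
    return [w / total for w in weights]
```

A sample equidistant from two nodes gets membership 0.5 in each, whereas a hard assignment would have to pick one of them arbitrarily.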
11
Zamani N, Russell P, Lantz H, Hoeppner MP, Meadows JR, Vijay N, Mauceli E, di Palma F, Lindblad-Toh K, Jern P, Grabherr MG. Unsupervised genome-wide recognition of local relationship patterns. BMC Genomics 2013; 14:347. [PMID: 23706020 PMCID: PMC3669000 DOI: 10.1186/1471-2164-14-347] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 10/16/2012] [Accepted: 05/08/2013] [Indexed: 12/05/2022] Open
Abstract
Background Phenomena such as incomplete lineage sorting, horizontal gene transfer, gene duplication and subsequent sub- and neo-functionalisation can result in distinct local phylogenetic relationships that are discordant with species phylogeny. In order to assess the possible biological roles for these subdivisions, they must first be identified and characterised, preferably on a large scale and in an automated fashion. Results We developed Saguaro, a combination of a Hidden Markov Model (HMM) and a Self Organising Map (SOM), to characterise local phylogenetic relationships among aligned sequences using cacti, matrices of pair-wise distance measures. While the HMM determines the genomic boundaries from aligned sequences, the SOM hypothesises new cacti in an unsupervised and iterative fashion based on the regions that were modelled least well by existing cacti. After testing the software on simulated data, we demonstrate the utility of Saguaro by testing two different data sets: (i) 181 Dengue virus strains, and (ii) 5 primate genomes. Saguaro identifies regions under lineage-specific constraint for the first set, and genomic segments that we attribute to incomplete lineage sorting in the second dataset. Intriguingly for the primate data, Saguaro also classified an additional ~3% of the genome as most incompatible with the expected species phylogeny. A substantial fraction of these regions was found to overlap genes associated with both the innate and adaptive immune systems. Conclusions Saguaro detects distinct cacti describing local phylogenetic relationships without requiring any a priori hypotheses. We have successfully demonstrated Saguaro’s utility with two contrasting data sets, one containing many members with short sequences (Dengue viral strains: n = 181, genome size = 10,700 nt), and the other with few members but complex genomes (related primate species: n = 5, genome size = 3 Gb), suggesting that the software is applicable to a wide variety of experimental populations. Saguaro is written in C++, runs on the Linux operating system, and can be downloaded from http://saguarogw.sourceforge.net/.
Affiliation(s)
- Neda Zamani
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
12
Wang D, Tapan S. MISCORE: a new scoring function for characterizing DNA regulatory motifs in promoter sequences. BMC Systems Biology 2012; 6 Suppl 2:S4. [PMID: 23282090 PMCID: PMC3521183 DOI: 10.1186/1752-0509-6-s2-s4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022]
Abstract
Background Computational approaches for finding DNA regulatory motifs in promoter sequences are useful to biologists in terms of reducing the experimental costs and speeding up the discovery process of de novo binding sites. It is important for rule-based or clustering-based motif searching schemes to effectively and efficiently evaluate the similarity between a k-mer (a k-length subsequence) and a motif model, without assuming the independence of nucleotides in motif models or without employing computationally expensive Markov chain models to estimate the background probabilities of k-mers. Also, it is interesting and beneficial to use a priori knowledge in developing advanced searching tools. Results This paper presents a new scoring function, termed as MISCORE, for functional motif characterization and evaluation. Our MISCORE is free from: (i) any assumption on model dependency; and (ii) the use of Markov chain model for background modeling. It integrates the compositional complexity of motif instances into the function. Performance evaluations with comparison to the well-known Maximum a Posteriori (MAP) score and Information Content (IC) have shown that MISCORE has promising capabilities to separate and recognize functional DNA motifs and its instances from non-functional ones. Conclusions MISCORE is a fast computational tool for candidate motif characterization, evaluation and selection. It enables to embed priori known motif models for computing motif-to-motif similarity, which is more advantageous than IC and MAP score. In addition to these merits mentioned above, MISCORE can automatically filter out some repetitive k-mers from a motif model due to the introduction of the compositional complexity in the function. Consequently, the merits of our proposed MISCORE in terms of both motif signal modeling power and computational efficiency will make it more applicable in the development of computational motif discovery tools.
Affiliation(s)
- Dianhui Wang
  - Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Victoria 3086, Australia
13
Chien TY, Lin CK, Lin CW, Weng YZ, Chen CY, Chang DTH. DBD2BS: connecting a DNA-binding protein with its binding sites. Nucleic Acids Res 2012; 40:W173-9. [PMID: 22693214 PMCID: PMC3394304 DOI: 10.1093/nar/gks564] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/04/2012] [Revised: 05/07/2012] [Accepted: 05/19/2012] [Indexed: 11/25/2022] Open
Abstract
By binding to short and highly conserved DNA sequences in genomes, DNA-binding proteins initiate, enhance or repress biological processes. Accurately identifying such binding sites, often represented by position weight matrices (PWMs), is an important step in understanding the control mechanisms of cells. When given coordinates of a DNA-binding domain (DBD) bound with DNA, a potential function can be used to estimate the change of binding affinity after base substitutions, where the changes can be summarized as a PWM. This technique provides an effective alternative when the chromatin immunoprecipitation data are unavailable for PWM inference. To facilitate the procedure of predicting PWMs based on protein-DNA complexes or even structures of the unbound state, the web server, DBD2BS, is presented in this study. The DBD2BS uses an atom-level knowledge-based potential function to predict PWMs characterizing the sequences to which the query DBD structure can bind. For unbound queries, a list of 1066 DBD-DNA complexes (including 1813 protein chains) is compiled for use as templates for synthesizing bound structures. The DBD2BS provides users with an easy-to-use interface for visualizing the PWMs predicted based on different templates and the spatial relationships of the query protein, the DBDs and the DNAs. The DBD2BS is the first attempt to predict PWMs of DBDs from unbound structures rather than from bound ones. This approach increases the number of existing protein structures that can be exploited when analyzing protein-DNA interactions. In a recent study, the authors showed that the kernel adopted by the DBD2BS can generate PWMs consistent with those obtained from the experimental data. The use of DBD2BS to predict PWMs can be incorporated with sequence-based methods to discover binding sites in genome-wide studies. Available at: http://dbd2bs.csie.ntu.edu.tw/, http://dbd2bs.csbb.ntu.edu.tw/, and http://dbd2bs.ee.ncku.edu.tw.
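For readers unfamiliar with position weight matrices, the Python sketch below shows the conventional sequence-based route: a log-odds PWM is built from aligned binding sites and then used to scan a sequence. This is not the structure-based potential that DBD2BS uses to derive its PWMs. The pseudocount, the uniform background of 0.25, and the function names are illustrative assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM (one dict per position) from equal-length aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [s[pos] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PWM")
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)  # ties resolve to the later start position
    return start, score
```

A PWM predicted from a protein structure could be passed to a scanner like `best_hit` in the same way, which is the kind of combination with sequence-based methods that the abstract suggests.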
Affiliation(s)
- Ting-Ying Chien
- Department of Computer Science and Information Engineering, Center for Systems Biology, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106, Taiwan, and Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan
- Chih-Kang Lin
- Department of Computer Science and Information Engineering, Center for Systems Biology, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106, Taiwan, and Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan
- Chih-Wei Lin
- Department of Computer Science and Information Engineering, Center for Systems Biology, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106, Taiwan, and Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan
- Yi-Zhong Weng
- Department of Computer Science and Information Engineering, Center for Systems Biology, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106, Taiwan, and Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan
- Chien-Yu Chen
- Department of Computer Science and Information Engineering, Center for Systems Biology, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106, Taiwan, and Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan
- Darby Tien-Hao Chang
- Department of Computer Science and Information Engineering, Center for Systems Biology, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106, Taiwan, and Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan
|
14
|
Tan M, Yu D, Jin Y, Dou L, Li B, Wang Y, Yue J, Liang L. An information transmission model for transcription factor binding at regulatory DNA sites. Theor Biol Med Model 2012; 9:19. [PMID: 22672438 PMCID: PMC3442977 DOI: 10.1186/1742-4682-9-19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/08/2012] [Accepted: 05/17/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Computational identification of transcription factor binding sites (TFBSs) is a rapid, cost-efficient way to locate unknown regulatory elements. With the increased potential for high-throughput genome sequencing, accurate computational methods for TFBS prediction have never been more important. To date, identifying TFBSs with high sensitivity and specificity remains an open challenge, necessitating the development of novel models for predicting transcription factor-binding regulatory DNA elements. RESULTS: Based on information theory, we propose a model for transcription factor binding at regulatory DNA sites. Our model effectively incorporates position interdependencies. The model computes the information transferred (TI) between the transcription factor and the TFBS during the binding process and uses TI as the criterion to determine whether a sequence motif is a possible TFBS. Based on this model, we developed a computational method to identify TFBSs. Through theoretical proof and tests on both real and artificial data, we found that our model provides highly accurate predictions. CONCLUSIONS: In this study, we present a novel model for transcription factor binding at regulatory DNA sites. The model can provide an increased ability to detect TFBSs.
Affiliation(s)
- Mingfeng Tan
- Beijing Institute of Biotechnology, Beijing 100071, China
|
15
|
Zambelli F, Pesole G, Pavesi G. Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. Brief Bioinform 2012; 14:225-37. [PMID: 22517426 PMCID: PMC3603212 DOI: 10.1093/bib/bbs016] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Indexed: 02/07/2023]
Abstract
Motif discovery has been one of the most widely studied problems in bioinformatics ever since genomic and protein sequences have been available. In particular, its application to the de novo prediction of putative over-represented transcription factor binding sites in nucleotide sequences has been, and still is, one of the most challenging flavors of the problem. Recently, novel experimental techniques like chromatin immunoprecipitation (ChIP) have been introduced, permitting the genome-wide identification of protein-DNA interactions. ChIP, applied to transcription factors and coupled with genome tiling arrays (ChIP on Chip) or next-generation sequencing technologies (ChIP-Seq) has opened new avenues in research, as well as posed new challenges to bioinformaticians developing algorithms and methods for motif discovery.
|
16
|
Chen CY, Chien TY, Lin CK, Lin CW, Weng YZ, Chang DTH. Predicting target DNA sequences of DNA-binding proteins based on unbound structures. PLoS One 2012; 7:e30446. [PMID: 22312425 PMCID: PMC3270014 DOI: 10.1371/journal.pone.0030446] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 05/23/2011] [Accepted: 12/16/2011] [Indexed: 12/17/2022]
Abstract
DNA-binding proteins such as transcription factors use DNA-binding domains (DBDs) to bind to specific sequences in the genome to initiate many important biological functions. Accurate prediction of such target sequences, often represented by position weight matrices (PWMs), is an important step to understand many biological processes. Recent studies have shown that knowledge-based potential functions can be applied on protein-DNA co-crystallized structures to generate PWMs that are considerably consistent with experimental data. However, this success has not been extended to DNA-binding proteins lacking co-crystallized structures. This study aims at investigating the possibility of predicting the DNA sequences bound by DNA-binding proteins from the proteins' unbound structures (structures of the unbound state). Given an unbound query protein and a template complex, the proposed method first employs structure alignment to generate synthetic protein-DNA complexes for the query protein. Once a complex is available, an atomic-level knowledge-based potential function is employed to predict PWMs characterizing the sequences to which the query protein can bind. The evaluation of the proposed method is based on seven DNA-binding proteins, which have structures of both DNA-bound and unbound forms for prediction as well as annotated PWMs for validation. Since this work is the first attempt to predict target sequences of DNA-binding proteins from their unbound structures, three types of structural variations that presumably influence the prediction accuracy were examined and discussed. Based on the analyses conducted in this study, the conformational change of proteins upon binding DNA was shown to be the key factor. This study sheds light on the challenge of predicting the target DNA sequences of a protein lacking co-crystallized structures, which encourages more efforts on the structure alignment-based approaches in addition to docking- and homology modeling-based approaches for generating synthetic complexes.
Affiliation(s)
- Chien-Yu Chen
- Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Center for Systems Biology, National Taiwan University, Taipei, Taiwan
- Center for Biotechnology, National Taiwan University, Taipei, Taiwan
- Ting-Ying Chien
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Chih-Kang Lin
- Center for Systems Biology, National Taiwan University, Taipei, Taiwan
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Chih-Wei Lin
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Yi-Zhong Weng
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Darby Tien-Hao Chang
- Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
|
17
|
Technau M, Knispel M, Roth S. Molecular mechanisms of EGF signaling-dependent regulation of pipe, a gene crucial for dorsoventral axis formation in Drosophila. Dev Genes Evol 2011; 222:1-17. [PMID: 22198544 PMCID: PMC3291829 DOI: 10.1007/s00427-011-0384-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/08/2011] [Accepted: 11/29/2011] [Indexed: 01/28/2023]
Abstract
During Drosophila oogenesis the expression of the sulfotransferase Pipe in ventral follicle cells is crucial for dorsoventral axis formation. Pipe modifies proteins that are incorporated in the ventral eggshell and activate Toll signaling which in turn initiates embryonic dorsoventral patterning. Ventral pipe expression is the result of an oocyte-derived EGF signal which down-regulates pipe in dorsal follicle cells. The analysis of mutant follicle cell clones reveals that none of the transcription factors known to act downstream of EGF signaling in Drosophila is required or sufficient for pipe regulation. However, the pipe cis-regulatory region harbors a 31-bp element which is essential for pipe repression, and ovarian extracts contain a protein that binds this element. Thus, EGF signaling does not act by down-regulating an activator of pipe as previously suggested but rather by activating a repressor. Surprisingly, this repressor acts independent of the common co-repressors Groucho or CtBP.
Affiliation(s)
- Martin Technau
- Institute for Developmental Biology, Biocenter, University of Cologne, Zuelpicher Straße 47b, 50674, Cologne, Germany
|
18
|
Thomas BJ, Rubio ED, Krumm N, Broin PO, Bomsztyk K, Welcsh P, Greally JM, Golden AA, Krumm A. Allele-specific transcriptional elongation regulates monoallelic expression of the IGF2BP1 gene. Epigenetics Chromatin 2011; 4:14. [PMID: 21812971 PMCID: PMC3174113 DOI: 10.1186/1756-8935-4-14] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/25/2011] [Accepted: 08/03/2011] [Indexed: 11/13/2022]
Abstract
Background: Random monoallelic expression contributes to phenotypic variation of cells and organisms. However, the epigenetic mechanisms by which individual alleles are randomly selected for expression are not known. Taking cues from chromatin signatures at imprinted gene loci such as the insulin-like growth factor 2 gene (IGF2), we evaluated the contribution of CTCF, a zinc finger protein required for parent-of-origin-specific expression of the IGF2 gene, as well as a role for allele-specific association with DNA methylation, histone modification and RNA polymerase II. Results: Using array-based chromatin immunoprecipitation, we identified 293 genomic loci that are associated with both CTCF and histone H3 trimethylated at lysine 9 (H3K9me3). A comparison of their genomic positions with those of previously published monoallelically expressed genes revealed no significant overlap between allele-specifically expressed genes and colocalized CTCF/H3K9me3. To analyze the contributions of CTCF and H3K9me3 to gene regulation in more detail, we focused on the monoallelically expressed IGF2BP1 gene. In vitro binding assays using the CTCF target motif at the IGF2BP1 gene, as well as allele-specific analysis of cytosine methylation and CTCF binding, revealed that CTCF does not regulate mono- or biallelic IGF2BP1 expression. Surprisingly, we found that RNA polymerase II is detected on both the maternal and paternal alleles in B lymphoblasts that express IGF2BP1 primarily from one allele. Thus, allele-specific control of RNA polymerase II elongation regulates the allelic bias of IGF2BP1 gene expression. Conclusions: Colocalization of CTCF and H3K9me3 does not represent a reliable chromatin signature indicative of monoallelic expression. Moreover, association of individual alleles with both active (H3K4me3) and silent (H3K27me3) chromatin modifications (allelic bivalent chromatin) or with RNA polymerase II also fails to identify monoallelically expressed gene loci. The selection of individual alleles for expression occurs in part during transcription elongation.
Affiliation(s)
- Brandon J Thomas
- Institute for Stem Cell and Regenerative Medicine, University of Washington School of Medicine, 815 Mercer St,, Seattle, WA 98109, USA.
|
19
|
Heikkinen L, Kolehmainen M, Wong G. Prediction of microRNA targets in Caenorhabditis elegans using a self-organizing map. Bioinformatics 2011; 27:1247-54. [PMID: 21422073 DOI: 10.1093/bioinformatics/btr144] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 01/22/2023]
Abstract
MOTIVATION: MicroRNAs (miRNAs) are small non-coding RNAs that regulate transcriptional processes via binding to the target gene mRNA. In animals, this binding is imperfect, which makes the computational prediction of animal miRNA targets a challenging task. The accuracy of miRNA target prediction can be improved with the use of machine learning methods. Previous work has described methods using supervised learning, but they suffer from the lack of adequate training examples, a common problem in miRNA target identification, which often leads to deficient generalization ability. RESULTS: In this work, we introduce mirSOM, a miRNA target prediction tool based on clustering of short 3′-untranslated region (3′-UTR) substrings with a self-organizing map (SOM). As our method uses unsupervised learning and a large set of verified Caenorhabditis elegans 3′-UTRs, we did not need to resort to training using a known set of targets. Our method outperforms seven other methods in predicting the experimentally verified C. elegans true and false miRNA targets. AVAILABILITY: mirSOM miRNA target predictions are available at http://kokki.uku.fi/bioinformatics/mirsom.
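The general idea of clustering short sequence substrings with a self-organizing map can be sketched in a few lines. What follows is a minimal one-dimensional SOM over one-hot encoded k-mers, written for illustration only: the toy sequence, the map size, the learning schedule, and all function names are assumptions and do not reproduce mirSOM's encoding, map topology, or training procedure.

```python
import math
import random

BASES = "ACGT"

def one_hot(kmer):
    """Encode a k-mer as a flat 4*k vector of 0.0/1.0 values."""
    vec = []
    for ch in kmer:
        vec.extend(1.0 if ch == b else 0.0 for b in BASES)
    return vec

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def best_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    dists = [sum((w - xi) ** 2 for w, xi in zip(node, x)) for node in weights]
    return dists.index(min(dists))

def train_som(data, n_nodes=4, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a 1-D SOM; learning rate and neighbourhood radius shrink over time."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            bmu = best_unit(weights, x)
            for j, node in enumerate(weights):
                # Gaussian neighbourhood: nodes near the winner move more.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    node[d] += lr * h * (x[d] - node[d])
    return weights

# Hypothetical toy sequence standing in for a 3'-UTR.
utr = "ACGTACGTTTTTACGT"
subs = kmers(utr, 4)
data = [one_hot(s) for s in subs]
weights = train_som(data)
for s in sorted(set(subs)):
    print(s, best_unit(weights, one_hot(s)))
```

Because training is unsupervised, no labelled target set is needed; substrings that land on the same node are simply treated as one cluster.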
Affiliation(s)
- Liisa Heikkinen
- Department of Biosciences, Department of Neurobiology, A.I.Virtanen Institute for Molecular Sciences, Biocenter Finland, Kuopio, Finland.
|
20
|
Lee NK, Wang D. SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model. BMC Bioinformatics 2011; 12 Suppl 1:S16. [PMID: 21342545 PMCID: PMC3044270 DOI: 10.1186/1471-2105-12-s1-s16] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 01/18/2023]
Abstract
Background: Discrimination of transcription factor binding sites (TFBS) from background sequences plays a key role in computational motif discovery. Current clustering-based algorithms employ a homogeneous model, which assumes that motifs and background signals can be equivalently characterized. This assumption has limitations because the two types of sequence signal have distinct properties. Results: This paper develops a Self-Organizing Map (SOM) based clustering algorithm for extracting binding sites in DNA sequences. Our framework is based on a novel intra-node soft competitive procedure to achieve maximum discrimination of motifs from background signals in datasets. The intra-node competition is based on an adaptive weighting technique on two different signal models to better represent these two classes of signals. Using several real and artificial datasets, we compared our proposed method with several motif discovery tools. Compared to SOMBRERO, a state-of-the-art SOM-based motif discovery tool, our algorithm achieves significant improvements in average precision (about 27%) on the real datasets without compromising sensitivity. Our method also performed favourably against other motif discovery tools. Conclusions: Motif discovery with a model-based clustering framework should consider using a heterogeneous model to represent the two classes of signals in DNA sequences. Such a heterogeneous model can achieve better signal discrimination than the homogeneous model.
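The contrast between a motif model and a background model can be illustrated with a plain log-odds score, in which a candidate site is scored under both models and the ratio decides which one explains it better. This is a generic textbook sketch, not SOMEA's intra-node competition or adaptive weighting; the two-position motif matrix, the uniform background, and the function name `log_odds` are hypothetical.

```python
import math

BASES = "ACGT"

def log_odds(kmer, motif_pwm, background):
    """Sum over positions of log2(motif probability / background probability)."""
    return sum(
        math.log2(motif_pwm[i][base] / background[base])
        for i, base in enumerate(kmer)
    )

# Hypothetical 2-bp motif model and a uniform background model.
motif_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
background = {b: 0.25 for b in BASES}

for kmer in ("AG", "CT"):
    # Positive scores favour the motif model, negative scores the background.
    print(kmer, round(log_odds(kmer, motif_pwm, background), 2))
```

A heterogeneous approach in the sense of the abstract would go further and use structurally different models for the two signal classes, for example a position-specific matrix for motifs and a Markov chain for background.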
Affiliation(s)
- Nung Kion Lee
- Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Victoria 3086, Australia.
|
21
|
Mahony S, Mazzoni EO, McCuine S, Young RA, Wichterle H, Gifford DK. Ligand-dependent dynamics of retinoic acid receptor binding during early neurogenesis. Genome Biol 2011; 12:R2. [PMID: 21232103 PMCID: PMC3091300 DOI: 10.1186/gb-2011-12-1-r2] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.6] [Received: 10/13/2010] [Revised: 12/10/2010] [Accepted: 01/13/2011] [Indexed: 01/31/2023]
Abstract
Background: Among its many roles in development, retinoic acid determines the anterior-posterior identity of differentiating motor neurons by activating retinoic acid receptor (RAR)-mediated transcription. RAR is thought to bind the genome constitutively, and only induce transcription in the presence of the retinoid ligand. However, little is known about where RAR binds to the genome or how it selects target sites. Results: We tested the constitutive RAR binding model using the retinoic acid-driven differentiation of mouse embryonic stem cells into differentiated motor neurons. We find that retinoic acid treatment results in widespread changes in RAR genomic binding, including novel binding to genes directly responsible for anterior-posterior specification, as well as the subsequent recruitment of the basal polymerase machinery. Finally, we discovered that the binding of transcription factors at the embryonic stem cell stage can accurately predict where in the genome RAR binds after initial differentiation. Conclusions: We have characterized a ligand-dependent shift in RAR genomic occupancy at the initiation of neurogenesis. Our data also suggest that enhancers active in pluripotent embryonic stem cells may be preselecting regions that will be activated by RAR during neuronal differentiation.
Affiliation(s)
- Shaun Mahony
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
22
|
Kuo D, Tan K, Zinman G, Ravasi T, Bar-Joseph Z, Ideker T. Evolutionary divergence in the fungal response to fluconazole revealed by soft clustering. Genome Biol 2010; 11:R77. [PMID: 20653936 PMCID: PMC2926788 DOI: 10.1186/gb-2010-11-7-r77] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 04/22/2010] [Revised: 07/09/2010] [Accepted: 07/23/2010] [Indexed: 11/25/2022]
Abstract
Background: Fungal infections are an emerging health risk, especially those involving yeast that are resistant to antifungal agents. To understand the range of mechanisms by which yeasts can respond to anti-fungals, we compared gene expression patterns across three evolutionarily distant species - Saccharomyces cerevisiae, Candida glabrata and Kluyveromyces lactis - over time following fluconazole exposure. Results: Conserved and diverged expression patterns were identified using a novel soft clustering algorithm that concurrently clusters data from all species while incorporating sequence orthology. The analysis suggests complementary strategies for coping with ergosterol depletion by azoles - Saccharomyces imports exogenous ergosterol, Candida exports fluconazole, while Kluyveromyces does neither, leading to extreme sensitivity. In support of this hypothesis we find that only Saccharomyces becomes more azole resistant in ergosterol-supplemented media; that this depends on sterol importers Aus1 and Pdr11; and that transgenic expression of sterol importers in Kluyveromyces alleviates its drug sensitivity. Conclusions: We have compared the dynamic transcriptional responses of three diverse yeast species to fluconazole treatment using a novel clustering algorithm. This approach revealed significant divergence among regulatory programs associated with fluconazole sensitivity. In future, such approaches might be used to survey a wider range of species, drug concentrations and stimuli to reveal conserved and divergent molecular response pathways.
Affiliation(s)
- Dwight Kuo
- Department of Bioengineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
|
23
|
Rhee JK, Joung JG, Chang JH, Fei Z, Zhang BT. Identification of cell cycle-related regulatory motifs using a kernel canonical correlation analysis. BMC Genomics 2009; 10 Suppl 3:S29. [PMID: 19958493 PMCID: PMC2788382 DOI: 10.1186/1471-2164-10-s3-s29] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Abstract
BACKGROUND: Gene regulation is a key mechanism in higher eukaryotic cellular processes. One of the major challenges in gene regulation studies is to identify regulators affecting the expression of their target genes in specific biological processes. Despite their importance, regulators involved in diverse biological processes still remain largely unrevealed. In the present study, we propose a kernel-based approach to efficiently identify core regulatory elements involved in specific biological processes using gene expression profiles. RESULTS: We developed a framework that can detect correlations between gene expression profiles and the upstream sequences on the basis of the kernel canonical correlation analysis (kernel CCA). Using a yeast cell cycle dataset, we demonstrated that upstream sequence patterns were closely related to gene expression profiles based on the canonical correlation scores obtained by measuring the correlation between them. Our results showed that the cell cycle-specific regulatory motifs could be found successfully based on the motif weights derived through kernel CCA. Furthermore, we identified co-regulatory motif pairs using the same framework. CONCLUSION: Given expression profiles, our method was able to identify regulatory motifs involved in specific biological processes. The method could be applied to the elucidation of the unknown regulatory mechanisms associated with complex gene regulatory processes.
Affiliation(s)
- Je-Keun Rhee
- Graduate Program in Bioinformatics, Seoul National University, Seoul 151-744, Korea
- Center for Biointelligence Technology (CBIT), Seoul National University, Seoul 151-744, Korea
- Je-Gun Joung
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY 14853, USA
- Zhangjun Fei
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY 14853, USA
- USDA Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
- Byoung-Tak Zhang
- Graduate Program in Bioinformatics, Seoul National University, Seoul 151-744, Korea
- Center for Biointelligence Technology (CBIT), Seoul National University, Seoul 151-744, Korea
- School of Computer Science and Engineering, Seoul National University, Seoul 151-744, Korea
|
24
|
An integrated genome screen identifies the Wnt signaling pathway as a major target of WT1. Proc Natl Acad Sci U S A 2009; 106:11154-9. [PMID: 19549856 DOI: 10.1073/pnas.0901591106] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Indexed: 11/18/2022]
Abstract
WT1, a critical regulator of kidney development, is a tumor suppressor for nephroblastoma but in some contexts functions as an oncogene. A limited number of direct transcriptional targets of WT1 have been identified to explain its complex roles in tumorigenesis and organogenesis. In this study we performed genome-wide screening for direct WT1 targets, using a combination of ChIP-chip and expression arrays. Promoter regions bound by WT1 were highly G-rich and resembled the sites for a number of other widely expressed transcription factors such as SP1, MAZ, and ZNF219. Genes directly regulated by WT1 were implicated in MAPK signaling, axon guidance, and Wnt pathways. Among the genes directly bound and regulated by WT1, nine were identified in the Wnt signaling pathway, suggesting that WT1 modulates a subset of Wnt components and responsive genes by direct binding. To prove the biological importance of the interplay between WT1 and Wnt signaling, we showed that WT1 blocked the ability of Wnt8 to induce a secondary body axis during Xenopus embryonic development. WT1 inhibited TCF-mediated transcription activated by Wnt ligand or by wild-type and mutant stabilized beta-catenin, by preventing TCF4 loading onto a promoter. This was due neither to direct binding of WT1 to the TCF binding site nor to interaction between WT1 and TCF4, but to competition between WT1 and TCF4 for CBP. WT1 interference with Wnt signaling represents an important mode of its action relevant to the suppression of tumor growth and guidance of development.
|
25
|
Murtola T, Bunker A, Vattulainen I, Deserno M, Karttunen M. Multiscale modeling of emergent materials: biological and soft matter. Phys Chem Chem Phys 2009; 11:1869-92. [PMID: 19279999 DOI: 10.1039/b818051b] [Citation(s) in RCA: 183] [Impact Index Per Article: 12.2] [Indexed: 11/21/2022]
Abstract
In this review, we focus on four current related issues in multiscale modeling of soft and biological matter. First, we discuss how to use structural information from detailed models (or experiments) to construct coarse-grained ones in a hierarchical and systematic way. This is discussed in the context of the so-called Henderson theorem and the inverse Monte Carlo method of Lyubartsev and Laaksonen. In the second part, we take a different look at coarse graining by analyzing conformations of molecules. This is done by the application of self-organizing maps, i.e., a neural network type approach. Such an approach can be used to guide the selection of the relevant degrees of freedom. Then, we discuss technical issues related to the popular dissipative particle dynamics (DPD) method. Importantly, the potentials derived using the inverse Monte Carlo method can be used together with the DPD thermostat. In the final part we focus on solvent-free modeling which offers a different route to coarse graining by integrating out the degrees of freedom associated with solvent.
Affiliation(s)
- Teemu Murtola
- Department of Applied Physics and Helsinki Institute of Physics, Helsinki University of Technology, Finland
|
26
|
Lu Y, Mahony S, Benos PV, Rosenfeld R, Simon I, Breeden LL, Bar-Joseph Z. Combined analysis reveals a core set of cycling genes. Genome Biol 2008; 8:R146. [PMID: 17650318 PMCID: PMC2323241 DOI: 10.1186/gb-2007-8-7-r146] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 03/30/2007] [Revised: 06/19/2007] [Accepted: 07/24/2007] [Indexed: 01/28/2023]
Abstract
The simultaneous analysis of expression data from multiple species reveals a core set of conserved cycling genes that is much larger than previously thought. Background: Global transcript levels throughout the cell cycle have been characterized using microarrays in several species. Early analysis of these experiments focused on individual species. More recently, a number of studies have concluded that a surprisingly small number of genes conserved in two or more species are periodically transcribed in these species. Combining and comparing data from multiple species is challenging because of noise in expression data, the different synchronization and scoring methods used, and the need to determine an accurate set of homologs. Results: To solve these problems, we developed and applied a new algorithm to analyze expression data from multiple species simultaneously. Unlike previous studies, we find that more than 20% of cycling genes in budding yeast have cycling homologs in fission yeast and 5% to 7% of cycling genes in each of four species have cycling homologs in all other species. These conserved cycling genes display much stronger cell cycle characteristics in several complementary high throughput datasets. Essentiality analysis for yeast and human genes confirms these findings. Motif analysis indicates conservation in the corresponding regulatory mechanisms. Gene Ontology analysis and analysis of the genes in the conserved sets shed light on the evolution of specific subfunctions within the cell cycle. Conclusion: Our results indicate that the conservation in cyclic expression patterns is much greater than was previously thought. These genes are highly enriched for most cell cycle categories, and a large percentage of them are essential, supporting our claim that cross-species analysis can identify the core set of cycling genes.
Affiliation(s)
- Yong Lu
- Department of Computer Science, Carnegie Mellon University, Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
- Shaun Mahony
- Department of Computational Biology, University of Pittsburgh Medical School, Lothrop Street, Pittsburgh, Pennsylvania 15213, USA
- Panayiotis V Benos
- Department of Computational Biology, University of Pittsburgh Medical School, Lothrop Street, Pittsburgh, Pennsylvania 15213, USA
- Roni Rosenfeld
- Machine Learning Department, Carnegie Mellon University, Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
- Itamar Simon
- Department of Molecular Biology, Hebrew University Medical School, Jerusalem, Israel 91120
- Linda L Breeden
- Basic Sciences Division, Fred Hutchinson Cancer Center, Fairview Avenue N, Seattle, Washington 98109, USA
- Ziv Bar-Joseph
- Department of Computer Science, Carnegie Mellon University, Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
- Machine Learning Department, Carnegie Mellon University, Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
|
27
|
Wei W, Yu XD. Comparative analysis of regulatory motif discovery tools for transcription factor binding sites. Genomics Proteomics Bioinformatics 2007; 5:131-42. [PMID: 17893078 PMCID: PMC5054109 DOI: 10.1016/s1672-0229(07)60023-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 01/19/2023]
Abstract
In the post-genomic era, identification of specific regulatory motifs or transcription factor binding sites (TFBSs) in non-coding DNA sequences, which is essential to elucidate transcriptional regulatory networks, has emerged as an obstacle that frustrates many researchers. Consequently, numerous motif discovery tools and correlated databases have been applied to solving this problem. However, these existing methods, based on different computational algorithms, show diverse motif prediction efficiency in non-coding DNA sequences. Therefore, understanding the similarities and differences of computational algorithms and enriching the motif discovery literature are important for users to choose the most appropriate one among the tools available online. Moreover, credible criteria for assessing motif discovery tools, and guidance to help researchers choose the best tool for their own projects, are still lacking. Thus integration of the related resources might be a good approach to improve accuracy of the application. Recent studies integrate regulatory motif discovery tools with experimental methods to offer a complementary approach for researchers, and also provide a much-needed model for current research on transcriptional regulatory networks. Here we present a comparative analysis of regulatory motif discovery tools for TFBSs.
|
28
|
Murtola T, Kupiainen M, Falck E, Vattulainen I. Conformational analysis of lipid molecules by self-organizing maps. J Chem Phys 2007; 126:054707. [PMID: 17302498 DOI: 10.1063/1.2429066] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
The authors have studied the use of the self-organizing map (SOM) in the analysis of lipid conformations produced by atomic-scale molecular dynamics simulations. First, focusing on the methodological aspects, they have systematically studied how the SOM can be employed in the analysis of lipid conformations in a controlled and reliable fashion. For this purpose, they have used a previously reported 50 ns atomistic molecular dynamics simulation of a 1-palmitoyl-2-linoleoyl-sn-glycero-3-phosphatidylcholine (PLPC) lipid bilayer and analyzed separately the conformations of the headgroup and the glycerol regions, as well as the diunsaturated fatty acid chain. They have elucidated the effect of training parameters on the quality of the results, as well as the effect of the size of the SOM. It turns out that the main conformational states of each region in the molecule are easily distinguished together with a variety of other typical structural features. As a second topic, the authors applied the SOM to the PLPC data to demonstrate how it can be used in the analysis that goes beyond the standard methods commonly used to study the structure and dynamics of lipid membranes. Overall, the results suggest that the SOM method provides a relatively simple and robust tool for quickly gaining a qualitative understanding of the most important features of the conformations of the system, without a priori knowledge. It seems plausible that the insight given by the SOM could be applied to a variety of biomolecular systems and the design of coarse-grained models for these systems.
Affiliation(s)
- Teemu Murtola
- Laboratory of Physics, Helsinki University of Technology, P.O. Box 1100, FI-02015 HUT, Finland
|
29
|
Abnizova I, Subhankulova T, Gilks WR. Recent computational approaches to understand gene regulation: mining gene regulation in silico. Curr Genomics 2007; 8:79-91. [PMID: 18660846 PMCID: PMC2435357 DOI: 10.2174/138920207780368150] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 11/30/2006] [Revised: 12/13/2006] [Accepted: 12/15/2006] [Indexed: 01/03/2023]
Abstract
This paper reviews recent computational approaches to the understanding of gene regulation in eukaryotes. Cis-regulation of gene expression by the binding of transcription factors is a critical component of cellular physiology. In eukaryotes, a number of transcription factors often work together in a combinatorial fashion to enable cells to respond to a wide spectrum of environmental and developmental signals. Integration of genome sequences and/or Chromatin Immunoprecipitation on chip data with gene-expression data has facilitated in silico discovery of how the combinatorics and positioning of transcription factor binding sites underlie gene activation in a variety of cellular processes. The process of gene regulation is extremely complex and intriguing, therefore all possible points of view and related links should be carefully considered. Here we attempt to collect an inventory, not claiming it to be comprehensive and complete, of related computational biological topics covering gene regulation, which may enlighten the process, and briefly review what is currently occurring in these areas. We will consider the following computational areas: gene regulatory network construction; evolution of regulatory DNA; studies of its structural and statistical informational properties; and finally, regulatory RNA.
Affiliation(s)
- T Subhankulova
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, Cambridge, UK
|
30
|
|
31
|
Mahony S, Benos PV, Smith TJ, Golden A. Self-organizing neural networks to support the discovery of DNA-binding motifs. Neural Netw 2006; 19:950-62. [PMID: 16839740 DOI: 10.1016/j.neunet.2006.05.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 10/24/2022]
Abstract
Identification of the short DNA sequence motifs that serve as binding targets for transcription factors is an important challenge in bioinformatics. Unsupervised techniques from the statistical learning theory literature have often been applied to motif discovery, but effective solutions for large genomic datasets have yet to be found. We present here three self-organizing neural networks that have applicability to the motif-finding problem. The core system in this study is a previously described SOM-based motif-finder named SOMBRERO. The motif-finder is integrated in this work with a SOM-based method that automatically constructs generalized models for structurally related motifs and initializes SOMBRERO with relevant biological knowledge. A self-organizing tree method that displays the relationships between various motifs is also presented, and it is shown that such a method can act as an effective structural classifier of novel motifs. The performance of the three self-organizing neural networks is evaluated here using various datasets.
Affiliation(s)
- Shaun Mahony
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
|
32
|
Sandve GK, Drabløs F. A survey of motif discovery methods in an integrated framework. Biol Direct 2006; 1:11. [PMID: 16600018 PMCID: PMC1479319 DOI: 10.1186/1745-6150-1-11] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.1] [Received: 03/29/2006] [Accepted: 04/06/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND There has been a growing interest in computational discovery of regulatory elements, and a multitude of motif discovery methods have been proposed. Computational motif discovery has been used with some success in simple organisms like yeast. However, as we move to higher organisms with more complex genomes, more sensitive methods are needed. Several recent methods try to integrate additional sources of information, including microarray experiments (gene expression and ChIP-chip). There is also a growing awareness that regulatory elements work in combination, and that this combinatorial behavior must be modeled for successful motif discovery. However, the multitude of methods and approaches makes it difficult to get a good understanding of the current status of the field. RESULTS This paper presents a survey of methods for motif discovery in DNA, based on a structured and well defined framework that integrates all relevant elements. Existing methods are discussed according to this framework. CONCLUSION The survey shows that although no single method takes all relevant elements into consideration, a very large number of different models treating the various elements separately have been tried. Very often the choices that have been made are not explicitly stated, making it difficult to compare different implementations. Also, the tests that have been used are often not comparable. Therefore, a stringent framework and improved test methods are needed to evaluate the different approaches in order to conclude which ones are most promising. REVIEWERS This article was reviewed by Eugene V. Koonin, Philipp Bucher (nominated by Mikhail Gelfand) and Frank Eisenhaber.
Affiliation(s)
- Geir Kjetil Sandve
- Department of Computer and Information Science, NTNU – Norwegian University of Science and Technology, N-7052, Trondheim, Norway
- Finn Drabløs
- Department of Cancer Research and Molecular Medicine, NTNU – Norwegian University of Science and Technology, N-7006, Trondheim, Norway
|