1
Yoon Y, Kim G, Jeon BN, Fang S, Park H. Bifidobacterium Strain-Specific Enhances the Efficacy of Cancer Therapeutics in Tumor-Bearing Mice. Cancers (Basel) 2021; 13:957. [PMID: 33668827 PMCID: PMC7956760 DOI: 10.3390/cancers13050957] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 01/19/2021] [Revised: 02/15/2021] [Accepted: 02/19/2021] [Indexed: 12/15/2022]
Abstract
Colorectal cancer (CRC) is among the leading causes of cancer-related death in the world. The development of CRC is associated with smoking, diet, and microbial exposure. Previous studies have shown that dysbiosis of the gut microbiome affects cancer development, because it leads to inflammation and genotoxicity. Supplementation with specific microbiota induces anti-tumor effects by enhancing anti-tumor immunity. Here, we observed that supplementation with either of two B. breve strains reduces tumor growth in MC38 colon carcinoma-bearing mice. Interestingly, only one B. breve strain boosted the efficacy of cancer therapeutics, including oxaliplatin and PD-1 blockade. Extensive immune profiling and transcriptomic analysis revealed that the boosting B. breve strain augments lymphocyte-mediated anti-cancer immunity. Our results suggest that supplementation with B. breve strains could potentially be used as a strategy to enhance the efficacy of CRC therapeutics.
Affiliation(s)
- Youngmin Yoon
- Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Korea
- Gihyeon Kim
- Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Korea
- Bu-Nam Jeon
- Genome and Company, Pangyo-ro 255, Bundang-gu, Seongnam 13486, Korea
- Sungsoon Fang
- Severance Biomedical Science Institute, BK21 PLUS Project for Medical Science, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Korea
- Hansoo Park
- Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Korea
- Genome and Company, Pangyo-ro 255, Bundang-gu, Seongnam 13486, Korea
2
Liu M, Yao B, Gui T, Guo C, Wu X, Li J, Ma L, Deng Y, Xu P, Wang Y, Yang D, Li Q, Zeng X, Li X, Hu R, Ge J, Yu Z, Chen Y, Chen B, Ju J, Zhao Q. PRMT5-dependent transcriptional repression of c-Myc target genes promotes gastric cancer progression. Theranostics 2020; 10:4437-4452. [PMID: 32292506 PMCID: PMC7150477 DOI: 10.7150/thno.42047] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 11/13/2019] [Accepted: 02/25/2020] [Indexed: 12/17/2022]
Abstract
The proto-oncogene c-Myc regulates multiple biological processes mainly through selectively activating gene expression. However, the mechanisms underlying c-Myc-mediated gene repression in the context of cancer remain less clear. This study aimed to clarify the role of PRMT5 in the transcriptional repression of c-Myc target genes in gastric cancer.
Methods: Immunohistochemistry was used to evaluate the expression of PRMT5, c-Myc and target genes in gastric cancer patients. PRMT5 and c-Myc interaction was assessed by immunofluorescence, co-immunoprecipitation and GST pull-down assays. Bioinformatics analysis, immunoblotting, real-time PCR, chromatin immunoprecipitation, and rescue experiments were used to evaluate the mechanism.
Results: We found that c-Myc directly interacts with protein arginine methyltransferase 5 (PRMT5) to transcriptionally repress the expression of a cohort of genes, including PTEN, CDKN2C (p18INK4C), CDKN1A (p21CIP1/WAF1), CDKN1C (p57KIP2) and p63, to promote gastric cancer cell growth. Specifically, we found that PRMT5 was required to promote gastric cancer cell growth in vitro and in vivo, and for transcriptional repression of this cohort of genes, which was dependent on its methyltransferase activity. Consistently, the promoters of this gene cohort were enriched for both PRMT5-mediated symmetric di-methylation of histone H4 on Arg 3 (H4R3me2s) and c-Myc, and c-Myc depletion also upregulated their expression. H4R3me2s also colocalized with the c-Myc-binding E-box motif (CANNTG) on these genes. We show that PRMT5 directly binds to c-Myc, and this binding is required for transcriptional repression of the target genes. Both c-Myc and PRMT5 expression levels were upregulated in primary human gastric cancer tissues, and their expression levels inversely correlated with clinical outcomes.
Conclusions: Taken together, our study reveals a novel mechanism by which PRMT5-dependent transcriptional repression of c-Myc target genes is required for gastric cancer progression, and provides a potential new strategy for therapeutic targeting of gastric cancer.
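As a small illustration of the motif search mentioned above, the sketch below scans a DNA sequence for the E-box consensus CANNTG with a regular expression. It is not the authors' pipeline; the function name `find_eboxes` and the example sequence are hypothetical. Because CANNTG is its own reverse complement, a single-strand scan also reports every site on the opposite strand.

```python
import re

# E-box consensus CANNTG (N = any base). The lookahead lets overlapping sites be reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq):
    """Return (0-based start, site) for every CANNTG match in seq."""
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq.upper())]


print(find_eboxes("ttCACGTGaaCAGCTGcatg"))
# [(2, 'CACGTG'), (10, 'CAGCTG')]
```

The lookahead matters for sequences such as CACATGTG, where two E-boxes overlap; a plain pattern would report only the first.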
3
Han D, Chen S, Han W, Gao S, Owiredu JN, Li M, Balk SP, He HH, Cai C. ZBTB7A Mediates the Transcriptional Repression Activity of the Androgen Receptor in Prostate Cancer. Cancer Res 2019; 79:5260-5271. [PMID: 31444154 PMCID: PMC6801099 DOI: 10.1158/0008-5472.can-19-0815] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/10/2019] [Revised: 07/09/2019] [Accepted: 08/20/2019] [Indexed: 01/15/2023]
Abstract
Loss of expression of context-specific tumor suppressors is a critical event that facilitates the development of prostate cancer. Zinc finger and BTB domain containing transcriptional repressors, such as ZBTB7A and ZBTB16, have been recently identified as tumor suppressors that play important roles in preventing prostate cancer progression. In this study, we used combined ChIP-seq and RNA-seq analyses of prostate cancer cells to identify direct ZBTB7A-repressed genes, which are enriched for transcriptional targets of E2F, and found that the androgen receptor (AR) played a critical role in the transcriptional suppression of these E2F targets. AR recruitment of the retinoblastoma protein (Rb) was required to strengthen the E2F-Rb transcriptional repression complex. In addition, ZBTB7A was rapidly recruited to the E2F-Rb binding sites by AR and negatively regulated the transcriptional activity of E2F1 on DNA replication genes. Finally, ZBTB7A suppressed the growth of castration-resistant prostate cancer (CRPC) in vitro and in vivo, and overexpression of ZBTB7A acted in synergy with high-dose testosterone treatment to effectively prevent the recurrence of CRPC. Overall, this study provides novel molecular insights into the role of ZBTB7A in CRPC cells and demonstrates at a global level its critical role in mediating the transcriptional repression activity of AR. SIGNIFICANCE: ZBTB7A is recruited to the E2F-Rb binding sites by AR and negatively regulates the transcriptional activity of E2F1 on DNA replication genes.
Affiliation(s)
- Dong Han
- Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, Massachusetts
- Sujun Chen
- Princess Margaret Cancer Center/University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Wanting Han
- Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, Massachusetts
- Shuai Gao
- Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, Massachusetts
- Jude N Owiredu
- Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, Massachusetts
- Muqing Li
- Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, Massachusetts
- Steven P Balk
- Beth Israel Deaconess Medical Center, Boston, Massachusetts
- Housheng Hansen He
- Princess Margaret Cancer Center/University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Changmeng Cai
- Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, Massachusetts
4
Castro-Mondragon JA, Jaeger S, Thieffry D, Thomas-Chollier M, van Helden J. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections. Nucleic Acids Res 2017; 45:e119. [PMID: 28591841 PMCID: PMC5737723 DOI: 10.1093/nar/gkx314] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 07/19/2016] [Accepted: 06/04/2017] [Indexed: 01/08/2023]
Abstract
Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families and drastically reduces motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or via the command line for integration into pipelines.
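The general idea of grouping redundant motifs (score every pair, then merge pairs above a cutoff) can be sketched in a few lines. This is a minimal sketch, assuming motifs are stored as lists of A/C/G/T probability columns. The pairwise score is a Pearson correlation over the flattened cells of the ungapped overlap, scaled by overlap width relative to total alignment width; it stands in for, and is not identical to, the metrics used by RSAT matrix-clustering. Reverse complements are ignored, and single-linkage grouping with a fixed threshold replaces the tool's tree-based partitioning. All function names and the 0.5 threshold are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def similarity(m1, m2, min_overlap=3):
    """Best width-normalised correlation over ungapped offsets (forward strand only).

    m2 column j is placed on m1 column j + offset; cells of the overlapping
    columns are flattened and correlated, then scaled by overlap / total width.
    """
    best = -1.0
    for offset in range(-(len(m2) - 1), len(m1)):
        start, end = max(0, offset), min(len(m1), offset + len(m2))
        width = end - start
        if width < min_overlap:
            continue
        x = [p for i in range(start, end) for p in m1[i]]
        y = [p for i in range(start, end) for p in m2[i - offset]]
        total_width = len(m1) + len(m2) - width
        best = max(best, pearson(x, y) * width / total_width)
    return best


def cluster(motifs, threshold=0.5):
    """Single-linkage grouping: two motifs are joined whenever similarity >= threshold."""
    parent = list(range(len(motifs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(motifs)):
        for j in range(i + 1, len(motifs)):
            if similarity(motifs[i], motifs[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(motifs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def onehot(seq):
    """Toy probability matrix from a consensus string (one column per base)."""
    return [[1.0 if b == base else 0.0 for base in "ACGT"] for b in seq]


print(cluster([onehot("CACGTG"), onehot("ACGTG"), onehot("TTTAAA")]))
# [[0, 1], [2]]
```

Scaling by overlap width penalises alignments that cover only a small part of either motif, so a short motif nested inside a longer one still scores below a full-length match (5/6 rather than 1.0 in the example).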
Affiliation(s)
- Denis Thieffry
- IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
- Morgane Thomas-Chollier
- IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
- Jacques van Helden
- Aix Marseille Univ, INSERM, TAGC, Theory and Approaches of Genomic Complexity, UMR_S 1090, Marseille, France
5
Bassani-Sternberg M, Chong C, Guillaume P, Solleder M, Pak H, Gannon PO, Kandalaft LE, Coukos G, Gfeller D. Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity. PLoS Comput Biol 2017; 13:e1005725. [PMID: 28832583 PMCID: PMC5584980 DOI: 10.1371/journal.pcbi.1005725] [Citation(s) in RCA: 165] [Impact Index Per Article: 20.6] [Received: 04/05/2017] [Revised: 09/05/2017] [Accepted: 08/17/2017] [Indexed: 01/01/2023]
Abstract
The precise identification of Human Leukocyte Antigen class I (HLA-I) binding motifs plays a central role in our ability to understand and predict (neo-)antigen presentation in infectious diseases and cancer. Here, by exploiting co-occurrence of HLA-I alleles across ten newly generated as well as forty public HLA peptidomics datasets comprising more than 115,000 unique peptides, we show that we can rapidly and accurately identify many HLA-I binding motifs and map them to their corresponding alleles without any a priori knowledge of HLA-I binding specificity. Our approach recapitulates and refines known motifs for 43 of the most frequent alleles, uncovers new motifs for 9 alleles that until now had fewer than five known ligands, and provides a scalable framework to incorporate additional HLA peptidomics studies in the future. The refined motifs improve neo-antigen and cancer testis antigen predictions, indicating that unbiased HLA peptidomics data are ideal for in silico predictions of neo-antigens from tumor exome sequencing data. The new motifs further reveal distant modulation of the binding specificity at P2 for some HLA-I alleles by residues in the HLA-I binding site but outside of the B-pocket, and we unravel the underlying mechanisms by protein structure analysis, mutagenesis and in vitro binding assays.
Predicting the differences between cancer and normal cells that are visible to the immune system is of central importance for cancer immunotherapy. Here we introduce a novel computational framework to harness the wealth of data from in-depth HLA peptidomics studies, including ten novel high-quality (<1% FDR) datasets generated for this work, to improve predictions of peptides displayed on HLA-I molecules.
These high-throughput and unbiased data enable us to refine models of HLA-I binding specificity for many alleles (including some that had no ligand until this study) and improve predictions of neo-antigens from exome sequencing data in melanoma and lung cancer samples. Moreover, the refined description of HLA-I binding specificity reveals cases of allosteric modulation of HLA-I binding specificity at the second amino acid position (P2) of their ligands by residues that are part of the HLA-I binding site but outside of the B pocket.
Affiliation(s)
- Michal Bassani-Sternberg
- Ludwig Centre for Cancer Research, University of Lausanne, Epalinges, Switzerland
- Department of Fundamental Oncology, University Hospital of Lausanne, Lausanne, Switzerland
- Chloé Chong
- Ludwig Centre for Cancer Research, University of Lausanne, Epalinges, Switzerland
- Department of Fundamental Oncology, University Hospital of Lausanne, Lausanne, Switzerland
- Philippe Guillaume
- Ludwig Centre for Cancer Research, University of Lausanne, Epalinges, Switzerland
- Department of Fundamental Oncology, University Hospital of Lausanne, Lausanne, Switzerland
- Marthe Solleder
- Ludwig Centre for Cancer Research, University of Lausanne, Epalinges, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- HuiSong Pak
- Ludwig Centre for Cancer Research, University of Lausanne, Epalinges, Switzerland
- Department of Fundamental Oncology, University Hospital of Lausanne, Lausanne, Switzerland
- Philippe O. Gannon
- Department of Fundamental Oncology, University Hospital of Lausanne, Lausanne, Switzerland
- Lana E. Kandalaft
- Ludwig Centre for Cancer Research, University of Lausanne, Epalinges, Switzerland
- Department of Fundamental Oncology, University Hospital of Lausanne, Lausanne, Switzerland
- George Coukos
- Ludwig Centre for Cancer Research, University of Lausanne, Epalinges, Switzerland
- Department of Fundamental Oncology, University Hospital of Lausanne, Lausanne, Switzerland
- David Gfeller
- Ludwig Centre for Cancer Research, University of Lausanne, Epalinges, Switzerland
- Department of Fundamental Oncology, University Hospital of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
6
Tran NTL, Huang CH. Cloud-based MOTIFSIM: Detecting Similarity in Large DNA Motif Data Sets. J Comput Biol 2017; 24:450-459. [DOI: 10.1089/cmb.2016.0080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Affiliation(s)
- Ngoc Tam L. Tran
- Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut
- Chun-Hsi Huang
- Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut
7
Hogan GJ, Brown PO, Herschlag D. Evolutionary Conservation and Diversification of Puf RNA Binding Proteins and Their mRNA Targets. PLoS Biol 2015; 13:e1002307. [PMID: 26587879 PMCID: PMC4654594 DOI: 10.1371/journal.pbio.1002307] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 04/03/2015] [Accepted: 10/23/2015] [Indexed: 12/31/2022]
Abstract
Reprogramming of a gene’s expression pattern by acquisition and loss of sequences recognized by specific regulatory RNA binding proteins may be a major mechanism in the evolution of biological regulatory programs. We identified that RNA targets of Puf3 orthologs have been conserved over 100–500 million years of evolution in five eukaryotic lineages. Focusing on Puf proteins and their targets across 80 fungi, we constructed a parsimonious model for their evolutionary history. This model entails extensive and coordinated changes in the Puf targets as well as changes in the number of Puf genes and alterations of RNA binding specificity, including the following:
1) Binding of Puf3 to more than 200 RNAs whose protein products are predominantly involved in the production and organization of mitochondrial complexes predates the origin of budding yeasts and filamentous fungi and was maintained for 500 million years, throughout the evolution of budding yeast.
2) In filamentous fungi, remarkably, more than 150 of the ancestral Puf3 targets were gained by Puf4, with one lineage maintaining both Puf3 and Puf4 as regulators and a sister lineage losing Puf3 as a regulator of these RNAs. The decrease in gene expression of these mRNAs upon deletion of Puf4 in filamentous fungi (N. crassa), in contrast to the increase upon Puf3 deletion in budding yeast (S. cerevisiae), suggests that the output of the RNA regulatory network is different with Puf4 in filamentous fungi than with Puf3 in budding yeast.
3) The coregulated Puf4 target set in filamentous fungi expanded to include mitochondrial genes involved in the tricarboxylic acid (TCA) cycle and other nuclear-encoded RNAs with mitochondrial function not bound by Puf3 in budding yeast, observations that provide additional evidence for substantial rewiring of post-transcriptional regulation.
4) Puf3 also expanded and diversified its targets in filamentous fungi, gaining interactions with the mRNAs encoding the mitochondrial electron transport chain (ETC) complex I as well as hundreds of other mRNAs with nonmitochondrial functions.
The many concerted and conserved changes in the RNA targets of Puf proteins strongly support an extensive role of RNA binding proteins in coordinating gene expression, as originally proposed by Keene. Rewiring of Puf-coordinated mRNA targets and transcriptional control of the same genes occurred at different points in evolution, suggesting that there have been distinct adaptations via RNA binding proteins and transcription factors. The changes in Puf targets and in the Puf proteins indicate an integral involvement of RNA binding proteins and their RNA targets in the adaptation, reprogramming, and function of gene expression. A map of the evolutionary history of Puf proteins and their RNA targets shows that reprogramming of global gene expression programs via adaptive mutations that affect protein-RNA interactions is an important source of biological diversity.
We set out to trace the evolutionary history of an RNA binding protein and how its interactions with targets change over evolution. Identifying this natural history is a step toward understanding the critical differences between organisms and how gene expression programs are rewired during evolution. Using bioinformatics and experimental approaches, we broadly surveyed the evolution of binding targets of a particular family of RNA binding proteins (the Puf proteins, whose protein sequences and target RNA sequences are relatively well characterized) across 99 eukaryotic species. We found five groups of species in which targets have been conserved for at least 100 million years and then took advantage of genome sequences from a large number of fungal species to deeply investigate the conservation and changes in Puf proteins and their RNA targets. Our analyses identified multiple and extensive reconfigurations during the natural history of fungi and suggest that RNA binding proteins and their RNA targets are profoundly involved in evolutionary reprogramming of gene expression and help define distinct programs unique to each organism. Continuing to uncover the natural history of RNA binding proteins and their interactions will provide a unique window into the gene expression programs of present-day species and point to new ways to engineer gene expression programs.
Affiliation(s)
- Gregory J. Hogan
- Department of Biochemistry, Stanford University School of Medicine, Stanford, California, United States of America
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, California, United States of America
- Patrick O. Brown
- Department of Biochemistry, Stanford University School of Medicine, Stanford, California, United States of America
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, California, United States of America
- Daniel Herschlag
- Department of Biochemistry, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Chemistry, Stanford University, Stanford, California, United States of America
- Department of Chemical Engineering, Stanford University, Stanford, California, United States of America
- ChEM-H Institute, Stanford University, Stanford, California, United States of America
8
MOTIFSIM: A web tool for detecting similarity in multiple DNA motif datasets. Biotechniques 2015; 59:26-33. [PMID: 26156781 DOI: 10.2144/000114308] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/23/2015] [Accepted: 05/04/2015] [Indexed: 11/23/2022]
Abstract
Currently, there are a number of motif detection tools available that possess unique functionality. These tools often report different motifs, and therefore use of multiple tools is generally advised, since common motifs reported by multiple tools are more likely to be biologically significant. However, results produced by these different tools need to be compared, and existing similarity detection tools only allow comparison between two data sets. Here, we describe a motif similarity detection tool (MOTIFSIM) with a web-based, user-friendly interface that is capable of detecting similarity from multiple DNA motif data sets concurrently. Results can either be viewed online or downloaded. Users may also download and run MOTIFSIM as a command-line tool in stand-alone mode. The web tool, along with its command-line version, user manuals, and source code, is freely available at http://biogrid-head.engr.uconn.edu/motifsim/.
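The cross-tool comparison described above can be illustrated with a toy sketch that flags motifs reported by at least two tools. It assumes each tool's output has been reduced to consensus strings and scores two consensus strings by the best fraction of identical bases over any ungapped overlap of at least four positions. MOTIFSIM's own similarity scoring is not reproduced here, and the tool names, motifs, and the 0.8 cutoff are hypothetical.

```python
def best_match_fraction(a, b, min_overlap=4):
    """Highest fraction of identical bases over all ungapped overlaps of two consensus strings."""
    best = 0.0
    for offset in range(-(len(b) - 1), len(a)):
        start, end = max(0, offset), min(len(a), offset + len(b))
        width = end - start
        if width < min_overlap:
            continue
        same = sum(a[i] == b[i - offset] for i in range(start, end))
        best = max(best, same / width)
    return best


def shared_motifs(datasets, min_tools=2, cutoff=0.8):
    """Return (tool, motif, supporting tools) for motifs matched in >= min_tools datasets."""
    hits = []
    for tool, motifs in datasets.items():
        for motif in motifs:
            support = {tool}
            for other, other_motifs in datasets.items():
                if other != tool and any(
                    best_match_fraction(motif, o) >= cutoff for o in other_motifs
                ):
                    support.add(other)
            if len(support) >= min_tools:
                hits.append((tool, motif, sorted(support)))
    return hits


# Hypothetical outputs of three motif discovery tools.
datasets = {
    "toolA": ["CACGTG", "GGGCGG"],
    "toolB": ["ACGTGA", "TTTTTT"],
    "toolC": ["CCACGTGC"],
}
for tool, motif, support in shared_motifs(datasets):
    print(tool, motif, support)
# toolA CACGTG ['toolA', 'toolB', 'toolC']
# toolB ACGTGA ['toolA', 'toolB', 'toolC']
# toolC CCACGTGC ['toolA', 'toolB', 'toolC']
```

Comparing every motif against every other data set is quadratic in the total number of motifs, which is acceptable for a handful of tools but is the cost a dedicated similarity tool would need to manage at scale.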
9
Wang S, Sun H, Ma J, Zang C, Wang C, Wang J, Tang Q, Meyer CA, Zhang Y, Liu XS. Target analysis by integration of transcriptome and ChIP-seq data with BETA. Nat Protoc 2013; 8:2502-15. [PMID: 24263090 DOI: 10.1038/nprot.2013.150] [Citation(s) in RCA: 374] [Impact Index Per Article: 31.2] [Indexed: 11/09/2022]
Abstract
The combination of ChIP-seq and transcriptome analysis is a compelling approach to unravel the regulation of gene expression. Several recently published methods combine transcription factor (TF) binding and gene expression for target prediction, but few of them provide an efficient software package for the community. Binding and expression target analysis (BETA) is a software package that integrates ChIP-seq of TFs or chromatin regulators with differential gene expression data to infer direct target genes. BETA has three functions: (i) to predict whether the factor has activating or repressive function; (ii) to infer the factor's target genes; and (iii) to identify the motif of the factor and its collaborators, which might modulate the factor's activating or repressive function. Here we describe the implementation and features of BETA and demonstrate its application to several data sets. BETA requires ~1 GB of RAM, and the procedure takes 20 min to complete. BETA is available open source at http://cistrome.org/BETA/.
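A sketch of the two scoring steps behind this kind of target inference, assuming the regulatory-potential form commonly described for BETA: each peak within 100 kb of a gene's transcription start site (TSS) contributes exp(-(0.5 + 4Δ)), where Δ is the peak-to-TSS distance divided by 100 kb, and genes are then ranked by the product of their binding rank and expression rank. This is not BETA's code; peaks and TSSs are assumed to lie on the same chromosome, and the function names and toy values are hypothetical.

```python
from math import exp

WINDOW = 100_000  # bp; peaks farther than this from the TSS are ignored (assumed)


def regulatory_potential(tss, peak_centers, window=WINDOW):
    """Sum of exp(-(0.5 + 4 * d)) over peaks within `window` bp, d = distance / window."""
    score = 0.0
    for center in peak_centers:
        distance = abs(center - tss)
        if distance <= window:
            score += exp(-(0.5 + 4 * distance / window))
    return score


def rank_product(rp, de):
    """Rank genes by regulatory potential and by expression change (both descending).

    Returns gene -> (binding rank * expression rank) / n**2; smaller values mark
    likelier direct targets. `de` holds a statistic oriented so that larger means
    more strongly changed in the direction being tested (up or down).
    """
    genes = list(rp)
    n = len(genes)
    rp_rank = {g: r for r, g in enumerate(sorted(genes, key=lambda g: -rp[g]), start=1)}
    de_rank = {g: r for r, g in enumerate(sorted(genes, key=lambda g: -de[g]), start=1)}
    return {g: rp_rank[g] * de_rank[g] / n**2 for g in genes}


# Hypothetical toy data: TSS positions and peak centres on one chromosome.
tss = {"geneA": 10_000, "geneB": 400_000, "geneC": 900_000}
peaks = [12_000, 15_000, 470_000]
rp = {g: regulatory_potential(pos, peaks) for g, pos in tss.items()}
de = {"geneA": 2.5, "geneB": 0.4, "geneC": 1.1}
print(sorted(rank_product(rp, de).items(), key=lambda kv: kv[1]))
```

In the toy data geneA has two nearby peaks and the largest expression change, so it receives the smallest rank product. Running the ranking once for up-regulated and once for down-regulated genes is one way to ask whether a factor acts mainly as an activator or a repressor.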
Affiliation(s)
- Su Wang
- Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
10
Persikov AV, Singh M. De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res 2013; 42:97-108. [PMID: 24097433 PMCID: PMC3874201 DOI: 10.1093/nar/gkt890] [Citation(s) in RCA: 146] [Impact Index Per Article: 12.2] [Indexed: 12/13/2022]
Abstract
Proteins with sequence-specific DNA binding function are important for a wide range of biological activities. De novo prediction of their DNA-binding specificities from sequence alone would be a great aid in inferring cellular networks. Here we introduce a method for predicting DNA-binding specificities for Cys2His2 zinc fingers (C2H2-ZFs), the largest family of DNA-binding proteins in metazoans. We develop a general approach, based on empirical calculations of pairwise amino acid–nucleotide interaction energies, for predicting position weight matrices (PWMs) representing DNA-binding specificities for C2H2-ZF proteins. We predict DNA-binding specificities on a per-finger basis and merge predictions for C2H2-ZF domains that are arrayed within sequences. We test our approach on a diverse set of natural C2H2-ZF proteins with known binding specificities and demonstrate that for >85% of the proteins, their predicted PWMs are accurate in 50% of their nucleotide positions. For proteins with several zinc finger isoforms, we show via case studies that this level of accuracy enables us to match isoforms with their known DNA-binding specificities. A web server for predicting a PWM given a protein containing C2H2-ZF domains is available online at http://zf.princeton.edu and can be used to aid in protein engineering applications and in genome-wide searches for transcription factor targets.
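The per-finger prediction and merging steps can be sketched as follows. This is a simplified stand-in, not the authors' model: it assumes the canonical contact map in which helix positions -1, 3 and 6 read the 3', middle and 5' base of each finger's 3-bp subsite, converts a hypothetical amino acid-nucleotide energy table into base probabilities with a Boltzmann-style weighting, and concatenates finger columns in reverse order because the N-terminal finger binds the 3' end of the site.

```python
from math import exp

BASES = "ACGT"
# Assumed canonical contact map: helix positions 6, 3 and -1 read the 5', middle
# and 3' base of a finger's 3-bp subsite (index 0, 1, 2 when read 5'->3').
CONTACTS = {6: 0, 3: 1, -1: 2}


def finger_columns(helix, energy, beta=1.0):
    """Three PWM columns for one finger.

    helix: helix position -> amino acid; energy: (amino acid, base) -> energy,
    lower meaning more favourable (missing pairs default to 0). Weights
    exp(-beta * E) are normalised within each column.
    """
    columns = [None, None, None]
    for position, index in CONTACTS.items():
        aa = helix[position]
        weights = [exp(-beta * energy.get((aa, base), 0.0)) for base in BASES]
        total = sum(weights)
        columns[index] = [w / total for w in weights]
    return columns


def protein_pwm(fingers, energy, beta=1.0):
    """Merge per-finger predictions; fingers are listed N- to C-terminal.

    The N-terminal finger binds the 3' end of the site, so fingers are
    concatenated in reverse to give a 5'->3' matrix.
    """
    pwm = []
    for helix in reversed(fingers):
        pwm.extend(finger_columns(helix, energy, beta))
    return pwm


def consensus(pwm):
    """Most probable base at each position."""
    return "".join(BASES[max(range(4), key=col.__getitem__)] for col in pwm)


# Hypothetical energies (arbitrary units), not values from the paper.
ENERGY = {("R", "G"): -2.0, ("H", "G"): -1.5, ("N", "A"): -2.0, ("D", "C"): -1.5}
fingers = [{-1: "R", 3: "H", 6: "R"}, {-1: "D", 3: "N", 6: "R"}]
print(consensus(protein_pwm(fingers, ENERGY)))  # GACGGG
```

The `beta` parameter controls how sharply energy differences translate into base preferences; larger values give more peaked columns.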
Affiliation(s)
- Anton V Persikov
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton NJ 08544, USA and Department of Computer Science, Princeton University, Princeton NJ 08544, USA
11
Vorontsov IE, Kulakovskiy IV, Makeev VJ. Jaccard index based similarity measure to compare transcription factor binding site models. Algorithms Mol Biol 2013; 8:23. [PMID: 24074225 PMCID: PMC3851813 DOI: 10.1186/1748-7188-8-23] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 05/25/2012] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND The positional weight matrix (PWM) remains the most popular model for quantifying transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBS), thus providing a TFBS model. TF-binding DNA fragments obtained by different experimental methods usually give similar but not identical PWMs. This is also common for different TFs from the same structural family. Thus it is often necessary to measure the similarity between PWMs. Popular tools compare PWMs directly using matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of highly scoring TFBS and thus may differ without affecting the sets of the best recognized binding sites. Moreover, the two TFBS sets recognized by a given pair of PWMs can be more or less different depending on the score thresholds.
RESULTS We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective scoring threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, like PWMs with raw positional counts and log-odds. We present an efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation).
CONCLUSIONS MACRO-APE can be effectively used to compute the Jaccard-index-based similarity for two TFBS models. A two-pass scanning algorithm is presented to scan a given collection of PWMs for PWMs similar to a given query.
AVAILABILITY AND IMPLEMENTATION MACRO-APE is implemented in Ruby 1.9; software including source code and a manual is freely available at http://autosome.ru/macroape/ and in supplementary materials.
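For short motifs, the Jaccard index between two TFBS models can be computed by brute force, which makes the definition concrete. The sketch below enumerates every word of the PWM length, so it only scales to small motifs; MACRO-APE itself avoids enumeration through approximate P-value estimation and also handles motifs of different lengths, relative shifts and both strands, none of which is modeled here. The toy log-odds matrices and function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def word_score(pwm, word):
    """Log-odds score of a word with the same length as the PWM (columns A, C, G, T)."""
    return sum(column[BASES.index(base)] for column, base in zip(pwm, word))


def site_set(pwm, threshold):
    """Every word of the PWM's length scoring at or above the threshold (exhaustive)."""
    return {
        "".join(word)
        for word in product(BASES, repeat=len(pwm))
        if word_score(pwm, word) >= threshold
    }


def jaccard(model_a, model_b):
    """Jaccard index of the site sets of two (pwm, threshold) models of equal length."""
    sites_a, sites_b = site_set(*model_a), site_set(*model_b)
    union = sites_a | sites_b
    return len(sites_a & sites_b) / len(union) if union else 1.0


def toy_pwm(consensus, match=1.0, mismatch=-1.0):
    """Hypothetical log-odds matrix rewarding the consensus base at each position."""
    return [[match if base == c else mismatch for base in BASES] for c in consensus]


# Threshold 1.0 admits the consensus and every single-mismatch variant.
print(jaccard((toy_pwm("CAC"), 1.0), (toy_pwm("CAT"), 1.0)))  # 0.25
```

The example shows the threshold dependence noted in the abstract: at threshold 1.0 the two models share 4 of 16 sites, whereas at threshold 3.0 each model accepts only its own consensus and the index drops to 0.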
12
Nandi S, Blais A, Ioshikhes I. Identification of cis-regulatory modules in promoters of human genes exploiting mutual positioning of transcription factors. Nucleic Acids Res 2013; 41:8822-41. [PMID: 23913413 PMCID: PMC3799424 DOI: 10.1093/nar/gkt578] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 12/29/2022]
Abstract
In higher organisms, gene regulation is controlled by the interplay of non-random combinations of multiple transcription factors (TFs). Although numerous attempts have been made to identify these combinations, important details, such as the mutual positioning of factors that play an important role in TF interplay, are still missing. The goal of the present work is the in silico mapping of some of these associating factors based on their mutual positioning, using computational screening. We selected myogenesis as a case study and focused on TF combinations involving the master myogenic TF myogenic differentiation (MyoD) with other factors situated at specific distances from it. Our results show that some muscle-specific factors occur together with MyoD within ±100 bp in a large number of promoters. We confirm the co-occurrence of MyoD with muscle-specific factors described in earlier studies. However, we also found novel relationships of MyoD with other factors that are not muscle-specific. Additionally, we observed that MyoD tends to associate with different factors in proximal and distal promoter areas. The major outcome of our study is establishing the genome-wide connection between biological interactions of TFs and close co-occurrence of their binding sites.
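The basic co-occurrence count behind this kind of screen can be sketched as follows, assuming binding-site positions have already been predicted for each promoter on a common coordinate axis. The function records the signed offset of each partner site within ±100 bp of an anchor site, so that preferred spacings can be inspected; strand, site scoring and background statistics are left out, and the factor names and positions are hypothetical.

```python
def partner_offsets(sites, anchor, partner, window=100):
    """Signed offsets (partner - anchor) of partner sites within +/-window bp of anchor sites.

    sites: promoter -> {factor name: [site positions]} on a shared coordinate axis.
    Returns promoter -> sorted offsets, keeping only promoters with at least one hit.
    """
    result = {}
    for promoter, by_factor in sites.items():
        offsets = sorted(
            p - a
            for a in by_factor.get(anchor, [])
            for p in by_factor.get(partner, [])
            if abs(p - a) <= window
        )
        if offsets:
            result[promoter] = offsets
    return result


# Hypothetical predicted sites; "FactorX" stands in for any partner TF.
sites = {
    "promoter1": {"MyoD": [500], "FactorX": [560, 900]},
    "promoter2": {"MyoD": [100], "FactorX": [400]},
    "promoter3": {"FactorX": [10]},
}
print(partner_offsets(sites, "MyoD", "FactorX"))  # {'promoter1': [60]}
```

The number of promoters in the result gives a raw co-occurrence count; judging whether that count is enriched would require comparing it against shuffled or background site positions, which this sketch does not do.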
Affiliation(s)
- Soumyadeep Nandi
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Ontario K1H 8M5, Canada and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario K1H 8M5, Canada
13
Stegmaier P, Kel A, Wingender E, Borlak J. A discriminative approach for unsupervised clustering of DNA sequence motifs. PLoS Comput Biol 2013; 9:e1002958. [PMID: 23555204 PMCID: PMC3605052 DOI: 10.1371/journal.pcbi.1002958] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/20/2012] [Accepted: 01/15/2013] [Indexed: 12/03/2022]
Abstract
Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increasing attention in recent years. Its main applications are the characterization of potentially novel motifs and the clustering of a motif collection to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim for has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by including the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in the aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier, which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce the motif families it was trained on. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.
Transcription factors play a central role in the regulation of gene expression. Their interaction with specific elements in the DNA mediates dynamic changes in transcriptional activity. Databases store a growing number of known DNA sequence patterns, also called DNA sequence motifs, that are recognized by transcription factors. Such databases can be searched to find a match for a newly discovered pattern and thereby identify the potential binding factor. It is also of interest to cluster motifs to examine which transcription factors have similar binding properties and thus may promiscuously bind to each other's sites, or how many distinct specificities have been described. To gain deeper insight into the similarities between DNA sequence motifs, we analyzed a comprehensive set of known motifs. For this purpose we devised a network-based approach that enabled us to identify clusters of related motifs that largely coincided with the grouping of related TFs on the basis of protein similarity. On the basis of these results, we were able to predict whether two motifs belong to the same subgroup, and we constructed a novel, fully automated method for motif clustering, which enables users to assess the similarity of a newly found motif to all known motifs in the collection.
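The information coverage criterion can be made concrete with a small sketch. The Python code below computes the information content of DNA position weight matrix columns (2 bits minus the column's Shannon entropy) and, for one motif, the fraction of its total information content that falls in the columns taking part in an alignment. Treating information coverage as this per-motif fraction is an assumption based on the one-sentence description above; the published score may combine the two motifs differently or be defined in more detail.

```python
import math


def column_ic(column):
    """Information content (bits) of one DNA PWM column: 2 minus Shannon entropy.

    column: probabilities for A, C, G, T (summing to 1).
    """
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return 2.0 - entropy


def information_coverage(pwm, aligned_columns):
    """Fraction of a motif's total information content found in the aligned columns.

    pwm: list of columns (each a 4-element probability vector).
    aligned_columns: indices of this motif's columns that take part in the alignment.
    """
    total = sum(column_ic(col) for col in pwm)
    if total == 0:
        return 0.0  # a completely uninformative motif has nothing to cover
    covered = sum(column_ic(pwm[i]) for i in aligned_columns)
    return covered / total
```

For a three-column motif made of an all-A column, a uniform column, and an all-T column (2, 0, and 2 bits), an alignment covering the first two columns covers half of the motif's information, while one covering only the uniform column covers none.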
14
Ibrahim F, Maragkakis M, Alexiou P, Maronski MA, Dichter MA, Mourelatos Z. Identification of in vivo, conserved, TAF15 RNA binding sites reveals the impact of TAF15 on the neuronal transcriptome. Cell Rep 2013; 3:301-8. [PMID: 23416048 PMCID: PMC3594071 DOI: 10.1016/j.celrep.2013.01.021] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 09/14/2012] [Revised: 12/13/2012] [Accepted: 01/16/2013] [Indexed: 12/13/2022]
Abstract
RNA binding proteins (RBPs) have emerged as major causative agents of amyotrophic lateral sclerosis (ALS). To investigate the function of TAF15, an RBP recently implicated in ALS, we explored its target RNA repertoire in normal human brain and mouse neurons. Coupling high-throughput sequencing of immunoprecipitated and crosslinked RNA with RNA sequencing and TAF15 knockdowns, we identified conserved TAF15 RNA targets and assessed the impact of TAF15 on the neuronal transcriptome. We describe a role of TAF15 in the regulation of splicing for a set of neuronal RNAs encoding proteins with essential roles in synaptic activities. We find that TAF15 is required for a critical alternative splicing event of the zeta-1 subunit of the glutamate N-methyl-D-aspartate receptor (Grin1) that controls the activity and trafficking of NR1. Our study uncovers neuronal RNA networks impacted by TAF15 and sets the stage for investigating the role of TAF15 in ALS pathogenesis.
Affiliation(s)
- Fadia Ibrahim
- Department of Pathology and Laboratory Medicine, Division of Neuropathology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Manolis Maragkakis
- Department of Pathology and Laboratory Medicine, Division of Neuropathology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Panagiotis Alexiou
- Department of Pathology and Laboratory Medicine, Division of Neuropathology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Margaret A. Maronski
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Marc A. Dichter
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Mahoney Institute of Neurological Sciences, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Zissimos Mourelatos
- Department of Pathology and Laboratory Medicine, Division of Neuropathology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- PENN Genome Frontiers Institute, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
15
Seitzer P, Wilbanks EG, Larsen DJ, Facciotti MT. A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs. BMC Bioinformatics 2012; 13:317. [PMID: 23181585 PMCID: PMC3542263 DOI: 10.1186/1471-2105-13-317] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Often, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set while simultaneously determining the identity of the motif is therefore desirable, and a non-trivial problem in motif discovery research. RESULTS We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to gain additional insights. In all cases, we were able to support our findings with experimental work from the literature. CONCLUSIONS Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets.
We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/.
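The random-sampling idea can be illustrated with a toy Python sketch: run a motif finder on many random subsets of the input sequences, keep the motif reported most often, and discard sequences that lack it. Here `toy_motif_finder` is a hypothetical stand-in that simply returns the k-mer present in the most sequences, and the voting and filtering rules are assumptions; MotifCatcher itself wraps established motif finders and organizes candidate motifs into a tree, which this sketch does not attempt.

```python
import random
from collections import Counter


def kmers(seq, k):
    """Distinct k-mers of a sequence (each counted once per sequence)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_motif_finder(seqs, k):
    """Hypothetical stand-in for a real motif finder.

    Returns the k-mer present in the most sequences; ties go to the
    lexicographically largest k-mer so the result is deterministic.
    """
    counts = Counter()
    for s in seqs:
        counts.update(kmers(s, k))
    return max(counts.items(), key=lambda kv: (kv[1], kv[0]))[0]


def sample_and_filter(seqs, k, n_rounds=50, subset_size=5, seed=0):
    """Run the finder on random subsets, keep the most frequently reported
    motif, and drop sequences that do not contain it."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_rounds):
        subset = rng.sample(seqs, subset_size)
        votes[toy_motif_finder(subset, k)] += 1
    consensus = votes.most_common(1)[0][0]
    kept = [s for s in seqs if consensus in s]
    return consensus, kept
```

Subset size, number of rounds, and the voting rule are the tuning knobs in this sketch; with a real motif finder they would need to be chosen with that tool's behaviour in mind.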
Affiliation(s)
- Phillip Seitzer
- Department of Biomedical Engineering, One Shields Ave, University of California, Davis, CA 95616, USA
16
Chan TM, Leung KS, Lee KH, Wong MH, Lau TCK, Tsui SKW. Subtypes of associated protein-DNA (Transcription Factor-Transcription Factor Binding Site) patterns. Nucleic Acids Res 2012; 40:9392-403. [PMID: 22904079 PMCID: PMC3479201 DOI: 10.1093/nar/gks749] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/03/2022]
Abstract
In protein–DNA interactions, particularly transcription factor (TF) and transcription factor binding site (TFBS) bindings, associated residue variations form patterns denoted as subtypes. Subtypes may lead to changed binding preferences, distinguish conserved from flexible binding residues and reveal novel binding mechanisms. However, subtypes must be studied in the context of core bindings. While solving 3D structures would require huge experimental efforts, recent sequence-based associated TF-TFBS pattern discovery has been shown to be promising, making a large-scale subtype study possible and desirable. In this article, we investigate residue-varying subtypes based on associated TF-TFBS patterns. By re-categorizing the patterns with respect to varying TF amino acids, statistically significant (P values ≤ 0.005) subtypes leading to varying TFBS patterns are discovered without using TF family or domain annotations. The resulting subtypes have various biological meanings. The subtypes reflect familial and functional properties and exhibit changed binding preferences supported by 3D structures. Conserved residues critical for maintaining TF-TFBS bindings are revealed by analyzing the subtypes. In-depth analysis of the subtype pair PKVVIL-CACGTG versus PKVEIL-CAGCTG shows that the V/E variation is indicative in distinguishing Myc from MRF families. Discovered from sequences only, the TF-TFBS subtypes are informative and promising for more biological findings, complementing and extending recent one-sided subtype and familial studies with comprehensive evidence.
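The re-categorization step can be pictured with a short Python sketch that locates the varying residue between two TF patterns and groups associated (TF, TFBS) pattern pairs by the residue at that position, using the PKVVIL-CACGTG / PKVEIL-CAGCTG pair quoted above. The function names are hypothetical, and the sketch omits the statistical testing (P ≤ 0.005) that the study applies to decide which subtypes are significant.

```python
from collections import defaultdict


def varying_positions(tf_a, tf_b):
    """Indices at which two equal-length TF residue patterns differ."""
    if len(tf_a) != len(tf_b):
        raise ValueError("TF patterns must have equal length")
    return [i for i, (a, b) in enumerate(zip(tf_a, tf_b)) if a != b]


def subtypes_at(pairs, position):
    """Group (TF pattern, TFBS pattern) pairs by the TF residue at one position.

    Returns a dict residue -> set of TFBS patterns seen with that residue.
    """
    groups = defaultdict(set)
    for tf_pattern, tfbs_pattern in pairs:
        groups[tf_pattern[position]].add(tfbs_pattern)
    return dict(groups)
```

Applied to the two patterns quoted in the abstract, `varying_positions` returns `[3]` (the V/E residue), and grouping on that index separates CACGTG from CAGCTG.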
Affiliation(s)
- Tak-Ming Chan
- Department of Computer Science & Engineering, The Chinese University of Hong Kong, Shatin, N T, Hong Kong.
17
Claeys M, Storms V, Sun H, Michoel T, Marchal K. MotifSuite: workflow for probabilistic motif detection and assessment. Bioinformatics 2012; 28:1931-2. [DOI: 10.1093/bioinformatics/bts293] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/15/2022]
18
Chowdhary R, Tan SL, Pavesi G, Jin J, Dong D, Mathur SK, Burkart A, Narang V, Glurich I, Raby BA, Weiss ST, Wong L, Liu JS, Bajic VB. A database of annotated promoters of genes associated with common respiratory and related diseases. Am J Respir Cell Mol Biol 2012; 47:112-9. [PMID: 22383585 DOI: 10.1165/rcmb.2011-0419oc] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023]
Abstract
Many genes have been implicated in the pathogenesis of common respiratory and related diseases (RRDs), yet the underlying mechanisms are largely unknown. Differential gene expression patterns in diseased and healthy individuals suggest that RRDs affect or are affected by modified transcription regulation programs. It is thus crucial to characterize implicated genes in terms of transcriptional regulation. For this purpose, we conducted a promoter analysis of genes associated with 11 common RRDs, including allergic rhinitis, asthma, bronchiectasis, bronchiolitis, bronchitis, chronic obstructive pulmonary disease, cystic fibrosis, emphysema, eczema, psoriasis, and urticaria, many of which are thought to be genetically related. The objective of the present study was to obtain deeper insight into the transcriptional regulation of these disease-associated genes by annotating their promoter regions with transcription factors (TFs) and TF binding sites (TFBSs). We discovered many TFs that are significantly enriched in the target disease groups, including associations that have been documented in the literature. We also identified a number of putative TFs/TFBSs that appear to be novel. The results of our analysis are provided in an online database that is freely accessible to researchers at http://www.respiratorygenomics.com. Promoter-associated TFBS information and related genomic features, such as histone modification sites, microsatellites, CpG islands, and SNPs, are graphically summarized in the database. Users can compare and contrast the underlying mechanisms of specific RRDs with respect to candidate genes, TFs, gene ontology terms, microRNAs, and biological pathways when conducting meta-analyses. This database represents a novel, useful resource for RRD researchers.
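TF enrichment in a gene group is commonly assessed with a one-sided hypergeometric test. The abstract does not state which statistic the authors used, so the Python sketch below is an assumption, not their method. It returns the probability of seeing at least k genes with a given TFBS in a disease group of n genes, when K of the N background genes carry that site.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_site, group_size, group_with_site):
    """One-sided hypergeometric P value for TFBS enrichment in a gene group.

    total_genes:     N, genes in the background set
    genes_with_site: K, background genes whose promoter carries the TFBS
    group_size:      n, genes in the disease group
    group_with_site: k, disease-group genes whose promoter carries the TFBS
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    denom = comb(total_genes, group_size)
    upper = min(genes_with_site, group_size)
    # Sum the probabilities of k, k+1, ..., min(K, n) site-carrying genes.
    tail = sum(
        comb(genes_with_site, i) * comb(total_genes - genes_with_site, group_size - i)
        for i in range(group_with_site, upper + 1)
    )
    return tail / denom
```

With 10 background genes of which 5 carry the site, a 5-gene disease group in which all 5 carry it gives P = 1/252 (about 0.004). In a real analysis the P values for many TFs and disease groups would also need multiple-testing correction.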
Affiliation(s)
- Rajesh Chowdhary
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield Clinic, Wisconsin 54449, USA.
19
Habib N, Wapinski I, Margalit H, Regev A, Friedman N. A functional selection model explains evolutionary robustness despite plasticity in regulatory networks. Mol Syst Biol 2012; 8:619. [PMID: 23089682 PMCID: PMC3501536 DOI: 10.1038/msb.2012.50] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 07/20/2012] [Accepted: 08/29/2012] [Indexed: 11/09/2022]
Abstract
Evolutionary rewiring of regulatory networks is an important source of diversity among species. Previous evidence suggested substantial divergence of regulatory networks across species. However, systematically assessing the extent of this plasticity and its functional implications has been challenging due to limited experimental data and the noisy nature of computational predictions. Here, we introduce a novel approach to study cis-regulatory evolution, and use it to trace the regulatory history of 88 DNA motifs of transcription factors across 23 Ascomycota fungi. While motifs are conserved, we find a pervasive gain and loss in the regulation of their target genes. Despite this turnover, the biological processes associated with a motif are generally conserved. We explain these trends using a model with a strong selection to conserve the overall function of a transcription factor, and a much weaker selection over the specific genes it targets. The model also accounts for the turnover of bound targets measured experimentally across species in yeasts and mammals. Thus, selective pressures on regulatory networks mostly tolerate local rewiring, and may allow for subtle fine-tuning of gene regulation during evolution.
Affiliation(s)
- Naomi Habib
- School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel
- Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University, Jerusalem, Israel
- Alexander Silberman Institute of Life Sciences, Hebrew University, Jerusalem, Israel
- Ilan Wapinski
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute, 7 Cambridge Center, Cambridge, MA, USA
- Hanah Margalit
- Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University, Jerusalem, Israel
- Aviv Regev
- Broad Institute, 7 Cambridge Center, Cambridge, MA, USA
- Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Nir Friedman
- School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel
- Alexander Silberman Institute of Life Sciences, Hebrew University, Jerusalem, Israel
20
Vella P, Barozzi I, Cuomo A, Bonaldi T, Pasini D. Yin Yang 1 extends the Myc-related transcription factors network in embryonic stem cells. Nucleic Acids Res 2011; 40:3403-18. [PMID: 22210892 PMCID: PMC3333890 DOI: 10.1093/nar/gkr1290] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.4] [Indexed: 12/31/2022]
Abstract
The Yin Yang 1 (YY1) transcription factor is a master regulator of development, essential for early embryogenesis and adult tissue formation. YY1 is the mammalian orthologue of Pleiohomeotic, one of the transcription factors that binds Polycomb DNA response elements in Drosophila melanogaster and mediates recruitment of Polycomb group (PcG) proteins to DNA. Although several publications point to YY1 having a similar role in mammals, others have shown features of YY1 that are not compatible with PcG functions. Here, we show that, in mouse embryonic stem (ES) cells, YY1 has genome-wide PcG-independent activities while it is still stably associated with the INO80 chromatin-remodeling complex, as well as with novel RNA helicase activities. YY1 binds chromatin in close proximity to the transcription start site of highly expressed genes. Loss of YY1 function preferentially led to down-regulation of target gene expression, as well as to up-regulation of several small non-coding RNAs, suggesting a role for YY1 in regulating small RNA biogenesis. Finally, we found that YY1 is a novel player in the Myc-related transcription factor network and that its coordinated binding at promoters potentiates gene expression, proposing YY1 as an active component of the Myc transcription network that links ES cells to cancer cells.
Affiliation(s)
- Pietro Vella
- Department of Experimental Oncology, European Institute of Oncology (IEO), Via Adamello 16, 20139 Milan, Italy
21
Tanaka E, Bailey T, Grant CE, Noble WS, Keich U. Improved similarity scores for comparing motifs. Bioinformatics 2011; 27:1603-9. [PMID: 21543443 DOI: 10.1093/bioinformatics/btr257] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION A question that often comes up after applying a motif finder to a set of co-regulated DNA sequences is whether the reported putative motif is similar to any known motif. While several tools have been designed for this task, Habib et al. pointed out that the scores that are commonly used for measuring similarity between motifs do not distinguish between a good alignment of two informative columns (say, all-A) and one of two uninformative columns. This observation explains why tools such as Tomtom occasionally return an alignment of uninformative columns which is clearly spurious. To address this problem, Habib et al. suggested a new score [Bayesian Likelihood 2-Component (BLiC)] which uses a Bayesian information criterion to penalize matches that are also similar to the background distribution. RESULTS We show that the BLiC score exhibits other, highly undesirable properties, and we offer instead a general approach to adjust any motif similarity score so as to reduce the number of reported spurious alignments of uninformative columns. We implement our method in Tomtom and show that, without significantly compromising Tomtom's retrieval accuracy or its runtime, we can drastically reduce the number of uninformative alignments. AVAILABILITY AND IMPLEMENTATION The modified Tomtom is available as part of the MEME Suite at http://meme.nbcr.net.
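The problem described above can be seen with two toy columns. In the Python sketch below, a naive column similarity (one minus half the L1 distance) gives a perfect score both to two all-A columns and to two uniform, uninformative columns; scaling the score by the smaller of the two columns' information content drives the uninformative match to zero. This weighting is only an illustration of the general idea of penalizing uninformative alignments; it is neither the BLiC score nor the adjustment that Tanaka et al. implemented in Tomtom.

```python
import math


def column_ic(column):
    """Information content (bits) of a DNA column given as A, C, G, T probabilities."""
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return 2.0 - entropy


def naive_similarity(col_a, col_b):
    """1 minus half the L1 distance: 1.0 for identical columns, 0.0 for disjoint ones."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(col_a, col_b))


def ic_weighted_similarity(col_a, col_b):
    """Scale the naive score by the smaller column information content,
    normalized to [0, 1], so matches between uninformative columns score near 0."""
    weight = min(column_ic(col_a), column_ic(col_b)) / 2.0
    return naive_similarity(col_a, col_b) * weight
```

Under the naive score, an alignment of uniform columns is indistinguishable from an alignment of all-A columns; under the weighted score only the informative match keeps its full value.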
Affiliation(s)
- Emi Tanaka
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW Australia.
22
Yanover C, Bradley P. Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res 2011; 39:4564-76. [PMID: 21343182 PMCID: PMC3113574 DOI: 10.1093/nar/gkr048] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 12/29/2022]
Abstract
Sequence-specific DNA recognition by gene regulatory proteins is critical for proper cellular functioning. The ability to predict the DNA binding preferences of these regulatory proteins from their amino acid sequence would greatly aid in reconstruction of their regulatory interactions. Structural modeling provides one route to such predictions: by building accurate molecular models of regulatory proteins in complex with candidate binding sites, and estimating their relative binding affinities for these sites using a suitable potential function, it should be possible to construct DNA binding profiles. Here, we present a novel molecular modeling protocol for protein-DNA interfaces that borrows conformational sampling techniques from de novo protein structure prediction to generate a diverse ensemble of structural models from small fragments of related and unrelated protein-DNA complexes. The extensive conformational sampling is coupled with sequence space exploration so that binding preferences for the target protein can be inferred from the resulting optimized DNA sequences. We apply the algorithm to predict binding profiles for a benchmark set of eleven C2H2 zinc finger transcription factors, five of known and six of unknown structure. The predicted profiles are in good agreement with experimental binding data; furthermore, examination of the modeled structures gives insight into observed binding preferences.
Affiliation(s)
- Chen Yanover
- Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA
23
Novershtern N, Subramanian A, Lawton LN, Mak RH, Haining WN, McConkey ME, Habib N, Yosef N, Chang CY, Shay T, Frampton GM, Drake ACB, Leskov I, Nilsson B, Preffer F, Dombkowski D, Evans JW, Liefeld T, Smutko JS, Chen J, Friedman N, Young RA, Golub TR, Regev A, Ebert BL. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 2011; 144:296-309. [PMID: 21241896 PMCID: PMC3049864 DOI: 10.1016/j.cell.2011.01.004] [Citation(s) in RCA: 717] [Impact Index Per Article: 51.2] [Received: 06/19/2010] [Revised: 10/18/2010] [Accepted: 01/04/2011] [Indexed: 01/19/2023]
Abstract
Though many individual transcription factors are known to regulate hematopoietic differentiation, major aspects of the global architecture of hematopoiesis remain unknown. Here, we profiled gene expression in 38 distinct purified populations of human hematopoietic cells and used probabilistic models of gene expression and analysis of cis-elements in gene promoters to decipher the general organization of their regulatory circuitry. We identified modules of highly coexpressed genes, some of which are restricted to a single lineage but most of which are expressed at variable levels across multiple lineages. We found densely interconnected cis-regulatory circuits and a large number of transcription factors that are differentially expressed across hematopoietic states. These findings suggest a more complex regulatory system for hematopoiesis than previously assumed.
Affiliation(s)
- Noa Novershtern
- Broad Institute, 7 Cambridge Center, Cambridge MA, 02142
- Department of Biology, Massachusetts Institute of Technology, Cambridge MA, 02140
- School of Computer Science, Hebrew University, Jerusalem, Israel
- Lee N. Lawton
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142
- Raymond H. Mak
- Broad Institute, 7 Cambridge Center, Cambridge MA, 02142
- Naomi Habib
- School of Computer Science, Hebrew University, Jerusalem, Israel
- Nir Yosef
- Broad Institute, 7 Cambridge Center, Cambridge MA, 02142
- Cindy Y. Chang
- Broad Institute, 7 Cambridge Center, Cambridge MA, 02142
- Brigham and Women's Hospital, Boston, MA 02115
- Tal Shay
- Broad Institute, 7 Cambridge Center, Cambridge MA, 02142
- Garrett M. Frampton
- Department of Biology, Massachusetts Institute of Technology, Cambridge MA, 02140
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142
- Adam C. B. Drake
- Department of Biology, Massachusetts Institute of Technology, Cambridge MA, 02140
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Ilya Leskov
- Department of Biology, Massachusetts Institute of Technology, Cambridge MA, 02140
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Bjorn Nilsson
- Broad Institute, 7 Cambridge Center, Cambridge MA, 02142
- Brigham and Women's Hospital, Boston, MA 02115
- Fred Preffer
- Massachusetts General Hospital, Boston, MA 02114
- Ted Liefeld
- Broad Institute, 7 Cambridge Center, Cambridge MA, 02142
- Jianzhu Chen
- Department of Biology, Massachusetts Institute of Technology, Cambridge MA, 02140
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Nir Friedman
- School of Computer Science, Hebrew University, Jerusalem, Israel
- Richard A. Young
- Department of Biology, Massachusetts Institute of Technology, Cambridge MA, 02140
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142
- Todd R. Golub
- Broad Institute, 7 Cambridge Center, Cambridge MA, 02142
- Dana-Farber Cancer Institute, Boston, MA 02115
- Howard Hughes Medical Institute
- Aviv Regev
- Broad Institute, 7 Cambridge Center, Cambridge MA, 02142
- Department of Biology, Massachusetts Institute of Technology, Cambridge MA, 02140
- Howard Hughes Medical Institute
- Benjamin L. Ebert
- Broad Institute, 7 Cambridge Center, Cambridge MA, 02142
- Dana-Farber Cancer Institute, Boston, MA 02115
- Brigham and Women's Hospital, Boston, MA 02115
24
King CA, Bradley P. Structure-based prediction of protein-peptide specificity in Rosetta. Proteins 2010; 78:3437-49. [PMID: 20954182 DOI: 10.1002/prot.22851] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 05/07/2010] [Revised: 07/16/2010] [Accepted: 07/28/2010] [Indexed: 01/03/2023]
Abstract
Protein-peptide interactions mediate many of the connections in intracellular signaling networks. A generalized computational framework for atomically precise modeling of protein-peptide specificity may allow for predicting molecular interactions, anticipating the effects of drugs and genetic mutations, and redesigning molecules for new interactions. We have developed an extensible, general algorithm for structure-based prediction of protein-peptide specificity as part of the Rosetta molecular modeling package. The algorithm is not restricted to any one peptide-binding domain family and, at minimum, requires neither an experimentally characterized structure of the target protein nor any information about sequence specificity, although known structural data can be incorporated when available to improve performance. We demonstrate substantial success in specificity prediction across a diverse set of peptide-binding proteins, and show how performance is affected when incorporating varying degrees of input structural data. We also illustrate how structure-based approaches can provide atomic-level insight into mechanisms of peptide recognition and can predict the effects of point mutations on peptide specificity. Shortcomings and artifacts of our benchmark predictions are explained and limits on the generality of the method are explored. This work provides a promising foundation upon which further development of completely generalized, de novo prediction of peptide specificity may progress.
Affiliation(s)
- Christopher A King
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.
25
Zaslavsky E, Bradley P, Yanover C. Inferring PDZ domain multi-mutant binding preferences from single-mutant data. PLoS One 2010; 5:e12787. [PMID: 20976110 PMCID: PMC2956758 DOI: 10.1371/journal.pone.0012787] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/18/2010] [Accepted: 05/04/2010] [Indexed: 11/19/2022]
Abstract
Many important cellular protein interactions are mediated by peptide recognition domains. The ability to predict a domain's binding specificity directly from its primary sequence is essential to understanding the complexity of protein-protein interaction networks. One such recognition domain is the PDZ domain, functioning in scaffold proteins that facilitate formation of signaling networks. Predicting the PDZ domain's binding specificity was a part of the DREAM4 Peptide Recognition Domain challenge, the goal of which was to describe, as position weight matrices, the specificity profiles of five multi-mutant ERBB2IP-1 domains. We developed a method that derives multi-mutant binding preferences by generalizing the effects of single point mutations on the wild type domain's binding specificities. Our approach, trained on publicly available ERBB2IP-1 single-mutant phage display data, combined linear regression-based prediction for ligand positions whose specificity is determined by few PDZ positions, and single-mutant position weight matrix averaging for all other ligand columns. The success of our method as the winning entry of the DREAM4 competition, as well as its superior performance over a general PDZ-ligand binding model, demonstrates the advantages of training a model on a well-selected domain-specific data set.
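One simple reading of "single-mutant position weight matrix averaging" is sketched below in Python: for a ligand position not handled by the regression model, the multi-mutant column is the element-wise mean of the corresponding columns from the PWMs of its constituent single mutants. The data layout (columns as residue-to-probability dicts) and the mutation names are hypothetical, and the regression-based part of the method is not shown.

```python
def average_columns(columns):
    """Element-wise mean of PWM columns given as dicts mapping residue -> probability.

    Residues missing from a column are treated as probability 0.
    """
    residues = set().union(*columns)
    n = len(columns)
    return {r: sum(col.get(r, 0.0) for col in columns) / n for r in residues}


def multi_mutant_column(single_mutant_pwms, mutations, ligand_position):
    """Predict one ligand column for a multi-mutant domain by averaging that column
    across the PWMs of the single mutants it combines.

    single_mutant_pwms: dict mutation name -> list of columns (one per ligand position)
    mutations: names of the single mutations present in the multi-mutant
    """
    return average_columns([single_mutant_pwms[m][ligand_position] for m in mutations])
```

For two hypothetical single mutants whose first ligand column is {S: 1.0} and {S: 0.5, T: 0.5}, the averaged multi-mutant column is {S: 0.75, T: 0.25}.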
Affiliation(s)
- Elena Zaslavsky
- Center for Translational Systems Biology and Department of Neurology, Mount Sinai School of Medicine, New York, New York, United States of America
- Philip Bradley
- Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Chen Yanover
- Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
26
Piipari M, Down TA, Hubbard TJ. Metamotifs--a generative model for building families of nucleotide position weight matrices. BMC Bioinformatics 2010; 11:348. [PMID: 20579334 PMCID: PMC2906491 DOI: 10.1186/1471-2105-11-348] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/04/2010] [Accepted: 06/25/2010] [Indexed: 11/25/2022]
Abstract
Background Development of high-throughput methods for measuring DNA interactions of transcription factors together with computational advances in short motif inference algorithms is expanding our understanding of transcription factor binding site motifs. The consequential growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends has the potential to help in measuring the similarity of novel computational motif predictions to previously known data and sensitively detecting regulatory motifs similar to previously known ones from novel sequence. Results We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif' describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input position weight matrix (PWM) motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: in the Bayesian NestedMICA model inference algorithm as a PWM prior to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain. Conclusions We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase the sensitivity to infer motifs. 
Metamotifs were also successfully applied to a motif classification problem in which sequence motif features were used to predict the family of protein DNA-binding domains that would interact with a given motif. The metamotif-based classifier compares favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks, especially those related to de novo computational sequence motif inference. The metamotif methods presented have been incorporated into the NestedMICA suite.
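The metamotif idea can be illustrated with a minimal Python sketch. It assumes, as a simplification, that each metamotif column is a Dirichlet distribution over the four nucleotide probabilities (A, C, G, T) of a PWM column, and it scores a PWM by sliding the metamotif along it and summing column log-densities. The function names `dirichlet_logpdf` and `best_metamotif_offset` are hypothetical and not part of NestedMICA. PWM entries must be strictly positive (add pseudocounts first), because the density takes logarithms.

```python
import math

def dirichlet_logpdf(p, alpha):
    """Log density of a probability vector p under Dirichlet(alpha)."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, p)))

def best_metamotif_offset(pwm, metamotif):
    """Slide the metamotif along the PWM columns; return (best_offset, best_score).

    Hypothetical helper: each metamotif column is a Dirichlet parameter vector,
    and the score at an offset is the sum of per-column log-densities.
    """
    width = len(metamotif)
    best_offset, best_score = None, -math.inf
    for offset in range(len(pwm) - width + 1):
        score = sum(dirichlet_logpdf(col, alpha)
                    for col, alpha in zip(pwm[offset:offset + width], metamotif))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

# Toy PWM (columns ordered A, C, G, T) and a metamotif preferring "A then T".
pwm = [
    [0.25, 0.25, 0.25, 0.25],
    [0.85, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.85],
    [0.25, 0.25, 0.25, 0.25],
]
metamotif = [[20.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 20.0]]
offset, score = best_metamotif_offset(pwm, metamotif)
print(offset)  # 1
```

Allowing any offset loosely mirrors the abstract's point that not every input motif column needs to belong to a familial pattern; the published model's inference procedure is not reproduced here.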
Affiliation(s)
- Matias Piipari
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.
27
Gordân R, Narlikar L, Hartemink AJ. Finding regulatory DNA motifs using alignment-free evolutionary conservation information. Nucleic Acids Res 2010; 38:e90. [PMID: 20047961 PMCID: PMC2847231 DOI: 10.1093/nar/gkp1166] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 08/16/2009] [Revised: 10/30/2009] [Accepted: 11/23/2009] [Indexed: 01/01/2023]
Abstract
As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related organisms, and thus more likely to be functional. Since binding sites are usually short, sometimes degenerate, and often independent of orientation, alignment algorithms may not align them correctly. Here, we present a novel, alignment-free approach for using conservation information for TF binding site discovery. We relax the definition of conserved sites: we consider a DNA site within a regulatory region to be conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation. We use this definition to derive informative priors over DNA sequence positions, and incorporate these priors into a Gibbs sampling algorithm for motif discovery. Our approach is simple and fast. It requires neither sequence alignments nor the phylogenetic relationships between the orthologous sequences, yet it is more effective on real biological data than methods that do.
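The relaxed conservation definition can be sketched in Python: a W-mer starting at each position of a reference regulatory sequence counts as conserved in an ortholog if it, or its reverse complement, occurs anywhere in that ortholog. This sketch only returns the fraction of orthologs in which each site is conserved; the paper converts conservation into informative priors for its Gibbs sampler, and its exact prior formula is not reproduced here. `revcomp` and `conservation_scores` are hypothetical names.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(site):
    """Reverse complement of an uppercase DNA string."""
    return site.translate(_COMPLEMENT)[::-1]

def conservation_scores(seq, orthologs, w):
    """Fraction of orthologs containing each W-mer of seq, in either orientation."""
    scores = []
    for i in range(len(seq) - w + 1):
        site = seq[i:i + w]
        rc = revcomp(site)
        # Alignment-free: the site's position and strand in the ortholog are ignored.
        hits = sum(1 for o in orthologs if site in o or rc in o)
        scores.append(hits / len(orthologs))
    return scores

print(conservation_scores("ACGTTT", ["GGACGG", "AAACGTA"], 3))
# [1.0, 1.0, 0.5, 0.5]
```

No alignment or phylogeny is needed, which matches the abstract's claim; the trade-off is that a short W-mer may occur in an ortholog by chance.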
Affiliation(s)
- Raluca Gordân
- Department of Computer Science, Duke University, Box 90129, Durham, NC 27708, USA
28
Abstract
Chromatin immunoprecipitation (ChIP) experiments allow the location of transcription factors to be determined across the genome. Subsequent analysis of the sequences of the identified regions allows binding to be localized at a higher resolution than can be achieved by current high-throughput experiments without sequence analysis and may provide important insight into the regulatory programs enacted by the protein of interest. In this chapter we review the tools, workflow, and common pitfalls of such analyses and recommend strategies for effective motif discovery from these data.
29
Discovering multiple realistic TFBS motifs based on a generalized model. BMC Bioinformatics 2009; 10:321. [PMID: 19811641 PMCID: PMC2770069 DOI: 10.1186/1471-2105-10-321] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 01/21/2009] [Accepted: 10/07/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Identification of transcription factor binding sites (TFBSs) is a central problem in bioinformatics studies of gene regulation. De novo motif discovery is a promising way to predict and better understand TFBSs ahead of biological verification. Real TFBSs of a motif may vary in width and in degree of conservation within a certain range, so committing to a single motif width, as existing models do, may be biased and misleading. Additionally, multiple, possibly overlapping, candidate motifs are desirable and, in practice, necessary for biological verification. However, current techniques either prohibit overlapping TFBSs or lack explicit control over different motifs. RESULTS We propose a new generalized model that handles motif width by considering and evaluating a width range of interest simultaneously, which should better address width uncertainty. Moreover, we propose a meta-convergence framework for genetic algorithms (GAs) that provides multiple overlapping optimal motifs simultaneously in an effective and flexible way. Users can easily specify how different the expected kinds of motifs should be via a similarity test. Using the Genetic Algorithm with Local Filtering (GALF) for the search, we propose the new GALF-G (G for generalized) algorithm, based on the generalized model and the meta-convergence framework. CONCLUSION GALF-G was tested extensively on over 970 synthetic, real and benchmark datasets, and usually performs better than state-of-the-art methods. The range model shows an increase in sensitivity compared with single-width models, while providing competitive precision on the E. coli benchmark. Effectiveness can be maintained even with a very small population, giving very competitive efficiency. In discovering multiple overlapping motifs in a real liver-specific dataset, GALF-G outperforms MEME by up to 73% in overall F-scores. GALF-G also helps to discover an additional motif that has probably not been annotated in the dataset.
http://www.cse.cuhk.edu.hk/%7Etmchan/GALFG/
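The abstract reports performance as F-scores. A minimal Python sketch of one common definition, computed at the nucleotide level from predicted and annotated site positions, follows; whether the GALF-G evaluation scores nucleotides or whole sites is not stated in the abstract, and the helper names are hypothetical.

```python
def covered_positions(sites):
    """Expand (seq_id, start, width) sites into a set of (seq_id, position) pairs."""
    return {(sid, p) for sid, start, width in sites for p in range(start, start + width)}

def f_score(predicted_sites, annotated_sites):
    """Harmonic mean of nucleotide-level precision and recall."""
    predicted = covered_positions(predicted_sites)
    actual = covered_positions(annotated_sites)
    tp = len(predicted & actual)  # nucleotides both predicted and annotated
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)

# A 4-bp prediction overlapping a 4-bp annotated site by 2 bp.
print(f_score([("s1", 0, 4)], [("s1", 2, 4)]))  # 0.5
```

Because the F-score balances precision and recall, it rewards the width-range model only if its sensitivity gain is not bought with many extra false-positive positions.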
30
Abstract
We present CisFinder software, which generates a comprehensive list of motifs enriched in a set of DNA sequences and describes them with position frequency matrices (PFMs). A new algorithm was designed to estimate PFMs directly from counts of n-mer words with and without gaps; the PFMs are then extended over gaps and flanking regions and clustered to generate non-redundant sets of motifs. The algorithm successfully identified binding motifs for 12 transcription factors (TFs) in embryonic stem (ES) cells based on published chromatin immunoprecipitation sequencing (ChIP-seq) data. Furthermore, CisFinder successfully identified alternative binding motifs of TFs (e.g. POU5F1, ESRRB, and CTCF) and motifs for known and unknown co-factors of genes associated with the pluripotent state of ES cells. CisFinder also showed robust performance in identifying motifs that were only slightly enriched in a set of DNA sequences.
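Estimating a PFM directly from n-mer counts can be sketched in Python under simplifying assumptions: only ungapped words are counted, a single seed word is given, and column i of the PFM is built from the counts of seed variants that differ from the seed at position i only. CisFinder itself also handles gapped words, compares against control sequences, and extends and clusters motifs, none of which is shown here; the function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def count_words(seqs, n):
    """Count every ungapped n-mer occurring in the input sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    return counts

def pfm_from_counts(seed, counts, pseudocount=0.5):
    """Column i uses counts of words equal to `seed` except at position i.

    Missing words count as 0 (Counter default). A positive pseudocount
    avoids division by zero when no variant of the seed was observed.
    """
    pfm = []
    for i in range(len(seed)):
        col = [counts[seed[:i] + b + seed[i + 1:]] + pseudocount for b in BASES]
        total = sum(col)
        pfm.append([c / total for c in col])
    return pfm

counts = count_words(["TACGT", "TACGA", "TACGT"], 4)
for row in pfm_from_counts("ACGT", counts, pseudocount=0.0):
    print(row)
```

In this toy input the last column mixes A and T because both ACGA and ACGT were observed, which is how word counts alone can reveal positional variability without aligning sites.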
Affiliation(s)
- Alexei A Sharov
- Developmental Genomics and Aging Section, Laboratory of Genetics, National Institute on Aging, NIH, Baltimore, MD 21224, USA
31
Nachman I, Regev A. BRNI: Modular analysis of transcriptional regulatory programs. BMC Bioinformatics 2009; 10:155. [PMID: 19457258 PMCID: PMC2694189 DOI: 10.1186/1471-2105-10-155] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/24/2008] [Accepted: 05/20/2009] [Indexed: 01/13/2023]
Abstract
BACKGROUND Transcriptional responses often consist of regulatory modules: sets of genes with a shared expression pattern that are controlled by the same regulatory mechanisms. Previous methods allow regulatory modules to be dissected from genomics data such as expression profiles, protein-DNA binding, and promoter sequences. In cases where physical protein-DNA data are lacking, such methods are essential for the analysis of the underlying regulatory program. RESULTS Here, we present a novel approach for the analysis of modular regulatory programs. Our method, Biochemical Regulatory Network Inference (BRNI), is based on an algorithm that learns a biochemically motivated regulatory program from expression data. It describes the expression profiles of gene modules consisting of hundreds of genes using a small number of regulators and affinity parameters. We developed an ensemble learning algorithm that ensures the robustness of the learned model. We then use the topology of the learned regulatory program to guide the discovery of a library of cis-regulatory motifs, and determine the motif compositions associated with each module. We test our method on the cell cycle regulatory program of the fission yeast. We discovered 16 coherent modules, covering diverse processes from cell division to metabolism, and associated them with 18 learned regulatory elements, including both known cell-cycle regulatory elements (MCB, Ace2, PCB, ACCCT box) and novel ones, some of which are associated with G2 modules. We integrate the regulatory relations from the expression- and motif-based models into a single network, highlighting specific topologies that result in distinct dynamics of gene expression in the fission yeast cell cycle. CONCLUSION Our approach provides a biologically driven, principled way to deconstruct a set of genes into meaningful transcriptional modules and to identify their associated cis-regulatory programs.
Our analysis sheds light on the architecture and function of the regulatory network controlling the fission yeast cell cycle, and a similar approach can be applied to study the regulatory underpinnings of other modular transcriptional responses.
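The abstract says BRNI describes module expression with a few regulators and affinity parameters, but it does not give the model's equations. As a purely illustrative assumption, the Python sketch below uses a generic saturating-occupancy model in which each regulator contributes a signed weight times its fractional promoter occupancy R / (K + R). This is not BRNI's published model, and all names are hypothetical.

```python
def occupancy(level, affinity):
    """Fractional promoter occupancy R / (K + R) for regulator level R and constant K."""
    return level / (affinity + level)

def predict_profile(regulators, params, baseline=0.0):
    """Predict module expression at each time point (hypothetical model).

    regulators: name -> list of regulator levels over time points
    params: name -> (weight, affinity); weight > 0 activates, weight < 0 represses
    """
    n_points = len(next(iter(regulators.values())))
    return [
        baseline + sum(w * occupancy(regulators[name][t], k)
                       for name, (w, k) in params.items())
        for t in range(n_points)
    ]

print(predict_profile({"A": [0.0, 1.0, 3.0]}, {"A": (2.0, 1.0)}))
# [0.0, 1.0, 1.5]
```

The learning step, fitting `params` to observed module profiles (for example by least squares repeated over resampled data, loosely in the spirit of the ensemble learning the abstract mentions), is omitted; the sketch only shows how a handful of regulators and affinities can produce a saturating expression profile.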
Affiliation(s)
- Iftach Nachman
- FAS Center for System Biology, Harvard University, Cambridge, MA 02138, USA.