1
Lambourne L, Mattioli K, Santoso C, Sheynkman G, Inukai S, Kaundal B, Berenson A, Spirohn-Fitzgerald K, Bhattacharjee A, Rothman E, Shrestha S, Laval F, Carroll BS, Plassmeyer SP, Emenecker RJ, Yang Z, Bisht D, Sewell JA, Li G, Prasad A, Phanor S, Lane R, Moyer DC, Hunt T, Balcha D, Gebbia M, Twizere JC, Hao T, Holehouse AS, Frankish A, Riback JA, Salomonis N, Calderwood MA, Hill DE, Sahni N, Vidal M, Bulyk ML, Fuxman Bass JI. Widespread variation in molecular interactions and regulatory properties among transcription factor isoforms. Mol Cell 2025; 85:1445-1466.e13. [PMID: 40147441 PMCID: PMC12121496 DOI: 10.1016/j.molcel.2025.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 12/06/2024] [Accepted: 03/05/2025] [Indexed: 03/29/2025]
Abstract
Most human transcription factor (TF) genes encode multiple protein isoforms differing in DNA-binding domains, effector domains, or other protein regions. The global extent to which this results in functional differences between isoforms remains unknown. Here, we systematically compared 693 isoforms of 246 TF genes, assessing DNA binding, protein binding, transcriptional activation, subcellular localization, and condensate formation. Relative to reference isoforms, two-thirds of alternative TF isoforms exhibit differences in one or more molecular activities, which often could not be predicted from sequence. We observed two primary categories of alternative TF isoforms: "rewirers" and "negative regulators," both of which were associated with differentiation and cancer. Our results support a model wherein the relative expression levels of, and interactions involving, TF isoforms add an understudied layer of complexity to gene regulatory networks, demonstrating the importance of isoform-aware characterization of TF functions and providing a rich resource for further studies.
Affiliation(s)
- Luke Lambourne
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Kaia Mattioli
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
- Clarissa Santoso
- Department of Biology, Boston University, Boston, MA 02215, USA; Bioinformatics Program, Boston University, Boston, MA 02215, USA
- Gloria Sheynkman
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Sachi Inukai
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Babita Kaundal
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Anna Berenson
- Molecular Biology, Cell Biology & Biochemistry Program, Boston University, Boston, MA 02215, USA
- Kerstin Spirohn-Fitzgerald
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Anukana Bhattacharjee
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Elisabeth Rothman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Florent Laval
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; TERRA Teaching and Research Centre, University of Liège, Gembloux 5030, Belgium; Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège 4000, Belgium
- Brent S Carroll
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Stephen P Plassmeyer
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA; Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO 63110, USA
- Ryan J Emenecker
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA; Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO 63110, USA
- Zhipeng Yang
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Deepa Bisht
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Jared A Sewell
- Department of Biology, Boston University, Boston, MA 02215, USA
- Guangyuan Li
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Anisa Prasad
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Harvard College, Cambridge, MA 02138, USA
- Sabrina Phanor
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Ryan Lane
- Department of Biology, Boston University, Boston, MA 02215, USA
- Devlin C Moyer
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
- Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dawit Balcha
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Marinella Gebbia
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada; Lunenfeld-Tanenbaum Research Institute (LTRI), Sinai Health System, Toronto, ON M5G 1X5, Canada
- Jean-Claude Twizere
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; TERRA Teaching and Research Centre, University of Liège, Gembloux 5030, Belgium; Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège 4000, Belgium
- Tong Hao
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA; Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO 63110, USA
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Josh A Riback
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Nathan Salomonis
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Michael A Calderwood
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- David E Hill
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Nidhi Sahni
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
- Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
- Martha L Bulyk
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
- Juan I Fuxman Bass
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biology, Boston University, Boston, MA 02215, USA; Bioinformatics Program, Boston University, Boston, MA 02215, USA; Molecular Biology, Cell Biology & Biochemistry Program, Boston University, Boston, MA 02215, USA.
2
Nayak N, Mehrotra S, Karamchandani AN, Santelia D, Mehrotra R. Recent advances in designing synthetic plant regulatory modules. Frontiers in Plant Science 2025; 16:1567659. [PMID: 40241826 PMCID: PMC11999978 DOI: 10.3389/fpls.2025.1567659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2025] [Accepted: 03/17/2025] [Indexed: 04/18/2025]
Abstract
Introducing novel functions in plants through synthetic multigene circuits requires strict transcriptional regulation. Currently, the use of natural regulatory modules in synthetic circuits is hindered by our limited knowledge of complex plant regulatory mechanisms, the paucity of characterized promoters, and the possibility of crosstalk with endogenous circuits. Synthetic regulatory modules can overcome these limitations. This article introduces an integrative de novo approach for designing plant synthetic promoters by utilizing the available online tools and databases. The recent achievements in designing and validating synthetic plant promoters, enhancers, transcription factors, and the challenges of establishing synthetic circuits in plants are also discussed.
Affiliation(s)
- Namitha Nayak
- Department of Biological Sciences, Birla Institute of Technology and Sciences Pilani, Goa, India
- Sandhya Mehrotra
- Department of Biological Sciences, Birla Institute of Technology and Sciences Pilani, Goa, India
- Diana Santelia
- Institute of Integrative Biology, ETH Zürich Universitätstrasse, Zürich, Switzerland
- Rajesh Mehrotra
- Department of Biological Sciences, Birla Institute of Technology and Sciences Pilani, Goa, India
3
Urquiza-García U, Molina N, Halliday KJ, Millar AJ. Abundant clock proteins point to missing molecular regulation in the plant circadian clock. Mol Syst Biol 2025; 21:361-389. [PMID: 39979593 PMCID: PMC11965494 DOI: 10.1038/s44320-025-00086-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2024] [Revised: 12/20/2024] [Accepted: 01/03/2025] [Indexed: 02/22/2025] Open
Abstract
Understanding the biochemistry behind whole-organism traits such as flowering time is a longstanding challenge, where mathematical models are critical. Very few models of plant gene circuits use the absolute units required for comparison to biochemical data. We refactor two detailed models of the plant circadian clock from relative to absolute units. Using absolute RNA quantification, a simple model predicted abundant clock protein levels in Arabidopsis thaliana, up to 100,000 proteins per cell. NanoLUC reporter protein fusions validated the predicted levels of clock proteins in vivo. Recalibrating the detailed models to these protein levels estimated their DNA-binding dissociation constants (Kd). We estimate the same Kd from multiple results in vitro, extending the method to any promoter sequence. The detailed models simulated the Kd range estimated from LUX DNA-binding in vitro but departed from the data for CCA1 binding, pointing to further circadian mechanisms. Our analytical and experimental methods should transfer to understand other plant gene regulatory networks, potentially including the natural sequence variation that contributes to evolutionary adaptation.
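The link between absolute protein copy number and a dissociation constant can be illustrated with a minimal sketch. It assumes simple single-site equilibrium binding and is not the authors' clock model; the compartment volume and the Kd value are hypothetical placeholders, and only the figure of 100,000 proteins per cell comes from the abstract.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_to_molar(n_molecules: float, volume_litres: float) -> float:
    """Convert a copy number in a compartment of given volume to mol/L."""
    return n_molecules / (AVOGADRO * volume_litres)


def fractional_occupancy(protein_molar: float, kd_molar: float) -> float:
    """Equilibrium occupancy of a single DNA site: [P] / ([P] + Kd)."""
    return protein_molar / (protein_molar + kd_molar)


if __name__ == "__main__":
    # Hypothetical values for illustration only; neither the volume nor
    # the Kd is taken from the paper.
    volume_litres = 1e-13
    kd_molar = 1e-8
    conc = molecules_to_molar(100_000, volume_litres)
    print(f"concentration: {conc:.3e} M")
    print(f"occupancy: {fractional_occupancy(conc, kd_molar):.3f}")
```

This treats all protein as free and ignores nonspecific DNA binding, so it is best read as a unit-conversion aid rather than a quantitative prediction.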
Affiliation(s)
- Uriel Urquiza-García
- Centre for Engineering Biology and School of Biological Sciences, C. H. Waddington Building, University of Edinburgh, King's Buildings, Edinburgh, EH9 3BF, UK
- Institute of Synthetic Biology, University of Düsseldorf, Düsseldorf, Germany
- CEPLAS-Cluster of Excellence on Plant Sciences, Düsseldorf, Germany
- Nacho Molina
- Centre for Engineering Biology and School of Biological Sciences, C. H. Waddington Building, University of Edinburgh, King's Buildings, Edinburgh, EH9 3BF, UK
- Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC) CNRS UMR 7104, INSERM U964, Université de Strasbourg, 1 Rue Laurent Fries, 67404, Illkirch, France
- Karen J Halliday
- School of Biological Sciences, Daniel Rutherford Building, University of Edinburgh, King's Buildings, Edinburgh, EH9 3BF, UK
- Andrew J Millar
- Centre for Engineering Biology and School of Biological Sciences, C. H. Waddington Building, University of Edinburgh, King's Buildings, Edinburgh, EH9 3BF, UK.
4
Jurgens JA, Matos Ruiz PM, King J, Foster EE, Berube L, Chan WM, Barry BJ, Jeong R, Rothman E, Whitman MC, MacKinnon S, Rivera-Quiles C, Pratt BM, Easterbrooks T, Mensching FM, Di Gioia SA, Pais L, England EM, de Berardinis T, Magli A, Koc F, Asakawa K, Kawakami K, O'Donnell-Luria A, Hunter DG, Robson CD, Bulyk ML, Engle EC. Gene Identification for Ocular Congenital Cranial Motor Neuron Disorders Using Human Sequencing, Zebrafish Screening, and Protein Binding Microarrays. Invest Ophthalmol Vis Sci 2025; 66:62. [PMID: 40162949 PMCID: PMC11956743 DOI: 10.1167/iovs.66.3.62] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Accepted: 02/24/2025] [Indexed: 04/02/2025] Open
Abstract
Purpose: To functionally evaluate novel human sequence-derived candidate genes and variants for unsolved ocular congenital cranial dysinnervation disorders (oCCDDs). Methods: Through exome and genome sequencing of a genetically unsolved human oCCDD cohort, we previously reported the identification of variants in many candidate genes. Here, we describe a parallel study that prioritized a subset of these genes (43 human genes, 57 zebrafish genes) using a G0 CRISPR/Cas9-based knockout assay in zebrafish and generated F2 germline mutants for 17. We tested the functionality of variants of uncertain significance in known and novel candidate transcription factor-encoding genes through protein binding microarrays. Results: We first demonstrated the feasibility of the G0 screen by targeting known oCCDD genes phox2a and mafba. Approximately 70% to 90% of gene-targeted G0 zebrafish embryos recapitulated germline homozygous null-equivalent phenotypes. Using this approach, we then identified three novel candidate oCCDD genes (SEMA3F, OLIG2, and FRMD4B) with putative contributions to human and zebrafish cranial motor development. In addition, protein binding microarrays demonstrated reduced or abolished DNA binding of human variants of uncertain significance in known and novel sequence-derived transcription factors PHOX2A (p.(Trp137Cys)), MAFB (p.(Glu223Lys)), and OLIG2 (p.(Arg156Leu)). Conclusions: This study nominates three strong novel candidate oCCDD genes (SEMA3F, OLIG2, and FRMD4B) and supports the functionality and putative pathogenicity of transcription factor candidate variants PHOX2A p.(Trp137Cys), MAFB p.(Glu223Lys), and OLIG2 p.(Arg156Leu). Our findings support that G0 loss-of-function screening in zebrafish can be coupled with human sequence analysis and protein binding microarrays to aid in prioritizing oCCDD candidate genes/variants.
Affiliation(s)
- Julie A. Jurgens
- F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Neurology, Harvard Medical School, Boston, Massachusetts, United States
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States
- Paola M. Matos Ruiz
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Jessica King
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States
- Emma E. Foster
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Lindsay Berube
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Wai-Man Chan
- F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Neurology, Harvard Medical School, Boston, Massachusetts, United States
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States
- Howard Hughes Medical Institute, Chevy Chase, Maryland, United States
- Brenda J. Barry
- F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Howard Hughes Medical Institute, Chevy Chase, Maryland, United States
- Raehoon Jeong
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, Massachusetts, United States
- Elisabeth Rothman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States
- Mary C. Whitman
- F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Ophthalmology, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, United States
- Sarah MacKinnon
- Department of Ophthalmology, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, United States
- Cristina Rivera-Quiles
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Brandon M. Pratt
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Teresa Easterbrooks
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Fiona M. Mensching
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Silvio Alessandro Di Gioia
- F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Neurology, Harvard Medical School, Boston, Massachusetts, United States
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States
- Regeneron Pharmaceuticals, Tarrytown, New York, United States
- Lynn Pais
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, United States
- Eleina M. England
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, United States
- Teresa de Berardinis
- Department of Ophthalmologic Sciences, Faculty of Medicine and Surgery, University “Federico II”, Naples, Italy
- Adriano Magli
- Department of Ophthalmologic Sciences, Faculty of Medicine and Surgery, University “Federico II”, Naples, Italy
- Feray Koc
- Department of Ophthalmology, Faculty of Medicine, Izmir Katip Celebi University, Izmir, Turkey
- Kazuhide Asakawa
- Neurobiology and Pathology Laboratory, National Institute of Genetics, Mishima, Shizuoka, Japan
- Koichi Kawakami
- Laboratory of Molecular and Developmental Biology, National Institute of Genetics; Department of Genetics, Graduate University for Advanced Studies (SOKENDAI), Mishima, Shizuoka, Japan
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States
- Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, United States
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States
- David G. Hunter
- Department of Ophthalmology, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, United States
- Caroline D. Robson
- Division of Neuroradiology, Department of Radiology, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Radiology, Harvard Medical School, Boston, Massachusetts, United States
- Martha L. Bulyk
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, Massachusetts, United States
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States
- Elizabeth C. Engle
- F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Neurology, Harvard Medical School, Boston, Massachusetts, United States
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States
- Howard Hughes Medical Institute, Chevy Chase, Maryland, United States
- Department of Ophthalmology, Boston Children's Hospital, Boston, Massachusetts, United States
- Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, United States
5
Mekkaoui F, Drewell RA, Dresch JM, Spratt DE. Experimental approaches to investigate biophysical interactions between homeodomain transcription factors and DNA. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2025; 1868:195074. [PMID: 39644990 PMCID: PMC11832328 DOI: 10.1016/j.bbagrm.2024.195074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 11/26/2024] [Accepted: 12/01/2024] [Indexed: 12/09/2024]
Abstract
Homeodomain transcription factors (TFs) bind to specific DNA sequences to regulate the expression of target genes. Structural work has provided insight into molecular identities and aided in unraveling structural features of these TFs. However, the detailed affinity and specificity by which these TFs bind to DNA sequences is still largely unknown. Qualitative methods, such as DNA footprinting, Electrophoretic Mobility Shift Assays (EMSAs), Systematic Evolution of Ligands by Exponential Enrichment (SELEX), Bacterial One Hybrid (B1H) systems, Surface Plasmon Resonance (SPR), and Protein Binding Microarrays (PBMs) have been widely used to investigate the biochemical characteristics of TF-DNA binding events. In addition to these qualitative methods, bioinformatic approaches have also assisted in TF binding site discovery. Here we discuss the advantages and limitations of these different approaches, as well as the benefits of utilizing more quantitative approaches, such as Mechanically Induced Trapping of Molecular Interactions (MITOMI), Microscale Thermophoresis (MST) and Isothermal Titration Calorimetry (ITC), in determining the biophysical basis of binding specificity of TF-DNA complexes and improving upon existing computational approaches aimed at affinity predictions.
Affiliation(s)
- Fadwa Mekkaoui
- Gustaf H. Carlson School of Chemistry and Biochemistry, Clark University, 950 Main Street, Worcester, MA 01610, United States of America
- Robert A Drewell
- Biology Department, Clark University, 950 Main Street, Worcester, MA 01610, United States of America
- Jacqueline M Dresch
- Biology Department, Clark University, 950 Main Street, Worcester, MA 01610, United States of America
- Donald E Spratt
- Gustaf H. Carlson School of Chemistry and Biochemistry, Clark University, 950 Main Street, Worcester, MA 01610, United States of America.
6
Wan B, Yu J. Protein target search diffusion-association/dissociation free energy landscape around DNA binding site with flanking sequences. Biophys J 2025; 124:677-692. [PMID: 39818622 PMCID: PMC11900189 DOI: 10.1016/j.bpj.2025.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2024] [Revised: 12/05/2024] [Accepted: 01/13/2025] [Indexed: 01/18/2025] Open
Abstract
In this work we present a minimal structure-based model of protein diffusional search along local DNA amid protein binding and unbinding events on the DNA, taking into account protein-DNA electrostatic interactions and hydrogen-bonding (HB) interactions or contacts at the interface. We accordingly constructed the protein diffusion-association/dissociation free energy surface and mapped it to 1D as the protein slides along DNA, maintaining the protein-DNA interfacial HB contacts that presumably dictate the DNA sequence information detection. Upon DNA helical path correction, the protein 1D diffusion rates along local DNA can be physically derived to be consistent with experimental measurements. We also show that the sequence-dependent protein sliding or stepping patterns along DNA are regulated by collective interfacial HB dynamics, which also determines the ruggedness of the protein diffusion free energy landscape on the local DNA. In comparison, protein association or binding with DNA is generically dictated by the protein-DNA electrostatic interactions, with an interaction zone of nanometers around DNA. Extra degrees of freedom (DOFs) of the protein such as rotations and conformational fluctuations can be well accommodated within the protein-DNA electrostatic interaction zone. As such we demonstrate that the protein binding or association free energy profiling along DNA smoothens over the 1D diffusion free energy landscape, which leads to population variations of an order of magnitude upon a marginal free energetic smoothening around the specific or consensus sites. We further show that the protein unbinding or dissociation from a comparatively high-binding affinity DNA site is dominated by lateral diffusion to the flanking low-affinity sites. The results predict that experimental characterizations of the relative protein-DNA binding affinities or population profiling on the DNA are systematically and physically impacted by the extra DOFs of protein motions aside from 1D translation or helical tracking, as well as from flanking DNA sequences due to protein 1D diffusion and nonspecific binding/unbinding.
Affiliation(s)
- Biao Wan
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Jin Yu
- Department of Physics and Astronomy, Department of Chemistry, NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, California.
7
Guharajan S, Parisutham V, Brewster RC. A systematic survey of TF function in E. coli suggests RNAP stabilization is a prevalent strategy for both repressors and activators. Nucleic Acids Res 2025; 53:gkaf058. [PMID: 39921566 PMCID: PMC11806353 DOI: 10.1093/nar/gkaf058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Revised: 01/08/2025] [Accepted: 01/24/2025] [Indexed: 02/10/2025] Open
Abstract
Transcription factors (TFs) are often classified as activators or repressors, yet these context-dependent labels are inadequate to predict quantitative profiles that emerge across different promoters. A mechanistic understanding of how different regulatory sequences shape TF function is challenging due to the lack of systematic genetic control in endogenous genes. To address this, we use a library of Escherichia coli strains with precise control of TF copy number, measuring the quantitative regulatory input-output function of 90 TFs on synthetic promoters that isolate the contributions of TF binding sequence, location, and basal promoter strength to gene expression. We interpret the measured regulation of these TFs using a thermodynamic model of gene expression and uncover stabilization of RNA polymerase as a pervasive regulatory mechanism, common to both activating and repressing TFs. This property suggests ways to tune the dynamic range of gene expression through the interplay of stabilizing TF function and RNA polymerase basal occupancy, a phenomenon we confirm by measuring fold change for stabilizing TFs across synthetic promoter sequences spanning over 100-fold basal expression. Our work deconstructs TF function at a mechanistic level, providing foundational principles on how gene expression is realized across different promoter contexts, with implications for decoding the relationship between sequence and gene expression.
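The interplay between a stabilizing TF and basal RNA polymerase occupancy can be sketched with a generic equilibrium occupancy model. This is a textbook-style illustration, not the authors' fitted model: the dimensionless weights `p` (polymerase) and `r` (TF) and the interaction factor `omega` are assumed names, with `omega > 1` standing for stabilization. It covers only the activating case, since an equilibrium occupancy picture on its own does not capture how stabilization could also repress.

```python
def p_bound(p: float, r: float, omega: float) -> float:
    """Probability that RNA polymerase occupies the promoter.

    States and statistical weights (all assumed for illustration):
      empty: 1, polymerase only: p, TF only: r, both: omega * p * r
    """
    return (p + omega * p * r) / (1.0 + p + r + omega * p * r)


def fold_change(p: float, r: float, omega: float) -> float:
    """Expression with TF present relative to expression without TF."""
    return p_bound(p, r, omega) / p_bound(p, 0.0, omega)


if __name__ == "__main__":
    # Sweep the basal promoter weight over a wide range while holding
    # the TF weight and interaction factor fixed (illustrative values).
    for p in (0.01, 0.1, 1.0, 10.0):
        print(f"p={p:<5} fold change={fold_change(p, r=1.0, omega=10.0):.2f}")
```

In this sketch the fold change from a stabilizing TF shrinks toward 1 as the basal weight `p` grows, because a strong promoter is already close to saturated with polymerase.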
Affiliation(s)
- Sunil Guharajan
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, United States
- Division of Gastroenterology, Hepatology and Nutrition, Boston Children's Hospital, Boston, MA 02115, United States
| | - Vinuselvi Parisutham
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, United States
| | - Robert C Brewster
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, United States
- Department of Microbiology and Physiological Systems, University of Massachusetts Chan Medical School, Worcester, MA 01605, United States
| |
Collapse
|
8
|
Rimal P, Paul SK, Panday SK, Alexov E. Further Development of SAMPDI-3D: A Machine Learning Method for Predicting Binding Free Energy Changes Caused by Mutations in Either Protein or DNA. Genes (Basel) 2025; 16:101. [PMID: 39858648 PMCID: PMC11764785 DOI: 10.3390/genes16010101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2024] [Revised: 01/15/2025] [Accepted: 01/16/2025] [Indexed: 01/27/2025]
Abstract
BACKGROUND/OBJECTIVES Predicting the effects of protein and DNA mutations on the binding free energy of protein-DNA complexes is crucial for understanding how DNA variants impact wild-type cellular function. As many cellular interactions involve protein-DNA binding, accurately predicting changes in binding free energy (ΔΔG) is valuable for distinguishing pathogenic mutations from benign ones. METHODS This study describes the development and optimization of the SAMPDI-3Dv2 machine learning method, which is trained on an expanded database of experimentally measured ΔΔGs. This enhanced model incorporates new features, including the 3D structure of the mutant protein, features of the mutant structure, and a position-specific scoring matrix (PSSM). Benchmarking was conducted using 5-fold cross-validation. RESULTS The updated SAMPDI-3D model (SAMPDI-3Dv2) achieved Pearson correlation coefficients (PCCs) of 0.68 for protein and 0.80 for DNA mutations. These results represent significant improvements over existing tools. Additionally, the method's rapid execution time enables genome-scale predictions. CONCLUSIONS The improved SAMPDI-3Dv2 shows enhanced predictive performance for analyzing mutations in protein-DNA complexes. By leveraging structural information and an expanded training dataset, SAMPDI-3Dv2 provides researchers with a more accurate and efficient tool for mutation analysis, contributing to identifying pathogenic variants and improving our understanding of cellular function.
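The benchmarking protocol named in this abstract, 5-fold cross-validation scored by the Pearson correlation coefficient (PCC), can be sketched in a few lines. This is a minimal illustration, not the SAMPDI-3Dv2 pipeline: the one-feature linear model, the interleaved fold assignment, and all function names are assumptions chosen to keep the example self-contained.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in for a real ML model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def cross_validated_pcc(features, targets, k=5):
    """Pool out-of-fold predictions over k folds, then score them with PCC."""
    n = len(features)
    predictions = [0.0] * n
    for fold in range(k):
        # Hypothetical fold assignment: sample i goes to fold i % k.
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        slope, intercept = fit_line(
            [features[i] for i in train], [targets[i] for i in train]
        )
        for i in held_out:
            predictions[i] = slope * features[i] + intercept
    return pearson(predictions, targets)
```

Each sample is predicted only by a model that never saw it during fitting, so the pooled PCC estimates performance on unseen mutations rather than fit to the training data.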
Affiliation(s)
- Emil Alexov
  - Department of Physics and Astronomy, College of Science, Clemson University, Clemson, SC 29634, USA; (P.R.); (S.K.P.); (S.K.P.)
|
9
|
Snyder LF, O’Brien EM, Zhao J, Liang J, Bruce BJ, Zhang Y, Zhu W, Cassier TJ, Schnicker NJ, Zhou X, Gordân R, He BZ. Divergence in a Eukaryotic Transcription Factor's co-TF Dependence Involves Multiple Intrinsically Disordered Regions Affecting Activation and Autoinhibition. bioRxiv 2025:2024.04.20.590343. [PMID: 39253425 PMCID: PMC11383300 DOI: 10.1101/2024.04.20.590343] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/11/2024]
Abstract
Combinatorial control by multiple transcription factors (TFs) is a hallmark of eukaryotic gene regulation. Despite its prevalence and crucial roles in enhancing specificity and integrating information, the mechanisms behind why eukaryotic TFs depend on one another, and whether such interdependence evolves, are not well understood. We exploit natural variation in co-TF dependence in the yeast phosphate starvation (PHO) response to address this question. In the model yeast Saccharomyces cerevisiae, the main TF, Pho4, relies on the co-TF Pho2 to regulate ~28 genes. In a related yeast pathogen, Candida glabrata, its Pho4 exhibits significantly reduced Pho2 dependence and has an expanded target set of ~70 genes. Biochemical analyses showed C. glabrata Pho4 (CgPho4) binds to the same consensus motif with 3-4-fold higher affinity than ScPho4 does. A machine-learning-based prediction and yeast one-hybrid assay identified two Intrinsically Disordered Regions (IDRs) in CgPho4 that boost the activity of the main activation domain but showed little to no activity on their own. We also found evidence for autoinhibition behind the co-TF dependence in ScPho4. An IDR in ScPho4 next to its DNA binding domain was found to act as a double-edged sword: it both allows for enhanced activity with Pho2, and inhibits Pho4's activity without Pho2. This study provides a detailed molecular picture of how co-TF dependence is mediated and how its evolution, mainly driven by IDR divergence, can lead to significant rewiring of the regulatory network.
Affiliation(s)
- Lindsey F. Snyder
  - Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA
- Jia Zhao
  - Department of Biology, University of Iowa, Iowa City, IA
- Jinye Liang
  - Department of Biology, University of Iowa, Iowa City, IA
- Baylee J. Bruce
  - Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA
- Yuning Zhang
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, NC
- Wei Zhu
  - Department of Molecular Genetics & Microbiology, Duke University, Durham, NC
- Nicholas J. Schnicker
  - Protein and Crystallography Facility, University of Iowa, Iowa City, IA
  - Department of Molecular Physiology and Biophysics, University of Iowa, Iowa City, IA
- Xu Zhou
  - Department of Pediatrics, Division of Gastroenterology, Hepatology and Nutrition, Boston Children’s Hospital and Harvard Medical School, Boston, MA
- Raluca Gordân
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, NC
  - Department of Molecular Genetics & Microbiology, Duke University, Durham, NC
  - Department of Computer Science, Duke University, Durham, NC
  - Department of Cell Biology, Duke University, Durham, NC
- Bin Z. He
  - Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA
  - Department of Biology, University of Iowa, Iowa City, IA
|
10
|
Rivarola-Sena AC, Vialette AC, Andres-Robin A, Chambrier P, Bideau L, Franco-Zorrilla JM, Scutt CP. Evolution of the basic helix-loop-helix transcription factor SPATULA and its role in gynoecium development. Ann Bot 2024; 134:1037-1054. [PMID: 39183603 PMCID: PMC11687623 DOI: 10.1093/aob/mcae140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Accepted: 08/19/2024] [Indexed: 08/27/2024]
Abstract
BACKGROUND AND AIMS SPATULA (SPT) encodes a basic helix-loop-helix transcription factor in Arabidopsis thaliana that functions in the development of the style, stigma and replum tissues, all of which arise from the carpel margin meristem of the gynoecium. Here we use a comparative approach to investigate the evolutionary history of SPT and identify changes that potentially contributed to its role in gynoecium development. METHODS We investigate SPT's molecular and functional evolution using phylogenetic reconstruction, yeast two-hybrid analyses of protein-protein interactions, microarray-based analyses of protein-DNA interactions, plant transformation assays, RNA in situ hybridization, and in silico analyses of promoter sequences. KEY RESULTS We demonstrate the SPT lineage to have arisen at the base of euphyllophytes from a clade of potentially light-regulated transcription factors through gene duplication followed by the loss of an active phytochrome binding (APB) domain. We also clarify the more recent evolutionary history of SPT and its paralogue ALCATRAZ (ALC), which appear to have arisen through a large-scale duplication within Brassicales. We find that SPT orthologues from diverse groups of seed plants share strikingly similar capacities for protein-protein and protein-DNA interactions, and that SPT coding regions from a wide taxonomic range of plants are able to complement loss-of-function spt mutations in transgenic Arabidopsis. However, the expression pattern of SPT appears to have evolved significantly within angiosperms, and we identify structural changes in SPT's promoter region that correlate with the acquisition of high expression levels in tissues arising from the carpel margin meristem in Brassicaceae. CONCLUSIONS We conclude that changes in SPT's expression pattern made a major contribution to the evolution of its developmental role in the gynoecium of Brassicaceae. By contrast, the main biochemical capacities of SPT, as well as many of its immediate transcriptional targets, appear to have been conserved at least since the base of living angiosperms.
Affiliation(s)
- Ana C Rivarola-Sena
  - Laboratoire Reproduction et Développement des Plantes (CNRS UMR 5667), Ecole Normale Supérieure de Lyon, 69364 Lyon Cedex 7, France
- Aurélie C Vialette
  - Laboratoire Reproduction et Développement des Plantes (CNRS UMR 5667), Ecole Normale Supérieure de Lyon, 69364 Lyon Cedex 7, France
- Amélie Andres-Robin
  - Laboratoire Reproduction et Développement des Plantes (CNRS UMR 5667), Ecole Normale Supérieure de Lyon, 69364 Lyon Cedex 7, France
- Pierre Chambrier
  - Laboratoire Reproduction et Développement des Plantes (CNRS UMR 5667), Ecole Normale Supérieure de Lyon, 69364 Lyon Cedex 7, France
- Loïc Bideau
  - Laboratoire Reproduction et Développement des Plantes (CNRS UMR 5667), Ecole Normale Supérieure de Lyon, 69364 Lyon Cedex 7, France
- Jose M Franco-Zorrilla
  - Centro Nacional de Biotecnología-Consejo Superior de Investigaciones Científicas, C/Darwin3, 28049 Madrid, Spain
- Charles P Scutt
  - Laboratoire Reproduction et Développement des Plantes (CNRS UMR 5667), Ecole Normale Supérieure de Lyon, 69364 Lyon Cedex 7, France
|
11
|
Lawton ML, Inge MM, Blum BC, Smith-Mahoney EL, Bolzan D, Lin W, McConney C, Porter J, Moore J, Youssef A, Tharani Y, Varelas X, Denis GV, Wong WW, Padhorny D, Kozakov D, Siggers T, Wuchty S, Snyder-Cappione J, Emili A. Multiomic profiling of chronically activated CD4+ T cells identifies drivers of exhaustion and metabolic reprogramming. PLoS Biol 2024; 22:e3002943. [PMID: 39689157 DOI: 10.1371/journal.pbio.3002943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Revised: 01/06/2025] [Accepted: 11/15/2024] [Indexed: 12/19/2024]
Abstract
Repeated antigen exposure leads to T-cell exhaustion, a transcriptionally and epigenetically distinct cellular state marked by loss of effector functions (e.g., cytotoxicity, cytokine production/release), up-regulation of inhibitory receptors (e.g., PD-1), and reduced proliferative capacity. Molecular pathways underlying T-cell exhaustion have been defined for CD8+ cytotoxic T cells, but which factors drive exhaustion in CD4+ T cells, which are also required for an effective immune response against a tumor or infection, remains unclear. Here, we utilize quantitative proteomic, phosphoproteomic, and metabolomic analyses to characterize the molecular basis of the dysfunctional cell state induced by chronic stimulation of CD4+ memory T cells. We identified a dynamic response encompassing both known and novel up-regulated cell surface receptors, as well as dozens of unexpected transcriptional regulators. Integrated causal network analysis of our combined data predicts the histone acetyltransferase p300 as a driver of aspects of this phenotype following chronic stimulation, which we confirmed via targeted small molecule inhibition. While our integrative analysis also revealed large-scale metabolic reprogramming, our independent investigation confirmed a global remodeling away from glycolysis to a dysfunctional fatty acid oxidation-based metabolism coincident with oxidative stress. Overall, these data both provide insights into the mechanistic basis of CD4+ T-cell exhaustion and serve as a valuable resource for future interventional studies aimed at modulating T-cell dysfunction.
Affiliation(s)
- Matthew L Lawton
  - Center for Network Systems Biology, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Melissa M Inge
  - Department of Biology, Boston University, Boston, Massachusetts, United States of America
- Benjamin C Blum
  - Center for Network Systems Biology, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Erika L Smith-Mahoney
  - Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Dante Bolzan
  - Department of Computer Science, University of Miami, Miami, Florida, United States of America
- Weiwei Lin
  - Center for Network Systems Biology, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Christina McConney
  - Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Jacob Porter
  - Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, United States of America
- Jarrod Moore
  - Center for Network Systems Biology, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Ahmed Youssef
  - Center for Network Systems Biology, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Yashasvi Tharani
  - Center for Network Systems Biology, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Department of Biology, Boston University, Boston, Massachusetts, United States of America
- Xaralabos Varelas
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Gerald V Denis
  - Hematology and Medical Oncology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Wilson W Wong
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Dzmitry Padhorny
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, United States of America
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, United States of America
- Dima Kozakov
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, United States of America
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, United States of America
- Trevor Siggers
  - Department of Biology, Boston University, Boston, Massachusetts, United States of America
- Stefan Wuchty
  - Department of Computer Science, University of Miami, Miami, Florida, United States of America
  - Miami Institute of Data Science and Computing, Miami, Florida, United States of America
- Jennifer Snyder-Cappione
  - Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Andrew Emili
  - Center for Network Systems Biology, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
  - Department of Biology, Boston University, Boston, Massachusetts, United States of America
  - Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, United States of America
|
12
|
Vorontsov IE, Kozin I, Abramov S, Boytsov A, Jolma A, Albu M, Ambrosini G, Faltejskova K, Gralak AJ, Gryzunov N, Inukai S, Kolmykov S, Kravchenko P, Kribelbauer-Swietek JF, Laverty KU, Nozdrin V, Patel ZM, Penzar D, Plescher ML, Pour SE, Razavi R, Yang AWH, Yevshin I, Zinkevich A, Weirauch MT, Bucher P, Deplancke B, Fornes O, Grau J, Grosse I, Kolpakov FA, Makeev VJ, Hughes TR, Kulakovskiy IV. Cross-platform DNA motif discovery and benchmarking to explore binding specificities of poorly studied human transcription factors. bioRxiv 2024:2024.11.11.619379. [PMID: 39605530 PMCID: PMC11601219 DOI: 10.1101/2024.11.11.619379] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/29/2024]
Abstract
A DNA sequence pattern, or "motif", is an essential representation of DNA-binding specificity of a transcription factor (TF). Any particular motif model has potential flaws due to shortcomings of the underlying experimental data and computational motif discovery algorithm. As a part of the Codebook/GRECO-BIT initiative, here we evaluated at large scale the cross-platform recognition performance of positional weight matrices (PWMs), which remain popular motif models in many practical applications. We applied ten different DNA motif discovery tools to generate PWMs from the "Codebook" data comprised of 4,237 experiments from five different platforms profiling the DNA-binding specificity of 394 human proteins, focusing on understudied transcription factors of different structural families. For many of the proteins, there was no prior knowledge of a genuine motif. By benchmarking-supported human curation, we constructed an approved subset of experiments comprising about 30% of all experiments and 50% of tested TFs which displayed consistent motifs across platforms and replicates. We present the Codebook Motif Explorer (https://mex.autosome.org), a detailed online catalog of DNA motifs, including the top-ranked PWMs, and the underlying source and benchmarking data. We demonstrate that in the case of high-quality experimental data, most of the popular motif discovery tools detect valid motifs and generate PWMs, which perform well both on genomic and synthetic data. Yet, for each of the algorithms, there were problematic combinations of proteins and platforms, and the basic motif properties such as nucleotide composition and information content offered little help in detecting such pitfalls. By combining multiple PWMs in decision trees, we demonstrate how our setup can be readily adapted to train and test binding specificity models more complex than PWMs. Overall, our study provides a rich motif catalog as a solid baseline for advanced models and highlights the power of the multi-platform multi-tool approach for reliable mapping of DNA binding specificities.
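For readers unfamiliar with the motif model benchmarked here, the sketch below shows how a positional weight matrix can be derived from aligned-site base counts and used to scan a sequence. It is a generic illustration, not one of the ten tools evaluated in the study: the function names, the pseudocount of 1, the uniform 0.25 background, and the toy four-column motif are all assumptions.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    `counts` is a list with one dict per motif position, mapping each of
    A, C, G, T to the number of aligned sites carrying that base.
    """
    pwm = []
    for column in counts:
        total = sum(column[base] for base in BASES) + 4 * pseudocount
        pwm.append(
            {
                base: math.log2(((column[base] + pseudocount) / total) / background)
                for base in BASES
            }
        )
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window on one strand."""
    width = len(pwm)
    windows = (
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    )
    return max(windows)


if __name__ == "__main__":
    # Toy count matrix whose consensus is GTAC (illustrative values only).
    toy_counts = [
        {"A": 0, "C": 0, "G": 10, "T": 0},
        {"A": 0, "C": 0, "G": 0, "T": 10},
        {"A": 10, "C": 0, "G": 0, "T": 0},
        {"A": 0, "C": 10, "G": 0, "T": 0},
    ]
    score, start = best_hit(pwm_from_counts(toy_counts), "AAGTACAA")
    print(f"best window starts at {start} with score {score:.2f}")
```

A PWM scores each position independently, which is why the abstract treats it as a baseline: models such as the decision trees mentioned above can capture dependencies that this additive score cannot.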
Affiliation(s)
- Ilya E Vorontsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
  - Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
- Ivan Kozin
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
- Sergey Abramov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
  - Altius Institute for Biomedical Sciences, 98121, Seattle, WA, USA
- Alexandr Boytsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
  - Altius Institute for Biomedical Sciences, 98121, Seattle, WA, USA
- Arttu Jolma
  - Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Mihai Albu
  - Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Katerina Faltejskova
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 160 00 Praha 6, Czech Republic
  - Computer Science Institute, Faculty of Mathematics and Physics, Charles University, 118 00 Praha 1, Czech Republic
- Antoni J Gralak
  - Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Nikita Gryzunov
  - Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
- Sachi Inukai
  - Chugai Pharmaceutical Co., Ltd, Tokyo, 103-8324, Japan
- Semyon Kolmykov
  - Department of Computational Biology, Sirius University of Science and Technology, 354340, Sirius, Krasnodar region, Russia
- Judith F Kribelbauer-Swietek
  - Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Kaitlin U Laverty
  - Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Vladimir Nozdrin
  - Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
- Zain M Patel
  - Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Dmitry Penzar
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Marie-Luise Plescher
  - Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
- Sara E Pour
  - Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Rozita Razavi
  - Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Ally W H Yang
  - Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Arsenii Zinkevich
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
- Philipp Bucher
  - Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Bart Deplancke
  - Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Oriol Fornes
  - Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Jan Grau
  - Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
- Ivo Grosse
  - Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
- Fedor A Kolpakov
  - Department of Computational Biology, Sirius University of Science and Technology, 354340, Sirius, Krasnodar region, Russia
  - Bioinformatics Laboratory, Federal Research Center for Information and Computational Technologies, 630090, Novosibirsk, Russia
- Vsevolod J Makeev
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
  - Moscow Center for Advanced Studies, 123592, Moscow, Russia
- Timothy R Hughes
  - Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Ivan V Kulakovskiy
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
  - Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
  - Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
|
13
|
King DE, Beard EE, Satusky MJ, Ryde IT, George A, Johnson C, Dolan EL, Zhang Y, Zhu W, Wilkins H, Corden E, Murphy SK, Erie D, Gordan R, Meyer JN. TFAM as a sensor of UVC-induced mitochondrial DNA damage. bioRxiv 2024:2024.10.24.620005. [PMID: 39484377 PMCID: PMC11527015 DOI: 10.1101/2024.10.24.620005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
Mitochondria lack nucleotide excision DNA repair; however, mitochondrial DNA (mtDNA) is resistant to mutation accumulation following DNA damage. These observations suggest additional damage sensing or protection mechanisms. Transcription Factor A, Mitochondrial (TFAM) compacts mtDNA into nucleoids. As such, TFAM has emerged as a candidate for protecting DNA or sensing damage. To examine these possibilities, we used live-cell imaging, cell-based assays, atomic force microscopy, and high-throughput protein-DNA binding assays to characterize the binding properties of TFAM to UVC-irradiated DNA and cellular consequences of UVC irradiation. Our data indicate an increase in mtDNA degradation and turnover, without a loss in mitochondrial membrane potential that might trigger mitophagy. We identified a reduction in sequence specificity of TFAM associated with UVC irradiation and a redistribution of TFAM binding throughout the mitochondrial genome. Our AFM data show increased compaction of DNA by TFAM in the presence of damage. Despite the TFAM-mediated compaction of mtDNA, we do not observe any protective effect on DNA damage accumulation in cells or in vitro. Taken together, these studies indicate that UVC-induced DNA damage promotes compaction by TFAM, suggesting that TFAM may act as a damage sensor, sequestering damaged genomes to prevent mutagenesis by direct removal or suppression of replication.
|
14
|
Wang X, Zhang J, Su J, Huang T, Lian L, Nie Q, Zhang X, Li J, Wang Y. Genome-wide mapping of the binding sites of myocyte enhancer factor 2A in chicken primary myoblasts. Poult Sci 2024; 103:104097. [PMID: 39094502 PMCID: PMC11345569 DOI: 10.1016/j.psj.2024.104097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Revised: 07/04/2024] [Accepted: 07/09/2024] [Indexed: 08/04/2024]
Abstract
Myocyte enhancer factor 2A (MEF2A) is a transcription factor that plays a critical role in cell proliferation, differentiation and apoptosis. In contrast to the extensive characterization of its regulatory mechanism in mammalian skeletal muscle, its role in chickens is poorly understood; in particular, its genome-wide target genes remain to be identified. Therefore, we utilized Cleavage Under Targets and Tagmentation (CUT&Tag) technology to reveal the genome-wide binding profile of MEF2A in chicken primary myoblasts, thus gaining insights into its potential role in muscle development. Our results revealed that MEF2A binding sites were primarily distributed in intergenic and intronic regions. Although only 8.87% of MEF2A binding sites fell within promoter regions, these sites were concentrated around the transcription start site (TSS). Following peak annotation, a total of 1903 genes were identified as potential targets of MEF2A. Gene Ontology (GO) enrichment analysis further revealed that MEF2A target genes may be involved in the regulation of embryonic development in multiple organ systems, including muscle development, gland development, and visual system development. Moreover, a comparison of the MEF2A target genes identified in chicken primary myoblasts with those in mouse C2C12 cells revealed that 388 target genes are conserved across species, whereas 1515 target genes are chicken-specific. Among these conserved genes, ankyrin repeat and SOCS box containing 5 (ASB5), transmembrane protein 182 (TMEM182), myomesin 2 (MYOM2), leucyl and cystinyl aminopeptidase (LNPEP), actinin alpha 2 (ACTN2), sorbin and SH3 domain containing 1 (SORBS1), ankyrin 3 (ANK3), sarcoglycan delta (SGCD), and ORAI calcium release-activated calcium modulator 1 (ORAI1) exhibited expression patterns consistent with that of MEF2A during embryonic muscle development. Finally, TMEM182, an important negative regulator of muscle development, was validated as a MEF2A-regulated gene by dual-luciferase and quantitative real-time PCR (qPCR) assays. In summary, our study provides the first genome-wide landscape of MEF2A target genes in chicken primary myoblasts, which supports the active role of MEF2A in chicken muscle development.
Affiliation(s)
- Xinglong Wang
  - Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
- Jiannan Zhang
  - Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
- Jiancheng Su
  - Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
- Tianjiao Huang
  - Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
- Ling Lian
  - National Engineering Laboratory for Animal Breeding and MOA Key Laboratory of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, PR China
- Qinghua Nie
  - Lingnan Guangdong Laboratory of Agriculture, South China Agricultural University, Guangzhou, PR China
- Xin Zhang
  - Joint Nutrition Center for Animal Feeding of Sichuan University-Shengliyuan Group
- Juan Li
  - Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China; Joint Nutrition Center for Animal Feeding of Sichuan University-Shengliyuan Group
- Yajun Wang
  - Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China; Joint Nutrition Center for Animal Feeding of Sichuan University-Shengliyuan Group
|
15
|
Bonnell V, Zhang Y, Brown A, Horton J, Josling G, Chiu TP, Rohs R, Mahony S, Gordân R, Llinás M. DNA sequence and chromatin differentiate sequence-specific transcription factor binding in the human malaria parasite Plasmodium falciparum. Nucleic Acids Res 2024; 52:10161-10179. [PMID: 38966997 PMCID: PMC11417369 DOI: 10.1093/nar/gkae585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 05/30/2024] [Accepted: 06/27/2024] [Indexed: 07/06/2024]
Abstract
Development of the malaria parasite, Plasmodium falciparum, is regulated by a limited number of sequence-specific transcription factors (TFs). However, the mechanisms by which these TFs recognize genome-wide binding sites are largely unknown. To address TF specificity, we investigated the binding of two TF subsets that either bind CACACA or GTGCAC DNA sequence motifs and further characterized two additional ApiAP2 TFs, PfAP2-G and PfAP2-EXP, which bind unique DNA motifs (GTAC and TGCATGCA). We also interrogated the impact of DNA sequence and chromatin context on P. falciparum TF binding by integrating high-throughput in vitro and in vivo binding assays, DNA shape predictions, epigenetic post-translational modifications, and chromatin accessibility. We found that DNA sequence context minimally impacts binding site selection for paralogous CACACA-binding TFs, while chromatin accessibility, epigenetic patterns, co-factor recruitment, and dimerization correlate with differential binding. In contrast, GTGCAC-binding TFs prefer different DNA sequence context in addition to chromatin dynamics. Finally, we determined that TFs that preferentially bind divergent DNA motifs may bind overlapping genomic regions due to low-affinity binding to other sequence motifs. Our results demonstrate that TF binding site selection relies on a combination of DNA sequence and chromatin features, thereby contributing to the complexity of P. falciparum gene regulatory mechanisms.
Affiliation(s)
- Victoria A Bonnell
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes Center for Malaria Research, The Pennsylvania State University, University Park, PA 16802, USA
| | - Yuning Zhang
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA
- Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA
| | - Alan S Brown
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes Center for Malaria Research, The Pennsylvania State University, University Park, PA 16802, USA
| | - John Horton
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA
| | - Gabrielle A Josling
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes Center for Malaria Research, The Pennsylvania State University, University Park, PA 16802, USA
| | - Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
| | - Shaun Mahony
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
| | - Raluca Gordân
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27708, USA
| | - Manuel Llinás
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes Center for Malaria Research, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
| |
|
16
|
Inge M, Miller R, Hook H, Bray D, Keenan J, Zhao R, Gilmore T, Siggers T. Rapid profiling of transcription factor-cofactor interaction networks reveals principles of epigenetic regulation. Nucleic Acids Res 2024; 52:10276-10296. [PMID: 39166482 PMCID: PMC11417405 DOI: 10.1093/nar/gkae706] [Received: 04/16/2024] [Revised: 06/14/2024] [Accepted: 08/19/2024] [Indexed: 08/23/2024]
Abstract
Transcription factor (TF)-cofactor (COF) interactions define dynamic, cell-specific networks that govern gene expression; however, these networks are understudied due to a lack of methods for high-throughput profiling of DNA-bound TF-COF complexes. Here, we describe the Cofactor Recruitment (CoRec) method for rapid profiling of cell-specific TF-COF complexes. We define a lysine acetyltransferase (KAT)-TF network in resting and stimulated T cells. We find promiscuous recruitment of KATs for many TFs and that 35% of KAT-TF interactions are condition specific. KAT-TF interactions identify NF-κB as a primary regulator of acutely induced histone 3 lysine 27 acetylation (H3K27ac). Finally, we find that heterotypic clustering of CBP/P300-recruiting TFs is a strong predictor of total promoter H3K27ac. Our data support clustering of TF sites that broadly recruit KATs as a mechanism for widespread co-occurring histone acetylation marks. CoRec can be readily applied to different cell systems and provides a powerful approach to define TF-COF networks impacting chromatin state and gene regulation.
Affiliation(s)
- Melissa M Inge
- Department of Biology, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
| | - Rebekah Miller
- Department of Biology, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
| | - Heather Hook
- Department of Biology, Boston University, Boston, MA 02215, USA
| | - David Bray
- Department of Biology, Boston University, Boston, MA 02215, USA
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
| | - Jessica L Keenan
- Department of Biology, Boston University, Boston, MA 02215, USA
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
| | - Rose Zhao
- Department of Biology, Boston University, Boston, MA 02215, USA
| | | | - Trevor Siggers
- Department of Biology, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
| |
|
17
|
Jurgens JA, Matos Ruiz PM, King J, Foster EE, Berube L, Chan WM, Barry BJ, Jeong R, Rothman E, Whitman MC, MacKinnon S, Rivera-Quiles C, Pratt BM, Easterbrooks T, Mensching FM, Di Gioia SA, Pais L, England EM, de Berardinis T, Magli A, Koc F, Asakawa K, Kawakami K, O’Donnell-Luria A, Hunter DG, Robson CD, Bulyk ML, Engle EC. Gene identification for ocular congenital cranial motor neuron disorders using human sequencing, zebrafish screening, and protein binding microarrays. bioRxiv 2024:2024.09.12.612713. [PMID: 39314366 PMCID: PMC11419015 DOI: 10.1101/2024.09.12.612713] [Indexed: 09/25/2024]
Abstract
Purpose To functionally evaluate novel human sequence-derived candidate genes and variants for unsolved ocular congenital cranial dysinnervation disorders (oCCDDs). Methods Through exome and genome sequencing of a genetically unsolved human oCCDD cohort, we previously identified variants in 80 strong candidate genes. Here, we further prioritized a subset of these (43 human genes, 57 zebrafish genes) using a G0 CRISPR/Cas9-based knockout assay in zebrafish and generated F2 germline mutants for seventeen. We tested the functionality of variants of uncertain significance in known and novel candidate transcription factor-encoding genes through protein binding microarrays. Results We first demonstrated the feasibility of the G0 screen by targeting known oCCDD genes phox2a and mafba. 70-90% of gene-targeted G0 zebrafish embryos recapitulated germline homozygous null-equivalent phenotypes. Using this approach, we then identified three novel candidate oCCDD genes (SEMA3F, OLIG2, and FRMD4B) with putative contributions to human and zebrafish cranial motor development. In addition, protein binding microarrays demonstrated reduced or abolished DNA binding of human variants of uncertain significance in known and novel sequence-derived transcription factors PHOX2A (p.(Trp137Cys)), MAFB (p.(Glu223Lys)), and OLIG2 (p.(Arg156Leu)). Conclusions This study nominates three strong novel candidate oCCDD genes (SEMA3F, OLIG2, and FRMD4B) and supports the functionality and putative pathogenicity of transcription factor candidate variants PHOX2A p.(Trp137Cys), MAFB p.(Glu223Lys), and OLIG2 p.(Arg156Leu). Our findings support that G0 loss-of-function screening in zebrafish can be coupled with human sequence analysis and protein binding microarrays to aid in prioritizing oCCDD candidate genes/variants.
Affiliation(s)
- Julie A. Jurgens
- F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, MA, USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Jessica King
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Emma E. Foster
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
| | - Lindsay Berube
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
| | - Wai-Man Chan
- F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, MA, USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Brenda J. Barry
- F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, MA, USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Raehoon Jeong
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
| | - Elisabeth Rothman
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Mary C. Whitman
- F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, MA, USA
- Department of Ophthalmology, Boston Children’s Hospital, Boston, MA, USA
- Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
| | - Sarah MacKinnon
- Department of Ophthalmology, Boston Children’s Hospital, Boston, MA, USA
- Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
| | | | - Brandon M. Pratt
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
| | | | | | - Silvio Alessandro Di Gioia
- F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, MA, USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Regeneron Pharmaceuticals, Tarrytown, NY, USA
| | - Lynn Pais
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Eleina M. England
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Teresa de Berardinis
- Department of Ophthalmologic Sciences, Faculty of Medicine and Surgery, University “Federico II”, Naples, Italy
| | - Adriano Magli
- Department of Ophthalmologic Sciences, Faculty of Medicine and Surgery, University “Federico II”, Naples, Italy
| | - Feray Koc
- Department of Ophthalmology, Faculty of Medicine, Izmir Katip Celebi University, Izmir, Turkey
| | - Kazuhide Asakawa
- Neurobiology and Pathology Laboratory, National Institute of Genetics, Mishima, Shizuoka, Japan
| | - Koichi Kawakami
- Laboratory of Molecular and Developmental Biology, National Institute of Genetics; Department of Genetics, Graduate University for Advanced Studies (SOKENDAI)
| | - Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - David G. Hunter
- Department of Ophthalmology, Boston Children’s Hospital, Boston, MA, USA
- Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
| | - Caroline D. Robson
- Division of Neuroradiology, Department of Radiology, Boston Children’s Hospital, Boston, MA, USA
- Department of Radiology, Harvard Medical School, Boston, MA, USA
| | - Martha L. Bulyk
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Elizabeth C. Engle
- F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, MA, USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Department of Ophthalmology, Boston Children’s Hospital, Boston, MA, USA
- Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
| |
|
18
|
Mitra R, Li J, Sagendorf JM, Jiang Y, Cohen AS, Chiu TP, Glasscock CJ, Rohs R. Geometric deep learning of protein-DNA binding specificity. Nat Methods 2024; 21:1674-1683. [PMID: 39103447 PMCID: PMC11399107 DOI: 10.1038/s41592-024-02372-w] [Received: 08/13/2023] [Accepted: 06/14/2024] [Indexed: 08/07/2024]
Abstract
Predicting protein-DNA binding specificity is a challenging yet essential task for understanding gene regulation. Protein-DNA complexes usually exhibit binding to a selected DNA target site, whereas a protein binds, with varying degrees of binding specificity, to a wide range of DNA sequences. This information is not directly accessible in a single structure. Here, to access this information, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity from protein-DNA structure. DeepPBS can be applied to experimental or predicted structures. Interpretable protein heavy atom importance scores for interface residues can be extracted. When aggregated at the protein residue level, these scores are validated through mutagenesis experiments. Applied to designed proteins targeting specific DNA sequences, DeepPBS was demonstrated to predict experimentally measured binding specificity. DeepPBS offers a foundation for machine-aided studies that advance our understanding of molecular interactions and guide experimental designs and synthetic biology.
Affiliation(s)
- Raktim Mitra
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Jared M Sagendorf
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Yibei Jiang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Ari S Cohen
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Cameron J Glasscock
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
- Department of Chemistry, University of Southern California, Los Angeles, CA, USA.
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, USA.
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, USA.
| |
|
19
|
Vo NNT, Yang A, Leesutthiphonchai W, Liu Y, Hughes TR, Judelson HS. Transcription factor binding specificities of the oomycete Phytophthora infestans reflect conserved and divergent evolutionary patterns and predict function. BMC Genomics 2024; 25:710. [PMID: 39044130 PMCID: PMC11267843 DOI: 10.1186/s12864-024-10630-6] [Received: 02/13/2024] [Accepted: 07/17/2024] [Indexed: 07/25/2024]
Abstract
BACKGROUND Identifying the DNA-binding specificities of transcription factors (TF) is central to understanding gene networks that regulate growth and development. Such knowledge is lacking in oomycetes, a microbial eukaryotic lineage within the stramenopile group. Oomycetes include many important plant and animal pathogens such as the potato and tomato blight agent Phytophthora infestans, which is a tractable model for studying life-stage differentiation within the group. RESULTS Mining of the P. infestans genome identified 197 genes encoding proteins belonging to 22 TF families. Their chromosomal distribution was consistent with family expansions through unequal crossing-over, which were likely ancient since each family had similar sizes in most oomycetes. Most TFs exhibited dynamic changes in RNA levels through the P. infestans life cycle. The DNA-binding preferences of 123 proteins were assayed using protein-binding oligonucleotide microarrays, which succeeded with 73 proteins from 14 families. Binding sites predicted for representatives of the families were validated by electrophoretic mobility shift or chromatin immunoprecipitation assays. Consistent with the substantial evolutionary distance of oomycetes from traditional model organisms, only a subset of the DNA-binding preferences resembled those of human or plant orthologs. Phylogenetic analyses of the TF families within P. infestans often discriminated clades with canonical and novel DNA targets. Paralogs with similar binding preferences frequently had distinct patterns of expression suggestive of functional divergence. TFs were predicted to either drive life stage-specific expression or serve as general activators based on the representation of their binding sites within total or developmentally-regulated promoters. This projection was confirmed for one TF using synthetic and mutated promoters fused to reporter genes in vivo. CONCLUSIONS We established a large dataset of binding specificities for P. infestans TFs, representing the first in the stramenopile group. This resource provides a basis for understanding transcriptional regulation by linking TFs with their targets, which should help delineate the molecular components of processes such as sporulation and host infection. Our work also yielded insight into TF evolution during the eukaryotic radiation, revealing both functional conservation as well as diversification across kingdoms.
Affiliation(s)
- Nguyen N T Vo
- Department of Microbiology and Plant Pathology, University of California, Riverside, CA, 92521, USA
| | - Ally Yang
- Department of Molecular Genetics and Donnelly Center, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Wiphawee Leesutthiphonchai
- Department of Microbiology and Plant Pathology, University of California, Riverside, CA, 92521, USA
- Current address: Department of Plant Pathology, Faculty of Agriculture, Kasetsart University, Bangkok, 10900, Thailand
| | - Yulong Liu
- Department of Molecular Genetics and Donnelly Center, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Timothy R Hughes
- Department of Molecular Genetics and Donnelly Center, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Howard S Judelson
- Department of Microbiology and Plant Pathology, University of California, Riverside, CA, 92521, USA.
| |
|
20
|
Nithun RV, Yao YM, Harel O, Habiballah S, Afek A, Jbara M. Site-Specific Acetylation of the Transcription Factor Protein Max Modulates Its DNA Binding Activity. ACS Cent Sci 2024; 10:1295-1303. [PMID: 38947213 PMCID: PMC11212134 DOI: 10.1021/acscentsci.4c00686] [Received: 04/28/2024] [Revised: 06/03/2024] [Accepted: 06/04/2024] [Indexed: 07/02/2024]
Abstract
Chemical protein synthesis provides a powerful means to prepare novel modified proteins with precision down to the atomic level, enabling an unprecedented opportunity to understand fundamental biological processes. Of particular interest is the process of gene expression, orchestrated through the interactions between transcription factors (TFs) and DNA. Here, we combined chemical protein synthesis and high-throughput screening technology to decipher the role of post-translational modifications (PTMs), e.g., Lys-acetylation on the DNA binding activity of Max TF. We synthesized a focused library of singly, doubly, and triply modified Max variants including site-specifically acetylated and fluorescently tagged analogs. The resulting synthetic analogs were employed to decipher the molecular role of Lys-acetylation on the DNA binding activity and sequence specificity of Max. We provide evidence that the acetylation sites at Lys-31 and Lys-57 significantly inhibit the DNA binding activity of Max. Furthermore, by utilizing high-throughput binding measurements, we assessed the binding activities of the modified Max variants across diverse DNA sequences. Our results indicate that acetylation marks can alter the binding specificities of Max toward certain sequences flanking its consensus binding sites. Our work provides insight into the hidden molecular code of PTM-TFs and DNA interactions, paving the way to interpret gene expression regulation programs.
Affiliation(s)
- Raj V. Nithun
- School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, 69978 Israel
| | - Yumi Minyi Yao
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Omer Harel
- School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, 69978 Israel
| | - Shaimaa Habiballah
- School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, 69978 Israel
| | - Ariel Afek
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Muhammad Jbara
- School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, 69978 Israel
| |
|
21
|
Lambourne L, Mattioli K, Santoso C, Sheynkman G, Inukai S, Kaundal B, Berenson A, Spirohn-Fitzgerald K, Bhattacharjee A, Rothman E, Shrestha S, Laval F, Yang Z, Bisht D, Sewell JA, Li G, Prasad A, Phanor S, Lane R, Campbell DM, Hunt T, Balcha D, Gebbia M, Twizere JC, Hao T, Frankish A, Riback JA, Salomonis N, Calderwood MA, Hill DE, Sahni N, Vidal M, Bulyk ML, Fuxman Bass JI. Widespread variation in molecular interactions and regulatory properties among transcription factor isoforms. bioRxiv 2024:2024.03.12.584681. [PMID: 38617209 PMCID: PMC11014633 DOI: 10.1101/2024.03.12.584681] [Indexed: 04/16/2024]
Abstract
Most human transcription factor (TF) genes encode multiple protein isoforms differing in DNA binding domains, effector domains, or other protein regions. The global extent to which this results in functional differences between isoforms remains unknown. Here, we systematically compared 693 isoforms of 246 TF genes, assessing DNA binding, protein binding, transcriptional activation, subcellular localization, and condensate formation. Relative to reference isoforms, two-thirds of alternative TF isoforms exhibit differences in one or more molecular activities, which often could not be predicted from sequence. We observed two primary categories of alternative TF isoforms: "rewirers" and "negative regulators", both of which were associated with differentiation and cancer. Our results support a model wherein the relative expression levels of, and interactions involving, TF isoforms add an understudied layer of complexity to gene regulatory networks, demonstrating the importance of isoform-aware characterization of TF functions and providing a rich resource for further studies.
Affiliation(s)
- Luke Lambourne
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Kaia Mattioli
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Clarissa Santoso
- Department of Biology, Boston University, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
| | - Gloria Sheynkman
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Sachi Inukai
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Babita Kaundal
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Anna Berenson
- Molecular Biology, Cell Biology & Biochemistry Program, Boston University, Boston, MA, USA
| | - Kerstin Spirohn-Fitzgerald
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Anukana Bhattacharjee
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Elisabeth Rothman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | | | - Florent Laval
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
- Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège, Belgium
| | - Zhipeng Yang
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Deepa Bisht
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Jared A Sewell
- Department of Biology, Boston University, Boston, MA, USA
| | - Guangyuan Li
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Anisa Prasad
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Harvard College, Cambridge MA, USA
| | - Sabrina Phanor
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Ryan Lane
- Department of Biology, Boston University, Boston, MA, USA
| | | | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Dawit Balcha
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Marinella Gebbia
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute (LTRI), Sinai Health System, Toronto, Ontario, Canada
| | - Jean-Claude Twizere
- TERRA Teaching and Research Centre, University of Liège, Gembloux, Belgium
- Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège, Belgium
| | - Tong Hao
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Adam Frankish
- Laboratory of Viral Interactomes, GIGA Institute, University of Liège, Liège, Belgium
| | - Josh A Riback
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
| | - Nathan Salomonis
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Michael A Calderwood
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - David E Hill
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Nidhi Sahni
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Martha L Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Juan I Fuxman Bass
- Department of Biology, Boston University, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
- Molecular Biology, Cell Biology & Biochemistry Program, Boston University, Boston, MA, USA
|
22
|
Kock KH, Kimes PK, Gisselbrecht SS, Inukai S, Phanor SK, Anderson JT, Ramakrishnan G, Lipper CH, Song D, Kurland JV, Rogers JM, Jeong R, Blacklow SC, Irizarry RA, Bulyk ML. DNA binding analysis of rare variants in homeodomains reveals homeodomain specificity-determining residues. Nat Commun 2024; 15:3110. [PMID: 38600112 PMCID: PMC11006913 DOI: 10.1038/s41467-024-47396-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 03/29/2024] [Indexed: 04/12/2024] Open
Abstract
Homeodomains (HDs) are the second largest class of DNA binding domains (DBDs) among eukaryotic sequence-specific transcription factors (TFs) and are the TF structural class with the largest number of disease-associated mutations in the Human Gene Mutation Database (HGMD). Despite numerous structural studies and large-scale analyses of HD DNA binding specificity, HD-DNA recognition is still not fully understood. Here, we analyze 92 human HD mutants, including disease-associated variants and variants of uncertain significance (VUS), for their effects on DNA binding activity. Many of the variants alter DNA binding affinity and/or specificity. Detailed biochemical analysis and structural modeling identifies 14 previously unknown specificity-determining positions, 5 of which do not contact DNA. The same missense substitution at analogous positions within different HDs often exhibits different effects on DNA binding activity. Variant effect prediction tools perform moderately well in distinguishing variants with altered DNA binding affinity, but poorly in identifying those with altered binding specificity. Our results highlight the need for biochemical assays of TF coding variants and prioritize dozens of variants for further investigations into their pathogenicity and the development of clinical diagnostics and precision therapies.
Affiliation(s)
- Kian Hong Kock
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, USA
| | - Patrick K Kimes
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Stephen S Gisselbrecht
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
| | - Sachi Inukai
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
| | - Sabrina K Phanor
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
| | - James T Anderson
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
| | - Gayatri Ramakrishnan
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Boston Bangalore Biosciences Beginnings Program, Harvard University, Cambridge, MA, USA
| | - Colin H Lipper
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
| | - Dongyuan Song
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Jesse V Kurland
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
| | - Julia M Rogers
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
| | - Raehoon Jeong
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA, USA
| | - Stephen C Blacklow
- Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, USA
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
| | - Rafael A Irizarry
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Martha L Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, USA
- Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA, USA
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
|
23
|
Inge MM, Miller R, Hook H, Bray D, Keenan JL, Zhao R, Gilmore TD, Siggers T. Rapid profiling of transcription factor-cofactor interaction networks reveals principles of epigenetic regulation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.05.588333. [PMID: 38617258 PMCID: PMC11014505 DOI: 10.1101/2024.04.05.588333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Transcription factor (TF)-cofactor (COF) interactions define dynamic, cell-specific networks that govern gene expression; however, these networks are understudied due to a lack of methods for high-throughput profiling of DNA-bound TF-COF complexes. Here we describe the Cofactor Recruitment (CoRec) method for rapid profiling of cell-specific TF-COF complexes. We define a lysine acetyltransferase (KAT)-TF network in resting and stimulated T cells. We find promiscuous recruitment of KATs for many TFs and that 35% of KAT-TF interactions are condition specific. KAT-TF interactions identify NF-κB as a primary regulator of acutely induced H3K27ac. Finally, we find that heterotypic clustering of CBP/P300-recruiting TFs is a strong predictor of total promoter H3K27ac. Our data supports clustering of TF sites that broadly recruit KATs as a mechanism for widespread co-occurring histone acetylation marks. CoRec can be readily applied to different cell systems and provides a powerful approach to define TF-COF networks impacting chromatin state and gene regulation.
Affiliation(s)
- M M Inge
- Department of Biology, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
- These authors contributed equally
| | - R Miller
- Department of Biology, Boston University, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
- These authors contributed equally
| | - H Hook
- Department of Biology, Boston University, Boston, MA, USA
| | - D Bray
- Department of Biology, Boston University, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
| | - J L Keenan
- Department of Biology, Boston University, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
| | - R Zhao
- Department of Biology, Boston University, Boston, MA, USA
| | - T D Gilmore
- Department of Biology, Boston University, Boston, MA, USA
| | - T Siggers
- Department of Biology, Boston University, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
- Biological Design Center, Boston University, Boston, MA, USA
|
24
|
Khetan S, Bulyk ML. Overlapping binding sites underlie TF genomic occupancy. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.05.583629. [PMID: 38496549 PMCID: PMC10942454 DOI: 10.1101/2024.03.05.583629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Sequence-specific DNA binding by transcription factors (TFs) is a crucial step in gene regulation. However, current high-throughput in vitro approaches cannot reliably detect lower affinity TF-DNA interactions, which play key roles in gene regulation. Here, we developed PADIT-seq (protein affinity to DNA by in vitro transcription and RNA sequencing) to assay TF binding preferences to all 10-bp DNA sequences at far greater sensitivity than prior approaches. The expanded catalogs of low affinity DNA binding sites for the human TFs HOXD13 and EGR1 revealed that nucleotides flanking high affinity DNA binding sites create overlapping lower affinity sites that together modulate TF genomic occupancy in vivo. Formation of such extended recognition sequences stems from an inherent property of TF binding sites to interweave each other and expands the genomic sequence space for identifying noncoding variants that directly alter TF binding.
One-Sentence Summary: Overlapping DNA binding sites underlie TF genomic occupancy through their inherent propensity to interweave each other.
|
25
|
Xu J, Gao J, Ni P, Gerstein M. Less-is-more: selecting transcription factor binding regions informative for motif inference. Nucleic Acids Res 2024; 52:e20. [PMID: 38214231 PMCID: PMC10899791 DOI: 10.1093/nar/gkad1240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 12/06/2023] [Accepted: 12/17/2023] [Indexed: 01/13/2024] Open
Abstract
Numerous statistical methods have emerged for inferring DNA motifs for transcription factors (TFs) from genomic regions. However, the process of selecting informative regions for motif inference remains understudied. Current approaches select regions with strong ChIP-seq signal for a given TF, assuming that such strong signal primarily results from specific interactions between the TF and its motif. Additionally, these selection approaches do not account for non-target motifs, i.e. motifs of other TFs; they presume the occurrence of these non-target motifs to be infrequent compared to that of the target motif, and thus assume these have minimal interference with the identification of the target. Leveraging extensive ChIP-seq datasets, we introduced the concept of TF signal 'crowdedness', referred to as C-score, for each genomic region. The C-score helps in highlighting TF signals arising from non-specific interactions. Moreover, by considering the C-score (and adjusting for the length of genomic regions), we can effectively mitigate interference of non-target motifs. Using these tools, we find that in many instances, strong ChIP-seq signal stems mainly from non-specific interactions, and the occurrence of non-target motifs significantly impacts the accurate inference of the target motif. Prioritizing genomic regions with reduced crowdedness and short length markedly improves motif inference. This 'less-is-more' effect suggests that ChIP-seq region selection warrants more attention.
Affiliation(s)
- Jinrui Xu
- Department of Biology, Howard University, Washington, DC 20059, USA
- Center for Applied Data Science and Analytics, Howard University, Washington, DC 20059, USA
| | - Jiahao Gao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Pengyu Ni
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
|
26
|
Li J, Chiu TP, Rohs R. Predicting DNA structure using a deep learning method. Nat Commun 2024; 15:1243. [PMID: 38336958 PMCID: PMC10858265 DOI: 10.1038/s41467-024-45191-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 04/25/2023] [Accepted: 01/17/2024] [Indexed: 02/12/2024] Open
Abstract
Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA structure, also described as DNA shape, plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, DNA structural features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing an understanding of the effects of flanking regions on DNA structure in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
Affiliation(s)
- Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
| | - Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
| | - Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA, 90089, USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, 90089, USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA
|
27
|
Lavezzo GM, Lauretto MDS, Andrioli LPM, Machado-Lima A. Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites. Genet Mol Biol 2024; 46:e20230048. [PMID: 38285430 PMCID: PMC10945726 DOI: 10.1590/1678-4685-gmb-2023-0048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 10/18/2023] [Indexed: 01/30/2024] Open
Abstract
Prediction of transcription factor binding sites (TFBS) is an application of bioinformatics in which DNA molecules are represented as sequences of A, C, G and T symbols. The most widely used model for this problem is the Position Weight Matrix (PWM). Notwithstanding the advantage of being simple, PWMs cannot capture dependency between nucleotide positions, which may affect prediction performance. The Acyclic Probabilistic Finite Automaton (APFA) is an alternative model able to accommodate position dependencies. However, APFA is a more complex model, which means more parameters have to be learned. In this paper, we propose an innovative method to identify when position dependencies influence preference for PWMs or APFAs. This involved using position dependency features extracted from 1106 sets of TFBS to infer a decision tree able to predict which is the best model - PWM or APFA - for a given set of TFBSs. According to our results, as few as three pinpointed features are able to choose the best model, providing a balance of performance (average precision) and model simplicity.
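The abstract's point that a PWM cannot capture dependency between nucleotide positions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_pwm` and `score`, the pseudocount of 1, the uniform 0.25 background, and the four toy sites are all assumptions made here for illustration.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites (toy example)."""
    pwm = []
    for pos in range(len(sites[0])):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log2 ratio of observed frequency to the assumed uniform background.
        pwm.append({b: math.log2((c / total) / background) for b, c in counts.items()})
    return pwm

def score(pwm, seq):
    """Sum per-position log-odds; each position contributes independently."""
    return sum(column[base] for column, base in zip(pwm, seq))

# Hypothetical training sites with perfect dependency between positions:
# only ACGT and TGCA ever occur, never a mixture of the two.
sites = ["ACGT", "ACGT", "TGCA", "TGCA"]
pwm = build_pwm(sites)

observed = score(pwm, "ACGT")   # a site that was actually seen
chimera = score(pwm, "ACCA")    # mixes halves of the two site types
unrelated = score(pwm, "AAAA")  # poor match at the two middle positions
print(observed, chimera, unrelated)
```

Because the score is a sum over independent columns, the chimeric sequence `ACCA` scores exactly as high as the observed site `ACGT`, even though no training site has that combination. A model with position dependencies, such as the APFA described in the abstract, is the kind of model that could in principle tell them apart.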
Affiliation(s)
- Guilherme Miura Lavezzo
- Universidade de São Paulo, Instituto de Matemática e Estatística, Programa Interunidades de Pós-Graduação em Bioinformática, São Paulo, SP, Brazil
| | | | | | - Ariane Machado-Lima
- Universidade de São Paulo, Escola de Artes, Ciências e Humanidades, São Paulo, SP, Brazil
|
28
|
Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon J, Ferenc K, Kumar V, Lemma RB, Lucas J, Chèneby J, Baranasic D, Khan A, Fornes O, Gundersen S, Johansen M, Hovig E, Lenhard B, Sandelin A, Wasserman W, Parcy F, Mathelier A. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2024; 52:D174-D182. [PMID: 37962376 PMCID: PMC10767809 DOI: 10.1093/nar/gkad1059] [Citation(s) in RCA: 285] [Impact Index Per Article: 285.0] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/31/2023] [Indexed: 11/15/2023] Open
Abstract
JASPAR (https://jaspar.elixir.no/) is a widely-used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low information content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
Affiliation(s)
- Ieva Rauluseviciute
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Rafael Riudavets-Puig
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Romain Blanc-Mathieu
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Jaime A Castro-Mondragon
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Katalin Ferenc
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Vipin Kumar
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Roza Berhanu Lemma
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Jérémy Lucas
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Jeanne Chèneby
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
| | - Damir Baranasic
- MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta, 10000 Zagreb, Croatia
| | - Aziz Khan
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Sveinung Gundersen
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
| | - Morten Johansen
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
| | - Eivind Hovig
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
| | - Boris Lenhard
- MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
| | - Albin Sandelin
- Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2200 Copenhagen N, Denmark
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - François Parcy
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
|
29
|
Mitra R, Li J, Sagendorf JM, Jiang Y, Chiu TP, Rohs R. DeepPBS: Geometric deep learning for interpretable prediction of protein-DNA binding specificity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.15.571942. [PMID: 38293168 PMCID: PMC10827229 DOI: 10.1101/2023.12.15.571942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Predicting specificity in protein-DNA interactions is a challenging yet essential task for understanding gene regulation. Here, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity across protein families based on protein-DNA structures. The DeepPBS architecture allows investigation of different family-specific recognition patterns. DeepPBS can be applied to predicted structures, and can aid in the modeling of protein-DNA complexes. DeepPBS is interpretable and can be used to calculate protein heavy atom-level importance scores, demonstrated as a case-study on p53-DNA interface. When aggregated at the protein residue level, these scores conform well with alanine scanning mutagenesis experimental data. The inference time for DeepPBS is sufficiently fast for analyzing simulation trajectories, as demonstrated on a molecular-dynamics simulation of a Drosophila Hox-DNA tertiary complex with its cofactor. DeepPBS and its corresponding data resources offer a foundation for machine-aided protein-DNA interaction studies, guiding experimental choices and complex design, as well as advancing our understanding of molecular interactions.
|
30
|
Martin V, Zhuang F, Zhang Y, Pinheiro K, Gordân R. High-throughput data and modeling reveal insights into the mechanisms of cooperative DNA-binding by transcription factor proteins. Nucleic Acids Res 2023; 51:11600-11612. [PMID: 37889068 PMCID: PMC10681739 DOI: 10.1093/nar/gkad872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 09/21/2023] [Accepted: 10/05/2023] [Indexed: 10/28/2023] Open
Abstract
Cooperative DNA-binding by transcription factor (TF) proteins is critical for eukaryotic gene regulation. In the human genome, many regulatory regions contain TF-binding sites in close proximity to each other, which can facilitate cooperative interactions. However, binding site proximity does not necessarily imply cooperative binding, as TFs can also bind independently to each of their neighboring target sites. Currently, the rules that drive cooperative TF binding are not well understood. In addition, it is oftentimes difficult to infer direct TF-TF cooperativity from existing DNA-binding data. Here, we show that in vitro binding assays using DNA libraries of a few thousand genomic sequences with putative cooperative TF-binding events can be used to develop accurate models of cooperativity and to gain insights into cooperative binding mechanisms. Using factors ETS1 and RUNX1 as our case study, we show that the distance and orientation between ETS1 sites are critical determinants of cooperative ETS1-ETS1 binding, while cooperative ETS1-RUNX1 interactions show more flexibility in distance and orientation and can be accurately predicted based on the affinity and sequence/shape features of the binding sites. The approach described here, combining custom experimental design with machine-learning modeling, can be easily applied to study the cooperative DNA-binding patterns of any TFs.
Affiliation(s)
- Vincentius Martin
- Department of Computer Science, Durham, NC 27708, USA
- Center for Genomic & Computational Biology, Durham, NC 27708, USA
| | - Farica Zhuang
- Department of Computer Science, Durham, NC 27708, USA
- Center for Genomic & Computational Biology, Durham, NC 27708, USA
| | - Yuning Zhang
- Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Program in Computational Biology & Bioinformatics, Durham, NC 27708, USA
| | - Kyle Pinheiro
- Department of Computer Science, Durham, NC 27708, USA
- Center for Genomic & Computational Biology, Durham, NC 27708, USA
| | - Raluca Gordân
- Department of Computer Science, Durham, NC 27708, USA
- Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Department of Biostatistics & Bioinformatics, Department of Molecular Genetics and Microbiology, Department of Cell Biology, Duke University, Durham, NC 27708, USA
|
31
|
Nithun RV, Yao YM, Lin X, Habiballah S, Afek A, Jbara M. Deciphering the Role of the Ser-Phosphorylation Pattern on the DNA-Binding Activity of Max Transcription Factor Using Chemical Protein Synthesis. Angew Chem Int Ed Engl 2023; 62:e202310913. [PMID: 37642402 DOI: 10.1002/anie.202310913] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/29/2023] [Revised: 08/25/2023] [Accepted: 08/29/2023] [Indexed: 08/31/2023]
Abstract
The chemical synthesis of site-specifically modified transcription factors (TFs) is a powerful method to investigate how post-translational modifications (PTMs) influence TF-DNA interactions and impact gene expression. Among these TFs, Max plays a pivotal role in controlling the expression of 15% of the genome. The activity of Max is regulated by PTMs; Ser-phosphorylation at the N-terminus is considered one of the key regulatory mechanisms. In this study, we developed a practical synthetic strategy to prepare homogeneous full-length Max for the first time, to explore the impact of Max phosphorylation. We prepared a focused library of eight Max variants, with distinct modification patterns, including mono-phosphorylated, and doubly phosphorylated analogues at Ser2/Ser11 as well as fluorescently labeled variants through native chemical ligation. Through comprehensive DNA binding analyses, we discovered that the phosphorylation position plays a crucial role in the DNA-binding activity of Max. Furthermore, in vitro high-throughput analysis using DNA microarrays revealed that the N-terminus phosphorylation pattern does not interfere with the DNA sequence specificity of Max. Our work provides insights into the regulatory role of Max's phosphorylation on the DNA interactions and sequence specificity, shedding light on how PTMs influence TF function.
Affiliation(s)
- Raj V Nithun
- School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Yumi Minyi Yao
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Xiaoxi Lin
- School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Shaimaa Habiballah
- School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Ariel Afek
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, 7610001, Israel
| | - Muhammad Jbara
- School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
|
32
|
Jindal GA, Bantle AT, Solvason JJ, Grudzien JL, D'Antonio-Chronowska A, Lim F, Le SH, Song BP, Ragsac MF, Klie A, Larsen RO, Frazer KA, Farley EK. Single-nucleotide variants within heart enhancers increase binding affinity and disrupt heart development. Dev Cell 2023; 58:2206-2216.e5. [PMID: 37848026 PMCID: PMC10720985 DOI: 10.1016/j.devcel.2023.09.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 12/20/2022] [Revised: 06/07/2023] [Accepted: 09/20/2023] [Indexed: 10/19/2023]
Abstract
Transcriptional enhancers direct precise gene expression patterns during development and harbor the majority of variants associated with phenotypic diversity, evolutionary adaptations, and disease. Pinpointing which enhancer variants contribute to changes in gene expression and phenotypes is a major challenge. Here, we find that suboptimal or low-affinity binding sites are necessary for precise gene expression during heart development. Single-nucleotide variants (SNVs) can optimize the affinity of ETS binding sites, causing gain-of-function (GOF) gene expression, cell migration defects, and phenotypes as severe as extra beating hearts in the marine chordate Ciona robusta. In human induced pluripotent stem cell (iPSC)-derived cardiomyocytes, a SNV within a human GATA4 enhancer increases ETS binding affinity and causes GOF enhancer activity. The prevalence of suboptimal-affinity sites within enhancers creates a vulnerability whereby affinity-optimizing SNVs can lead to GOF gene expression, changes in cellular identity, and organismal-level phenotypes that could contribute to the evolution of novel traits or diseases.
Affiliation(s)
- Granton A Jindal
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA
- Alexis T Bantle
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Biological Sciences Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Joe J Solvason
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Jessica L Grudzien
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA
- Fabian Lim
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Biological Sciences Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Sophia H Le
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA
- Benjamin P Song
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Biological Sciences Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Michelle F Ragsac
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Adam Klie
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Reid O Larsen
- Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Kelly A Frazer
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA 92093, USA; Institute for Genomic Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA
- Emma K Farley
- Department of Medicine, Health Sciences, University of California, San Diego, La Jolla, CA 92093, USA; Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA 92093, USA.
|
33
|
Klie A, Laub D, Talwar JV, Stites H, Jores T, Solvason JJ, Farley EK, Carter H. Predictive analyses of regulatory sequences with EUGENe. Nature Computational Science 2023; 3:946-956. [PMID: 38177592 PMCID: PMC10768637 DOI: 10.1038/s43588-023-00544-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 09/27/2023] [Indexed: 01/06/2024]
Abstract
Deep learning has become a popular tool to study cis-regulatory function. Yet efforts to design software for deep-learning analyses in regulatory genomics that are findable, accessible, interoperable and reusable (FAIR) have fallen short of fully meeting these criteria. Here we present elucidating the utility of genomic elements with neural nets (EUGENe), a FAIR toolkit for the analysis of genomic sequences with deep learning. EUGENe consists of a set of modules and subpackages for executing the key functionality of a genomics deep learning workflow: (1) extracting, transforming and loading sequence data from many common file formats; (2) instantiating, initializing and training diverse model architectures; and (3) evaluating and interpreting model behavior. We designed EUGENe as a simple, flexible and extensible interface for streamlining and customizing end-to-end deep-learning sequence analyses, and illustrate these principles through application of the toolkit to three predictive modeling tasks. We hope that EUGENe represents a springboard towards a collaborative ecosystem for deep-learning applications in genomics research.
Affiliation(s)
- Adam Klie
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- David Laub
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- James V Talwar
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Tobias Jores
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Joe J Solvason
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Department of Molecular Biology, University of California San Diego, La Jolla, CA, USA
- Emma K Farley
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Department of Molecular Biology, University of California San Diego, La Jolla, CA, USA
- Hannah Carter
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA.
|
34
|
Stevenson MJ, Phanor SK, Patel U, Gisselbrecht SS, Bulyk ML, O'Brien LL. Altered binding affinity of SIX1-Q177R correlates with enhanced WNT5A and WNT pathway effector expression in Wilms tumor. Dis Model Mech 2023; 16:dmm050208. [PMID: 37815464 PMCID: PMC10668032 DOI: 10.1242/dmm.050208] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/24/2023] [Accepted: 09/27/2023] [Indexed: 10/11/2023] Open
Abstract
Wilms tumors present as an amalgam of varying proportions of tissues located within the developing kidney, one being the nephrogenic blastema comprising multipotent nephron progenitor cells (NPCs). The recurring missense mutation Q177R in NPC transcription factors SIX1 and SIX2 is most correlated with tumors of blastemal histology and is significantly associated with relapse. Yet, the transcriptional regulatory consequences of SIX1/2-Q177R that might promote tumor progression and recurrence have not been investigated extensively. Utilizing multiple Wilms tumor transcriptomic datasets, we identified upregulation of the gene encoding non-canonical WNT ligand WNT5A in addition to other WNT pathway effectors in SIX1/2-Q177R mutant tumors. SIX1 ChIP-seq datasets from Wilms tumors revealed shared binding sites for SIX1/SIX1-Q177R within a promoter of WNT5A and at putative distal cis-regulatory elements (CREs). We demonstrate colocalization of SIX1 and WNT5A in Wilms tumor tissue and utilize in vitro assays that support SIX1 and SIX1-Q177R activation of expression from the WNT5A CREs, as well as enhanced binding affinity within the WNT5A promoter that may promote the differential expression of WNT5A and other WNT pathway effectors associated with SIX1-Q177R tumors.
Affiliation(s)
- Matthew J. Stevenson
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Sabrina K. Phanor
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Urvi Patel
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Stephen S. Gisselbrecht
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Lori L. O'Brien
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
35
|
Li J, Chiu TP, Rohs R. Deep DNAshape: Predicting DNA shape considering extended flanking regions using a deep learning method. bioRxiv 2023:2023.10.22.563383. [PMID: 37961633 PMCID: PMC10634709 DOI: 10.1101/2023.10.22.563383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2023]
Abstract
Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA shape plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, refined DNA shape features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing a deeper understanding of the effects of flanking regions on DNA shape in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
|
36
|
Horton CA, Alexandari AM, Hayes MGB, Marklund E, Schaepe JM, Aditham AK, Shah N, Suzuki PH, Shrikumar A, Afek A, Greenleaf WJ, Gordân R, Zeitlinger J, Kundaje A, Fordyce PM. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 2023; 381:eadd1250. [PMID: 37733848 DOI: 10.1126/science.add1250] [Citation(s) in RCA: 86] [Impact Index Per Article: 43.0] [Received: 05/24/2022] [Accepted: 07/26/2023] [Indexed: 09/23/2023]
Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
Affiliation(s)
- Connor A Horton
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Michael G B Hayes
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Julia M Schaepe
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Arjun K Aditham
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Nilay Shah
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Peter H Suzuki
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ariel Afek
- Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Raluca Gordân
- Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- The University of Kansas Medical Center, Kansas City, KS 66103, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Polly M Fordyce
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94110, USA
|
37
|
Glasscock CJ, Pecoraro R, McHugh R, Doyle LA, Chen W, Boivin O, Lonnquist B, Na E, Politanska Y, Haddox HK, Cox D, Norn C, Coventry B, Goreshnik I, Vafeados D, Lee GR, Gordan R, Stoddard BL, DiMaio F, Baker D. Computational design of sequence-specific DNA-binding proteins. bioRxiv 2023:2023.09.20.558720. [PMID: 37790440 PMCID: PMC10542524 DOI: 10.1101/2023.09.20.558720] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 10/05/2023]
Abstract
Sequence-specific DNA-binding proteins (DBPs) play critical roles in biology and biotechnology, and there has been considerable interest in the engineering of DBPs with new or altered specificities for genome editing and other applications. While there has been some success in reprogramming naturally occurring DBPs using selection methods, the computational design of new DBPs that recognize arbitrary target sites remains an outstanding challenge. We describe a computational method for the design of small DBPs that recognize specific target sequences through interactions with bases in the major groove, and employ this method in conjunction with experimental screening to generate binders for 5 distinct DNA targets. These binders exhibit specificity closely matching the computational models for the target DNA sequences at as many as 6 base positions and affinities as low as 30-100 nM. The crystal structure of a designed DBP-target site complex is in close agreement with the design model, highlighting the accuracy of the design method. The designed DBPs function in both Escherichia coli and mammalian cells to repress and activate transcription of neighboring genes. Our method is a substantial step towards a general route to small and hence readily deliverable sequence-specific DBPs for gene regulation and editing.
Affiliation(s)
- Cameron J. Glasscock
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Robert Pecoraro
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Department of Physics, University of Washington, Seattle, WA, USA
- Ryan McHugh
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Lindsey A. Doyle
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Wei Chen
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Olivier Boivin
- Program in Genetics and Genomic, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Beau Lonnquist
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
- Emily Na
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Yuliya Politanska
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Hugh K. Haddox
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- David Cox
- Department of Biochemistry, Stanford University School of Medicine, Palo Alto, CA USA
- Department of Medicine, Division of Hematology, Stanford University, Stanford, CA, USA
- Christoffer Norn
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- BioInnovation Institute, DK2200 Copenhagen N, Denmark
- Brian Coventry
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Inna Goreshnik
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Dionne Vafeados
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Gyu Rie Lee
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA USA
- Raluca Gordan
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biostatistics and Bioinformatics, Department of Computer Science, Department of Molecular Genetics and Microbiology, Duke University, Durham, NC, USA
- Barry L. Stoddard
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- BioInnovation Institute, DK2200 Copenhagen N, Denmark
|
38
|
Tognon M, Giugno R, Pinello L. A survey on algorithms to characterize transcription factor binding sites. Brief Bioinform 2023; 24:bbad156. [PMID: 37099664 PMCID: PMC10422928 DOI: 10.1093/bib/bbad156] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/13/2023] [Revised: 03/27/2023] [Accepted: 04/01/2023] [Indexed: 04/28/2023] Open
Abstract
Transcription factors (TFs) are key regulatory proteins that control the transcriptional rate of cells by binding short DNA sequences called transcription factor binding sites (TFBS) or motifs. Identifying and characterizing TFBS is fundamental to understanding the regulatory mechanisms governing the transcriptional state of cells. During the last decades, several experimental methods have been developed to recover DNA sequences containing TFBS. In parallel, computational methods have been proposed to discover and identify TFBS motifs based on these DNA sequences. This is one of the most widely investigated problems in bioinformatics and is referred to as the motif discovery problem. In this manuscript, we review classical and novel experimental and computational methods developed to discover and characterize TFBS motifs in DNA sequences, highlighting their advantages and drawbacks. We also discuss open challenges and future perspectives that could fill the remaining gaps in the field.
Affiliation(s)
- Manuel Tognon
- Computer Science Department, University of Verona, Verona, Italy
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Rosalba Giugno
- Computer Science Department, University of Verona, Verona, Italy
- Luca Pinello
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Department of Pathology, Harvard Medical School, Boston, Massachusetts, United States of America
|
39
|
Alexandari AM, Horton CA, Shrikumar A, Shah N, Li E, Weilert M, Pufall MA, Zeitlinger J, Fordyce PM, Kundaje A. De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding. bioRxiv 2023:2023.05.11.540401. [PMID: 37214836 PMCID: PMC10197627 DOI: 10.1101/2023.05.11.540401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Transcription factors (TFs) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, in vivo genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understand gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific, in vivo binding profiles. Conversely, deep learning models, trained on in vivo TF binding assays, effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities de novo from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts energetic impacts of sequence variation within and surrounding motifs on TF binding as measured by diverse in vitro assays with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of in vivo binding, suggest that deep learning models of in vivo binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD.
This new biophysical interpretation of deep learning models enables high-throughput in silico experiments to explore the influence of sequence context and variation on both intrinsic affinity and in vivo occupancy.
Affiliation(s)
- Amr M. Alexandari
- Department of Computer Science, Stanford University, Stanford, CA 94305
- Avanti Shrikumar
- Department of Earth System Science, Stanford University, Stanford, CA 94305
- Nilay Shah
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Eileen Li
- Department of Genetics, Stanford University, Stanford, CA 94305
- Melanie Weilert
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Miles A. Pufall
- Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City, Iowa 52242, USA
- Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO, USA
- The University of Kansas Medical Center, Kansas City, KS, USA
- Polly M. Fordyce
- Department of Genetics, Stanford University, Stanford, CA 94305
- Department of Bioengineering, Stanford University, Stanford, CA 94305
- ChEM-H Institute, Stanford University, Stanford, CA 94305
- Chan Zuckerberg Biohub, San Francisco, CA 94110
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA 94305
- Department of Genetics, Stanford University, Stanford, CA 94305
|
40
|
Cheng H, Liu L, Zhou Y, Deng K, Ge Y, Hu X. TSPTFBS 2.0: trans-species prediction of transcription factor binding sites and identification of their core motifs in plants. Frontiers in Plant Science 2023; 14:1175837. [PMID: 37229121 PMCID: PMC10203575 DOI: 10.3389/fpls.2023.1175837] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2023] [Accepted: 04/13/2023] [Indexed: 05/27/2023]
Abstract
Introduction: An emerging approach, promoter tiling deletion via genome editing, is becoming popular in plants. Identifying the precise positions of core motifs within plant gene promoters is in great demand, but these positions are still largely unknown. We previously developed TSPTFBS, a set of 265 Arabidopsis transcription factor binding site (TFBS) prediction models, which cannot meet this demand for identifying core motifs. Methods: Here, we additionally introduced 104 maize and 20 rice TFBS datasets and used DenseNet for model construction on a large-scale dataset covering a total of 389 plant TFs. More importantly, we combined three biological interpretability methods, DeepLIFT, in-silico tiling deletion, and in-silico mutagenesis, to identify the potential core motifs of any given genomic region. Results: DenseNet not only achieved greater predictability than baseline methods such as LS-GKM and MEME for the above 389 TFs from Arabidopsis, maize and rice, but also performed better on trans-species prediction for a total of 15 TFs from six other plant species. A motif analysis based on TF-MoDISco and global importance analysis (GIA) further reveals the biological implications of the core motifs identified by the three interpretability methods. Finally, we developed the TSPTFBS 2.0 pipeline, which integrates the 389 DenseNet-based models of TF binding and the above three interpretability methods. Discussion: TSPTFBS 2.0 was implemented as a user-friendly web server (http://www.hzau-hulab.com/TSPTFBS/), which can provide important references for editing targets in any given plant promoter and has great potential to provide reliable editing targets for genetic screening experiments in plants.
|
41
|
Li M, Yao T, Lin W, Hinckley WE, Galli M, Muchero W, Gallavotti A, Chen JG, Huang SSC. Double DAP-seq uncovered synergistic DNA binding of interacting bZIP transcription factors. Nat Commun 2023; 14:2600. [PMID: 37147307 PMCID: PMC10163045 DOI: 10.1038/s41467-023-38096-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 04/05/2022] [Accepted: 04/15/2023] [Indexed: 05/07/2023] Open
Abstract
Many eukaryotic transcription factors (TFs) form homodimer or heterodimer complexes to regulate gene expression. Dimerization of BASIC LEUCINE ZIPPER (bZIP) TFs is critical for their functions, but the molecular mechanism underlying the DNA binding and functional specificity of homo- versus heterodimers remains elusive. To address this gap, we present the double DNA Affinity Purification-sequencing (dDAP-seq) technique that maps heterodimer binding sites on endogenous genomic DNA. Using dDAP-seq we profile twenty pairs of C/S1 bZIP heterodimers and S1 homodimers in Arabidopsis and show that heterodimerization significantly expands the DNA binding preferences of these TFs. Analysis of dDAP-seq binding sites reveals the function of bZIP9 in abscisic acid response and the role of bZIP53 heterodimer-specific binding in seed maturation. The C/S1 heterodimers show distinct preferences for the ACGT elements recognized by plant bZIPs and motifs resembling the yeast GCN4 cis-elements. This study demonstrates the potential of dDAP-seq in deciphering the DNA binding specificities of interacting TFs that are key for combinatorial gene regulation.
Affiliation(s)
- Miaomiao Li
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Tao Yao
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Wanru Lin
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Will E Hinckley
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Mary Galli
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, 08854-8020, USA
- Wellington Muchero
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Andrea Gallavotti
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, 08854-8020, USA
- Jin-Gui Chen
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Shao-Shan Carol Huang
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA.
|
42
|
Mielko Z, Zhang Y, Sahay H, Liu Y, Schaich MA, Schnable B, Morrison AM, Burdinski D, Adar S, Pufall M, Van Houten B, Gordân R, Afek A. UV irradiation remodels the specificity landscape of transcription factors. Proc Natl Acad Sci U S A 2023; 120:e2217422120. [PMID: 36888663 PMCID: PMC10089200 DOI: 10.1073/pnas.2217422120] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/13/2022] [Accepted: 02/09/2023] [Indexed: 03/09/2023] Open
Abstract
Somatic mutations are highly enriched at transcription factor (TF) binding sites, with the strongest trend being observed for ultraviolet light (UV)-induced mutations in melanomas. One of the main mechanisms proposed for this hypermutation pattern is the inefficient repair of UV lesions within TF-binding sites, caused by competition between TFs bound to these lesions and the DNA repair proteins that must recognize the lesions to initiate repair. However, TF binding to UV-irradiated DNA is poorly characterized, and it is unclear whether TFs maintain specificity for their DNA sites after UV exposure. We developed UV-Bind, a high-throughput approach to investigate the impact of UV irradiation on protein-DNA binding specificity. We applied UV-Bind to ten TFs from eight structural families, and found that UV lesions significantly altered the DNA-binding preferences of all the TFs tested. The main effect was a decrease in binding specificity, but the precise effects and their magnitude differ across factors. Importantly, we found that despite the overall reduction in DNA-binding specificity in the presence of UV lesions, TFs can still compete with repair proteins for lesion recognition, in a manner consistent with their specificity for UV-irradiated DNA. In addition, for a subset of TFs, we identified a surprising but reproducible effect at certain nonconsensus DNA sequences, where UV irradiation leads to a high increase in the level of TF binding. These changes in DNA-binding specificity after UV irradiation, at both consensus and nonconsensus sites, have important implications for the regulatory and mutagenic roles of TFs in the cell.
Affiliation(s)
- Zachery Mielko
  - Program in Genetics and Genomics, Duke University School of Medicine, Durham, NC 27708
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27708
  - Department of Computer Science, Duke University, Durham, NC 27708
- Yuning Zhang
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27708
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27708
- Harshit Sahay
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27708
  - Program in Computational Biology and Bioinformatics, Duke University School of Medicine, Durham NC 27708
- Yiling Liu
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27708
  - Program in Computational Biology and Bioinformatics, Duke University School of Medicine, Durham NC 27708
- Matthew A Schaich
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213
  - UPMC-Hillman Cancer Center, Pittsburgh, PA 15213
- Brittani Schnable
  - UPMC-Hillman Cancer Center, Pittsburgh, PA 15213
  - Molecular Genetics and Developmental Biology Graduate Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213
- Abigail M Morrison
  - Department of Biochemistry and Molecular Biology, Carver College of Medicine, University of Iowa, Iowa City, IA 52242
- Debbie Burdinski
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Sheera Adar
  - Department of Microbiology and Molecular Genetics, The Institute for Medical Research Israel-Canada, The Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Miles Pufall
  - Department of Biochemistry and Molecular Biology, Carver College of Medicine, University of Iowa, Iowa City, IA 52242
  - Holden Comprehensive Cancer Center, University of Iowa, Iowa City, IA 52242
- Bennett Van Houten
  - Program in Computational Biology and Bioinformatics, Duke University School of Medicine, Durham NC 27708
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213
  - UPMC-Hillman Cancer Center, Pittsburgh, PA 15213
  - Molecular Biophysics and Structural Biology Program, University of Pittsburgh, Pittsburgh, PA 15213
- Raluca Gordân
  - Department of Computer Science, Duke University, Durham, NC 27708
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27708
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27708
- Ariel Afek
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
|
43
|
Carrasco Pro S, Hook H, Bray D, Berenzy D, Moyer D, Yin M, Labadorf AT, Tewhey R, Siggers T, Fuxman Bass JI. Widespread perturbation of ETS factor binding sites in cancer. Nat Commun 2023; 14:913. [PMID: 36808133 PMCID: PMC9938127 DOI: 10.1038/s41467-023-36535-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/22/2022] [Accepted: 02/03/2023] [Indexed: 02/19/2023] Open
Abstract
Although >90% of somatic mutations reside in non-coding regions, few have been reported as cancer drivers. To predict driver non-coding variants (NCVs), we present a transcription factor (TF)-aware burden test based on a model of coherent TF function in promoters. We apply this test to NCVs from the Pan-Cancer Analysis of Whole Genomes cohort and predict 2555 driver NCVs in the promoters of 813 genes across 20 cancer types. These genes are enriched in cancer-related gene ontologies, essential genes, and genes associated with cancer prognosis. We find that 765 candidate driver NCVs alter transcriptional activity, 510 lead to differential binding of TF-cofactor regulatory complexes, and that they primarily impact the binding of ETS factors. Finally, we show that different NCVs within a promoter often affect transcriptional activity through shared mechanisms. Our integrated computational and experimental approach shows that cancer NCVs are widespread and that ETS factors are commonly disrupted.
Affiliation(s)
- Heather Hook
  - Department of Biology, Boston University, Boston, MA, USA
- David Bray
  - Bioinformatics Program, Boston University, Boston, MA, USA
- Devlin Moyer
  - Bioinformatics Program, Boston University, Boston, MA, USA
- Meimei Yin
  - Department of Biology, Boston University, Boston, MA, USA
- Adam Thomas Labadorf
  - Bioinformatics Hub, Boston University, Boston, MA, USA
  - Boston University School of Medicine, Department of Neurology, Boston, MA, USA
- Trevor Siggers
  - Bioinformatics Program, Boston University, Boston, MA, USA.
  - Department of Biology, Boston University, Boston, MA, USA.
  - Biological Design Center, Boston University, Boston, MA, USA.
- Juan Ignacio Fuxman Bass
  - Bioinformatics Program, Boston University, Boston, MA, USA.
  - Department of Biology, Boston University, Boston, MA, USA.
|
44
|
Using Restriction Endonuclease, Protection, Selection, and Amplification to Identify Preferred DNA-Binding Sequences of Microbial Transcription Factors. Microbiol Spectr 2023; 11:e0439722. [PMID: 36602370 PMCID: PMC9927371 DOI: 10.1128/spectrum.04397-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023] Open
Abstract
Regulation of gene expression is a vital component of cellular biology. Transcription factor proteins often bind regulatory DNA sequences upstream of transcription start sites to facilitate the activation or repression of RNA polymerase. Research laboratories have devoted many projects to understanding the transcription regulatory networks for transcription factors, as these regulated genes provide critical insight into the biology of the host organism. Various in vivo and in vitro assays have been developed to elucidate transcription regulatory networks. Several assays, including SELEX-seq and ChIP-seq, capture DNA-bound transcription factors to determine the preferred DNA-binding sequences, which can then be mapped to the host organism's genome to identify candidate regulatory genes. In this protocol, we describe an alternative in vitro, iterative selection approach to ascertaining DNA-binding sequences of a transcription factor of interest using restriction endonuclease, protection, selection, and amplification (REPSA). Contrary to traditional antibody-based capture methods, REPSA selects for transcription factor-bound DNA sequences by challenging binding reactions with a type IIS restriction endonuclease. Cleavage-resistant DNA species are amplified by PCR and then used as inputs for the next round of REPSA. This process is repeated until a protected DNA species is observed by gel electrophoresis, which is an indication of a successful REPSA experiment. Subsequent high-throughput sequencing of REPSA-selected DNAs accompanied by motif discovery and scanning analyses can be used for determining transcription factor consensus binding sequences and potential regulated genes, providing critical first steps in determining organisms' transcription regulatory networks.
IMPORTANCE Transcription regulatory proteins are an essential class of proteins that help maintain cellular homeostasis by adapting the transcriptome based on environmental cues. Dysregulation of transcription factors can lead to diseases such as cancer, and many eukaryotic and prokaryotic transcription factors have become enticing therapeutic targets. Additionally, in many understudied organisms, the transcription regulatory networks for uncharacterized transcription factors remain unknown. As such, the need for experimental techniques to establish transcription regulatory networks is paramount. Here, we describe a step-by-step protocol for REPSA, an inexpensive, iterative selection technique to identify transcription factor-binding sequences without the need for antibody-based capture methods.
|
45
|
Becht DC, Klein BJ, Kanai A, Jang SM, Cox KL, Zhou BR, Phanor SK, Zhang Y, Chen RW, Ebmeier CC, Lachance C, Galloy M, Fradet-Turcotte A, Bulyk ML, Bai Y, Poirier MG, Côté J, Yokoyama A, Kutateladze TG. MORF and MOZ acetyltransferases target unmethylated CpG islands through the winged helix domain. Nat Commun 2023; 14:697. [PMID: 36754959 PMCID: PMC9908889 DOI: 10.1038/s41467-023-36368-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/30/2022] [Accepted: 01/26/2023] [Indexed: 02/10/2023] Open
Abstract
Human acetyltransferases MOZ and MORF are implicated in chromosomal translocations associated with aggressive leukemias. Oncogenic translocations involve the far amino terminus of MOZ/MORF, the function of which remains unclear. Here, we identified and characterized two structured winged helix (WH) domains, WH1 and WH2, in MORF and MOZ. WHs bind DNA in a cooperative manner, with WH1 specifically recognizing unmethylated CpG sequences. Structural and genomic analyses show that the DNA binding function of WHs targets MORF/MOZ to gene promoters, stimulating transcription and H3K23 acetylation, and WH1 recruits oncogenic fusions to HOXA genes that trigger leukemogenesis. Cryo-EM, NMR, mass spectrometry and mutagenesis studies provide mechanistic insight into the DNA-binding mechanism, which includes the association of WH1 with the CpG-containing linker DNA and binding of WH2 to the dyad of the nucleosome. The discovery of WHs in MORF and MOZ and their DNA binding functions could open an avenue in developing therapeutics to treat diseases associated with aberrant MOZ/MORF acetyltransferase activities.
Affiliation(s)
- Dustin C Becht
  - Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Brianna J Klein
  - Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Akinori Kanai
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Chiba, 277-0882, Japan
- Suk Min Jang
  - Laval University Cancer Research Center, CHU de Québec-UL Research Center-Oncology Division, Quebec City, QC, G1R 3S3, Canada
- Khan L Cox
  - Department of Physics, Ohio State University, Columbus, OH, 43210, USA
- Bing-Rui Zhou
  - Laboratory of Biochemistry and Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Sabrina K Phanor
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Yi Zhang
  - Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Ruo-Wen Chen
  - Department of Physics, Ohio State University, Columbus, OH, 43210, USA
- Catherine Lachance
  - Laval University Cancer Research Center, CHU de Québec-UL Research Center-Oncology Division, Quebec City, QC, G1R 3S3, Canada
- Maxime Galloy
  - Laval University Cancer Research Center, CHU de Québec-UL Research Center-Oncology Division, Quebec City, QC, G1R 3S3, Canada
- Amelie Fradet-Turcotte
  - Laval University Cancer Research Center, CHU de Québec-UL Research Center-Oncology Division, Quebec City, QC, G1R 3S3, Canada
- Martha L Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Yawen Bai
  - Laboratory of Biochemistry and Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Michael G Poirier
  - Department of Physics, Ohio State University, Columbus, OH, 43210, USA
- Jacques Côté
  - Laval University Cancer Research Center, CHU de Québec-UL Research Center-Oncology Division, Quebec City, QC, G1R 3S3, Canada.
- Akihiko Yokoyama
  - Tsuruoka Metabolomics Laboratory, National Cancer Center, Tsuruoka, Yamagata, 997-0052, Japan.
- Tatiana G Kutateladze
  - Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
|
46
|
Paul I, Bolzan D, Youssef A, Gagnon KA, Hook H, Karemore G, Oliphant MUJ, Lin W, Liu Q, Phanse S, White C, Padhorny D, Kotelnikov S, Chen CS, Hu P, Denis GV, Kozakov D, Raught B, Siggers T, Wuchty S, Muthuswamy SK, Emili A. Parallelized multidimensional analytic framework applied to mammary epithelial cells uncovers regulatory principles in EMT. Nat Commun 2023; 14:688. [PMID: 36755019 PMCID: PMC9908882 DOI: 10.1038/s41467-023-36122-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/11/2022] [Accepted: 01/17/2023] [Indexed: 02/10/2023] Open
Abstract
A proper understanding of disease etiology will require longitudinal systems-scale reconstruction of the multitiered architecture of eukaryotic signaling. Here we combine state-of-the-art data acquisition platforms and bioinformatics tools to devise PAMAF, a workflow that simultaneously examines twelve omics modalities, i.e., protein abundance from whole cells, nucleus, exosomes, secretome and membrane; N-glycosylation, phosphorylation; metabolites; mRNA, miRNA; and, in parallel, single-cell transcriptomes. We apply PAMAF in an established in vitro model of TGFβ-induced epithelial to mesenchymal transition (EMT) to quantify >61,000 molecules from 12 omics and 10 timepoints over 12 days. Bioinformatics analysis of this EMT-ExMap resource allowed us to identify: topological coupling between omics; four distinct cell states during EMT; omics-specific kinetic paths; stage-specific multi-omics characteristics; distinct regulatory classes of genes; ligand-receptor mediated intercellular crosstalk, by integrating scRNAseq and subcellular proteomics; and combinatorial drug targets (e.g., Hedgehog signaling and CAMK-II) to inhibit EMT, which we validate using a 3D mammary duct-on-a-chip platform. Overall, this study provides a resource on TGFβ signaling and EMT.
Affiliation(s)
- Indranil Paul
  - Department of Biochemistry, Boston University School of Medicine, Boston University, 71 East Concord Street, Boston, MA, 02118, USA
- Dante Bolzan
  - Department of Computer Science, University of Miami, 1356 Memorial Drive, Coral Gables, FL, 33146, USA
- Ahmed Youssef
  - Graduate Program in Bioinformatics, Boston University, 24 Cummington Mall, Boston, MA, 02215, USA
- Keith A Gagnon
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, 02215, USA
- Heather Hook
  - Department of Biology, Boston University, 24 Cummington Mall, Boston, MA, 02115, USA
  - Biological Design Center, Boston University, 610 Commonwealth Avenue, Boston, MA, 02215, USA
- Gopal Karemore
  - Advanced Analytics, Novo Nordisk A/S, 2760, Måløv, Denmark
- Michael U J Oliphant
  - Cancer Research Institute, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, 02115, USA
- Weiwei Lin
  - Department of Biochemistry, Boston University School of Medicine, Boston University, 71 East Concord Street, Boston, MA, 02118, USA
- Qian Liu
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba, R3E 0J9, Canada
- Sadhna Phanse
  - Department of Biochemistry, Boston University School of Medicine, Boston University, 71 East Concord Street, Boston, MA, 02118, USA
- Carl White
  - Department of Biochemistry, Boston University School of Medicine, Boston University, 71 East Concord Street, Boston, MA, 02118, USA
- Dzmitry Padhorny
  - Department of Applied Mathematics and Statistics, Stony Brook University, 11794, Stony Brook, NY, USA
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, 11794, USA
- Sergei Kotelnikov
  - Department of Applied Mathematics and Statistics, Stony Brook University, 11794, Stony Brook, NY, USA
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, 11794, USA
- Christopher S Chen
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, 02215, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, 3 Blackfan Circle, Boston, MA, 02115, USA
- Pingzhao Hu
  - Department of Biochemistry, Western University, London, ON, N6A 5C1, Canada
- Gerald V Denis
  - Boston Medical Center Cancer Center, Boston University, Boston University, 72 East Concord Street, Boston, MA, 02118, USA
- Dima Kozakov
  - Department of Applied Mathematics and Statistics, Stony Brook University, 11794, Stony Brook, NY, USA
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, 11794, USA
- Brian Raught
  - Discovery Tower (TMDT), 101 College St, Rm. 9-701A, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Trevor Siggers
  - Department of Biology, Boston University, 24 Cummington Mall, Boston, MA, 02115, USA
  - Biological Design Center, Boston University, 610 Commonwealth Avenue, Boston, MA, 02215, USA
- Stefan Wuchty
  - Department of Computer Science, University of Miami, 1356 Memorial Drive, Coral Gables, FL, 33146, USA
- Senthil K Muthuswamy
  - Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Andrew Emili
  - Department of Biochemistry, Boston University School of Medicine, Boston University, 71 East Concord Street, Boston, MA, 02118, USA.
  - Department of Biology, Charles River Campus, Boston University, Life Science & Engineering (LSEB-602), 24 Cummington Mall, Boston, MA, 02215, USA.
  - Division of Oncological Sciences, Knight Cancer Institute, Oregon Health and Science University, Portland, USA.
|
47
|
Schaudy E, Lietard J, Somoza MM. Enzymatic Synthesis of High-Density RNA Microarrays. Curr Protoc 2023; 3:e667. [PMID: 36794904 PMCID: PMC10946701 DOI: 10.1002/cpz1.667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Abstract
Oligonucleotide microarrays are used to investigate the interactome of nucleic acids. DNA microarrays are commercially available, whereas equivalent RNA microarrays are not. This protocol describes a method to convert DNA microarrays of any density and complexity into RNA microarrays using only readily available materials and reagents. This simple conversion protocol will facilitate the accessibility of RNA microarrays to a wide range of researchers. In addition to general considerations for the design of a template DNA microarray, this procedure describes the experimental steps of hybridization of an RNA primer to the immobilized DNA, followed by its covalent attachment via psoralen-mediated photocrosslinking. The subsequent enzymatic processing steps comprise the extension of the primer with T7 RNA polymerase to generate complementary RNA, and finally the removal of the DNA template with TURBO DNase. Beyond the conversion process, we also describe approaches to detect the RNA product either by internal labeling with fluorescently labeled NTPs or via hybridization to the product strand, a step that can then be complemented by an RNase H assay to confirm the nature of the product. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol: Conversion of a DNA microarray to an RNA microarray
Alternate Protocol: Detection of RNA via incorporation of Cy3-UTP
Support Protocol 1: Detection of RNA via hybridization
Support Protocol 2: RNase H assay
Affiliation(s)
- Erika Schaudy
  - Faculty of Chemistry, Institute of Inorganic Chemistry, University of Vienna, Josef-Holaubek-Platz 2 (UZA 2), Vienna, Austria
- Jory Lietard
  - Faculty of Chemistry, Institute of Inorganic Chemistry, University of Vienna, Josef-Holaubek-Platz 2 (UZA 2), Vienna, Austria
- Mark M. Somoza
  - Faculty of Chemistry, Institute of Inorganic Chemistry, University of Vienna, Josef-Holaubek-Platz 2 (UZA 2), Vienna, Austria
  - Chair of Food Chemistry and Molecular Sensory Science, Technical University of Munich, Lise-Meitner-Straße 34, Freising, Germany
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Straße 30, Freising, Germany
|
48
|
Abstract
The specificity in gene regulation is controlled by interactions between transcription factors (TFs) and genomic DNA regions such as promoters and enhancers. Enhanced yeast one-hybrid (eY1H) assays are among the methods used for high-throughput detection of transcription factor-DNA interactions. Here, we describe the procedure for screening interactions between DNA regions of interest ("DNA-baits") and an array of transcription factors ("TF-preys"), after DNA-bait and TF-prey yeast strains have been generated. Using a high-density array robotic platform, this method can be used to screen interactions between multiple DNA regions and >1000 TFs within a single experiment.
Affiliation(s)
- Anna Berenson
  - Department of Biology, Boston University, Boston, MA, USA
- Juan Ignacio Fuxman Bass
  - Department of Biology, Boston University, Boston, MA, USA.
  - Bioinformatics Program, Boston University, Boston, MA, USA.
|
49
|
Towards a better understanding of TF-DNA binding prediction from genomic features. Comput Biol Med 2022; 149:105993. [DOI: 10.1016/j.compbiomed.2022.105993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 07/12/2022] [Accepted: 08/14/2022] [Indexed: 11/17/2022]
|
50
|
Wetzel JL, Zhang K, Singh M. Learning probabilistic protein-DNA recognition codes from DNA-binding specificities using structural mappings. Genome Res 2022; 32:1776-1786. [PMID: 36123148 PMCID: PMC9528988 DOI: 10.1101/gr.276606.122] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/17/2022] [Accepted: 07/30/2022] [Indexed: 11/25/2022]
Abstract
Knowledge of how proteins interact with DNA is essential for understanding gene regulation. Although DNA-binding specificities for thousands of transcription factors (TFs) have been determined, the specific amino acid-base interactions comprising their structural interfaces are largely unknown. This lack of resolution hampers attempts to leverage these data in order to predict specificities for uncharacterized TFs or TFs mutated in disease. Here we introduce recognition code learning via automated mapping of protein-DNA structural interfaces (rCLAMPS), a probabilistic approach that uses DNA-binding specificities for TFs from the same structural family to simultaneously infer both which nucleotide positions are contacted by particular amino acids within the TF as well as a recognition code that relates each base-contacting amino acid to nucleotide preferences at the DNA positions it contacts. We apply rCLAMPS to homeodomains, the second largest family of TFs in metazoans and show that it learns a highly effective recognition code that can predict de novo DNA-binding specificities for TFs. Furthermore, we show that the inferred amino acid-nucleotide contacts reveal whether and how nucleotide preferences at individual binding site positions are altered by mutations within TFs. Our approach is an important step toward automatically uncovering the determinants of protein-DNA specificity from large compendia of DNA-binding specificities and inferring the altered functionalities of TFs mutated in disease.
Affiliation(s)
- Joshua L Wetzel
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Kaiqian Zhang
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Mona Singh
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
|