Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Gosai SJ, Castro RI, Fuentes N, Butts JC, Kales S, Noche RR, Mouri K, Sabeti PC, Reilly SK, Tewhey R. Machine-guided design of synthetic cell type-specific cis-regulatory elements. bioRxiv 2023:2023.08.08.552077. [PMID: 37609287 PMCID: PMC10441439 DOI: 10.1101/2023.08.08.552077] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 08/24/2023]

Cited by Other Article(s)

Andreani V, South EJ, Dunlop MJ. Generating information-dense promoter sequences with optimal string packing. bioRxiv 2024:2023.11.01.565124. [PMID: 37961203 PMCID: PMC10635063 DOI: 10.1101/2023.11.01.565124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]

Abstract

Dense arrangements of binding sites within nucleotide sequences can collectively influence downstream transcription rates or initiate biomolecular interactions. For example, natural promoter regions can harbor many overlapping transcription factor binding sites that influence the rate of transcription initiation. Despite the prevalence of overlapping binding sites in nature, rapid design of nucleotide sequences with many overlapping sites remains a challenge. Here, we show that this is an NP-hard problem, coined here as the nucleotide String Packing Problem (SPP). We then introduce a computational technique that efficiently assembles sets of DNA-protein binding sites into dense, contiguous stretches of double-stranded DNA. For the efficient design of nucleotide sequences spanning hundreds of base pairs, we reduce the SPP to an Orienteering Problem with integer distances, and then leverage modern integer linear programming solvers. Our method optimally packs libraries of 20-100 binding sites into dense nucleotide arrays of 50-300 base pairs in 0.05-10 seconds. Unlike approximation algorithms or meta-heuristics, our approach finds provably optimal solutions. We demonstrate how our method can generate large sets of diverse sequences suitable for library generation, where the frequency of binding site usage across the returned sequences can be controlled by modulating the objective function. As an example, we then show how adding additional constraints, like the inclusion of sequence elements with fixed positions, allows for the design of bacterial promoters. The nucleotide string packing approach we present can accelerate the design of sequences with complex DNA-protein interactions. When used in combination with synthesis and high-throughput screening, this design strategy could help interrogate how complex binding site arrangements impact either gene expression or biomolecular mechanisms in varied cellular contexts.

Author Summary

The way protein binding sites are arranged on DNA can control the regulation and transcription of downstream genes. Areas with a high concentration of binding sites can enable complex interplay between transcription factors, a feature that is exploited by natural promoters. However, designing synthetic promoters that contain dense arrangements of binding sites is a challenge. The task involves overlapping many binding sites, each typically about 10 nucleotides long, within a constrained sequence area, which becomes increasingly difficult as sequence length decreases, and binding site variety increases. We introduce an approach to design nucleotide sequences with optimally packed protein binding sites, which we call the nucleotide String Packing Problem (SPP). We show that the SPP can be solved efficiently using integer linear programming to identify the densest arrangements of binding sites for a specified sequence length. We show how adding additional constraints, like the inclusion of sequence elements with fixed positions, allows for the design of bacterial promoters. The presented approach enables the rapid design and study of nucleotide sequences with complex, dense binding site architectures.
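
The packing idea described in the abstract above can be illustrated with a toy sketch. This is not the authors' integer linear programming method: it swaps in an exhaustive search that is only practical for a handful of sites, considers a single strand (no reverse complements), and uses hypothetical names (`overlap`, `pack_sites`) and made-up example sites. It keeps only the orienteering-style view, in which the integer "distance" from one site to the next is the number of bases added when the second site is appended with maximal overlap, and the sequence length acts as the travel budget.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def pack_sites(sites, max_len):
    """Brute-force search (an assumption, not an ILP): return (count, sequence)
    for the ordering that packs the most sites into at most max_len bases."""
    n = len(sites)
    # dist[i][j]: bases added when site j follows site i with maximal overlap.
    dist = [[len(sites[j]) - overlap(sites[i], sites[j]) for j in range(n)]
            for i in range(n)]
    best = (0, "")

    def extend(last, used, seq):
        nonlocal best
        count = len(used)
        # Prefer more sites; break ties by the shorter sequence.
        if count > best[0] or (count == best[0] and len(seq) < len(best[1])):
            best = (count, seq)
        for j in range(n):
            if j not in used and len(seq) + dist[last][j] <= max_len:
                new_bases = sites[j][len(sites[j]) - dist[last][j]:]
                extend(j, used | {j}, seq + new_bases)

    for i in range(n):
        if len(sites[i]) <= max_len:
            extend(i, {i}, sites[i])
    return best


# Hypothetical 4-bp "binding sites", chosen only to show overlapping.
print(pack_sites(["ACGT", "GTCA", "CATT"], 8))  # prints (3, 'ACGTCATT')
```

With a budget of 8 bases all three example sites fit, because consecutive sites share two bases; with a budget of 6 only two of them fit. The search grows factorially with the number of sites, which is why libraries of the size quoted in the abstract call for a dedicated solver rather than enumeration.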

DaSilva LF, Senan S, Patel ZM, Janardhan Reddy A, Gabbita S, Nussbaum Z, Valdez Córdova CM, Wenteler A, Weber N, Tunjic TM, Ahmad Khan T, Li Z, Smith C, Bejan M, Karmel Louis L, Cornejo P, Connell W, Wong ES, Meuleman W, Pinello L. DNA-Diffusion: Leveraging Generative Models for Controlling Chromatin Accessibility and Gene Expression via Synthetic Regulatory Elements. bioRxiv 2024:2024.02.01.578352. [PMID: 38352499 PMCID: PMC10862870 DOI: 10.1101/2024.02.01.578352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2024]

Affiliation(s)

Lucas Ferreira DaSilva: Department of Pathology, Harvard Medical School, Boston, MA, USA; Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
Simon Senan: Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA
Zain Munir Patel: Department of Pathology, Harvard Medical School, Boston, MA, USA; Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA
Aniketh Janardhan Reddy: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
Sameer Gabbita: Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA; Johns Hopkins University, Baltimore, MD, USA
Zach Nussbaum: Nomic AI
César Miguel Valdez Córdova: Johannes Kepler University, Linz, Austria
Aaron Wenteler: Queen Mary University of London, London, UK
Noah Weber: TU Vienna, Austria
Tin M. Tunjic: TU Vienna, Austria
Talha Ahmad Khan: Independent Researcher
Zelun Li: Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia; School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
Cameron Smith: Department of Pathology, Harvard Medical School, Boston, MA, USA; Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA
Matei Bejan: University of Bucharest, Bucharest, Romania
Lithin Karmel Louis: Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia; School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
Paola Cornejo: Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia; School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
Will Connell: Independent Researcher
Emily S. Wong: Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia; School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
Wouter Meuleman: Altius Institute for Biomedical Sciences, Seattle, WA, USA; Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
Luca Pinello: Department of Pathology, Harvard Medical School, Boston, MA, USA; Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA

de Almeida BP, Schaub C, Pagani M, Secchia S, Furlong EEM, Stark A. Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature 2024;626:207-211. [PMID: 38086418 PMCID: PMC10830412 DOI: 10.1038/s41586-023-06905-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/19/2023] [Accepted: 11/28/2023] [Indexed: 01/19/2024]

Martyn GE, Montgomery MT, Jones H, Guo K, Doughty BR, Linder J, Chen Z, Cochran K, Lawrence KA, Munson G, Pampari A, Fulco CP, Kelley DR, Lander ES, Kundaje A, Engreitz JM. Rewriting regulatory DNA to dissect and reprogram gene expression. bioRxiv 2023:2023.12.20.572268. [PMID: 38187584 PMCID: PMC10769263 DOI: 10.1101/2023.12.20.572268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]

Abstract

Regulatory DNA sequences within enhancers and promoters bind transcription factors to encode cell type-specific patterns of gene expression. However, the regulatory effects and programmability of such DNA sequences remain difficult to map or predict because we have lacked scalable methods to precisely edit regulatory DNA and quantify the effects in an endogenous genomic context. Here we present an approach to measure the quantitative effects of hundreds of designed DNA sequence variants on gene expression, by combining pooled CRISPR prime editing with RNA fluorescence in situ hybridization and cell sorting (Variant-FlowFISH). We apply this method to mutagenize and rewrite regulatory DNA sequences in an enhancer and the promoter of PPIF in two immune cell lines. Of 672 variant-cell type pairs, we identify 497 that affect PPIF expression. These variants appear to act through a variety of mechanisms including disruption or optimization of existing transcription factor binding sites, as well as creation of de novo sites. Disrupting a single endogenous transcription factor binding site often led to large changes in expression (up to -40% in the enhancer, and -50% in the promoter). The same variant often had different effects across cell types and states, demonstrating a highly tunable regulatory landscape. We use these data to benchmark performance of sequence-based predictive models of gene regulation, and find that certain types of variants are not accurately predicted by existing models. Finally, we computationally design 185 small sequence variants (≤10 bp) and optimize them for specific effects on expression in silico. 84% of these rationally designed edits showed the intended direction of effect, and some had dramatic effects on expression (-100% to +202%). Variant-FlowFISH thus provides a powerful tool to map the effects of variants and transcription factor binding sites on gene expression, test and improve computational models of gene regulation, and reprogram regulatory DNA.

Affiliation(s)

Gabriella E Martyn: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
Michael T Montgomery: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
Hank Jones: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
Katherine Guo: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
Benjamin R Doughty: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
Johannes Linder: Calico Life Sciences, South San Francisco, CA, USA
Ziwei Chen: Department of Computer Science, Stanford University, Stanford, CA, USA
Kelly Cochran: Department of Computer Science, Stanford University, Stanford, CA, USA
Kathryn A Lawrence: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
Glen Munson: The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Anusri Pampari: Department of Computer Science, Stanford University, Stanford, CA, USA
Charles P Fulco: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Present Address: Sanofi, Cambridge, MA, USA
David R Kelley: Calico Life Sciences, South San Francisco, CA, USA
Eric S Lander: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Biology, MIT, Cambridge, MA, USA; Department of Systems Biology, Harvard Medical School, Boston, MA, USA
Anshul Kundaje: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA
Jesse M Engreitz: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA; The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Stanford Cardiovascular Institute, Stanford University, Stanford, CA, USA

Gjoni K, Pollard KS. SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models. bioRxiv 2023:2023.11.03.565556. [PMID: 37961123 PMCID: PMC10635135 DOI: 10.1101/2023.11.03.565556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]