1
Yang Y, Zhong Y, Chen L. EIciRNAs in focus: current understanding and future perspectives. RNA Biol 2025; 22:1-12. [PMID: 39711231 DOI: 10.1080/15476286.2024.2443876] [Revised: 11/14/2024] [Accepted: 12/09/2024] [Indexed: 12/24/2024]
Abstract
Circular RNAs (circRNAs) are a unique class of covalently closed single-stranded RNA molecules that play diverse roles in normal physiology and pathology. Among the major types of circRNA, exon-intron circRNA (EIciRNA) distinguishes itself by its sequence composition and nuclear localization. Recent RNA-seq technologies and computational methods have facilitated the detection and characterization of EIciRNAs, with features like circRNA intron retention (CIR) and tissue-specificity being characterized. EIciRNAs have been identified to exert their functions via mechanisms such as regulating gene transcription, and the physiological relevance of EIciRNAs has been reported. Within this review, we present a summary of the current understanding of EIciRNAs, delving into their identification and molecular functions. Additionally, we emphasize factors regulating EIciRNA biogenesis and the physiological roles of EIciRNAs based on recent research. We also discuss the future challenges in EIciRNA exploration, underscoring the potential for novel functions and functional mechanisms of EIciRNAs for further investigation.
Affiliation(s)
- Yan Yang
- Department of Cardiology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Science and Medicine, University of Science and Technology of China, Hefei, China
- Hefei National Laboratory for Physical Sciences at Microscale, University of Science and Technology of China, Hefei, China
- Yinchun Zhong
- Hefei National Laboratory for Physical Sciences at Microscale, University of Science and Technology of China, Hefei, China
- Department of Clinical Laboratory, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Science and Medicine, University of Science and Technology of China, Hefei, China
- Liang Chen
- Department of Cardiology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Science and Medicine, University of Science and Technology of China, Hefei, China
2
Asahina K, Zelikowsky M. Comparative Perspectives on Neuropeptide Function and Social Isolation. Biol Psychiatry 2025; 97:942-952. [PMID: 39892690 PMCID: PMC12048258 DOI: 10.1016/j.biopsych.2025.01.019] [Received: 08/30/2024] [Revised: 01/07/2025] [Accepted: 01/25/2025] [Indexed: 02/04/2025]
Abstract
Chronic social isolation alters behavior across animal species. Genetic model organisms such as mice and flies provide crucial insight into the molecular and physiological effects of social isolation on brain cells and circuits. Here, we comparatively review recent findings regarding the function of conserved neuropeptides in social isolation in mice and flies. Analogous functions of three classes of neuropeptides (tachykinins, cholecystokinins, and neuropeptide Y/F) in the two model organisms suggest that these molecules may be involved in modulating behavioral changes induced by social isolation across a wider range of species, including humans. Comparative approaches armed with tools to dissect neuropeptidergic function can lead to an integrated understanding of the impacts of social isolation on brain circuits and behavior.
Affiliation(s)
- Kenta Asahina
- Molecular Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, California.
- Moriel Zelikowsky
- Department of Neurobiology, School of Medicine, The University of Utah, Salt Lake City, Utah
3
Martyn GE, Montgomery MT, Jones H, Guo K, Doughty BR, Linder J, Bisht D, Xia F, Cai XS, Chen Z, Cochran K, Lawrence KA, Munson G, Pampari A, Fulco CP, Sahni N, Kelley DR, Lander ES, Kundaje A, Engreitz JM. Rewriting regulatory DNA to dissect and reprogram gene expression. Cell 2025:S0092-8674(25)00352-6. [PMID: 40245860 DOI: 10.1016/j.cell.2025.03.034] [Received: 12/20/2023] [Revised: 12/16/2024] [Accepted: 03/19/2025] [Indexed: 04/19/2025]
Abstract
Regulatory DNA provides a platform for transcription factor binding to encode cell-type-specific patterns of gene expression. However, the effects and programmability of regulatory DNA sequences remain difficult to map or predict. Here, we develop variant effects from flow-sorting experiments with CRISPR targeting screens (Variant-EFFECTS) to introduce hundreds of designed edits to endogenous regulatory DNA and quantify their effects on gene expression. We systematically dissect and reprogram 3 regulatory elements for 2 genes in 2 cell types. These data reveal endogenous binding sites with effects specific to genomic context, transcription factor motifs with cell-type-specific activities, and limitations of computational models for predicting the effect sizes of variants. We identify small edits that can tune gene expression over a large dynamic range, suggesting new possibilities for prime-editing-based therapeutics targeting regulatory DNA. Variant-EFFECTS provides a generalizable tool to dissect regulatory DNA and to identify genome editing reagents that tune gene expression in an endogenous context.
Affiliation(s)
- Gabriella E Martyn
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA 94305, USA
- Michael T Montgomery
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA 94305, USA
- Hank Jones
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA 94305, USA
- Katherine Guo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA 94305, USA
- Benjamin R Doughty
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Johannes Linder
- Calico Life Sciences LLC, South San Francisco, CA 94080, USA
- Deepa Bisht
- Department of Genitourinary Medical Oncology, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA; Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA; Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77230, USA
- Fan Xia
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA 94305, USA
- Xiangmeng S Cai
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA 94305, USA; Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Ziwei Chen
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Kelly Cochran
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Kathryn A Lawrence
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Glen Munson
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Anusri Pampari
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Charles P Fulco
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Nidhi Sahni
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77230, USA; Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77230, USA; Quantitative and Computational Biosciences Program, Baylor College of Medicine, Houston, TX 77030, USA
- David R Kelley
- Calico Life Sciences LLC, South San Francisco, CA 94080, USA
- Eric S Lander
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Biology, MIT, Cambridge, MA 02139, USA; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Anshul Kundaje
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Jesse M Engreitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA 94305, USA; Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Stanford Cardiovascular Institute, Stanford University, Stanford, CA 94305, USA.
4
Johansen NJ, Kempynck N, Zemke NR, Somasundaram S, De Winter S, Hooper M, Dwivedi D, Lohia R, Wehbe F, Li B, Abaffyová D, Armand EJ, De Man J, Eksi EC, Hecker N, Hulselmans G, Konstantakos V, Mauduit D, Mich JK, Partel G, Daigle TL, Levi BP, Zhang K, Tanaka Y, Gillis J, Ting JT, Ben-Simon Y, Miller J, Ecker JR, Ren B, Aerts S, Lein ES, Tasic B, Bakken TE. Evaluating Methods for the Prediction of Cell Type-Specific Enhancers in the Mammalian Cortex. bioRxiv 2025:2024.08.21.609075. [PMID: 39229027 PMCID: PMC11370467 DOI: 10.1101/2024.08.21.609075] [Indexed: 09/05/2024]
Abstract
Identifying cell type-specific enhancers in the brain is critical to building genetic tools for investigating the mammalian brain. Computational methods for functional enhancer prediction have been proposed and validated in the fruit fly but not yet in the mammalian brain. We organized the 'Brain Initiative Cell Census Network (BICCN) Challenge: Predicting Functional Cell Type-Specific Enhancers from Cross-Species Multi-Omics' to assess machine learning and feature-based methods designed to nominate enhancer DNA sequences to target cell types in the mouse cortex. Methods were evaluated based on in vivo validation data from hundreds of cortical cell type-specific enhancers that were previously packaged into individual AAV vectors and retro-orbitally injected into mice. We find that open chromatin was a key predictor of functional enhancers, and sequence models improved prediction of non-functional enhancers that can be deprioritized as opposed to pursued for in vivo testing. Sequence models also identified cell type-specific transcription factor codes that can guide designs of in silico enhancers. This community challenge establishes a benchmark for enhancer prioritization algorithms and reveals computational approaches and molecular information that are crucial for identifying functional enhancers in mammalian cortical cell types. The results of this challenge bring us closer to understanding the complex gene regulatory landscape of the mammalian cortex and to designing more efficient genetic tools to target cortical cell types.
Affiliation(s)
- Nelson J Johansen
- Allen Institute for Brain Science, Seattle, WA 98109
- These authors contributed equally
- Niklas Kempynck
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- These authors contributed equally
- Nathan R Zemke
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093
- Seppe De Winter
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- Marcus Hooper
- Allen Institute for Brain Science, Seattle, WA 98109
- Ruchi Lohia
- Physiology Department and Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Fabien Wehbe
- Maisonneuve-Rosemont Hospital Research Centre, University of Montreal, Montreal, Quebec, Canada
- Bocheng Li
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Darina Abaffyová
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- Ethan J Armand
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093
- Julie De Man
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- Eren Can Eksi
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- Nikolai Hecker
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- Gert Hulselmans
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- Vasilis Konstantakos
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- David Mauduit
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- John K Mich
- Allen Institute for Brain Science, Seattle, WA 98109
- Gabriele Partel
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- Boaz P Levi
- Allen Institute for Brain Science, Seattle, WA 98109
- Kai Zhang
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
- Yoshiaki Tanaka
- Maisonneuve-Rosemont Hospital Research Centre, University of Montreal, Montreal, Quebec, Canada
- Jesse Gillis
- Physiology Department and Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Jonathan T Ting
- Allen Institute for Brain Science, Seattle, WA 98109
- Department of Physiology and Biophysics, University of Washington, Seattle, WA 98195
- Jeremy Miller
- Allen Institute for Brain Science, Seattle, WA 98109
- Joseph R Ecker
- Salk Institute for Biological Studies, La Jolla, CA 92037
- Bing Ren
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093
- Stein Aerts
- VIB Center for AI & Computational Biology, VIB-KU Leuven Center for Brain and Disease Research & KU Leuven Department of Human Genetics, Leuven, Belgium
- Ed S Lein
- Allen Institute for Brain Science, Seattle, WA 98109
- Trygve E Bakken
- Allen Institute for Brain Science, Seattle, WA 98109
- Lead contact
5
Capitanchik C, Wilkins OG, Wagner N, Gagneur J, Ule J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat Rev Genet 2025; 26:171-190. [PMID: 39358547 DOI: 10.1038/s41576-024-00774-2] [Accepted: 08/27/2024] [Indexed: 10/04/2024]
Abstract
Since the discovery of RNA splicing and its role in gene expression, researchers have sought a set of rules, an algorithm or a computational model that could predict the splice isoforms, and their frequencies, produced from any transcribed gene in a specific cellular context. Over the past 30 years, these models have evolved from simple position weight matrices to deep-learning models capable of integrating sequence data across vast genomic distances. Most recently, new model architectures are moving the field closer to context-specific alternative splicing predictions, and advances in sequencing technologies are expanding the type of data that can be used to inform and interpret such models. Together, these developments are driving improved understanding of splicing regulatory mechanisms and emerging applications of the splicing code to the rational design of RNA- and splicing-based therapeutics.
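The earliest splice-site models mentioned in this abstract were position weight matrices (PWMs). As a rough illustration only, the sketch below turns per-position base counts into a log-odds PWM and slides it along a sequence to find the most consensus-like 5' splice site. The 9-nt window, the values in the hypothetical `TOY_COUNTS` table, the uniform 0.25 background and the 0.5 pseudocount are all assumptions made for the example, not data or parameters from the review.

```python
import math

BASES = "ACGT"

# Hypothetical toy counts for a 9-nt 5' splice-site window (3 exonic + 6 intronic
# positions). Illustrative numbers only, NOT real splice-site statistics.
TOY_COUNTS = [
    {"A": 35, "C": 35, "G": 20, "T": 10},
    {"A": 60, "C": 15, "G": 15, "T": 10},
    {"A": 10, "C": 5, "G": 80, "T": 5},
    {"A": 0, "C": 0, "G": 100, "T": 0},   # G of the intronic GT dinucleotide
    {"A": 0, "C": 0, "G": 0, "T": 100},   # T of the intronic GT dinucleotide
    {"A": 60, "C": 5, "G": 5, "T": 30},
    {"A": 70, "C": 10, "G": 10, "T": 10},
    {"A": 10, "C": 5, "G": 80, "T": 5},
    {"A": 15, "C": 20, "G": 20, "T": 45},
]


def build_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position counts into log2(p / background) scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score(seq, pwm):
    """Sum per-position log-odds scores; higher means more consensus-like."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM width")
    return sum(pwm[i][base] for i, base in enumerate(seq.upper()))


def best_site(seq, pwm):
    """Slide the PWM along seq and return (offset, score) of the best window."""
    width = len(pwm)
    windows = ((i, score(seq[i:i + width], pwm))
               for i in range(len(seq) - width + 1))
    return max(windows, key=lambda item: item[1])


pwm = build_pwm(TOY_COUNTS)
print(best_site("TTTCAGGTAAGTTT", pwm)[0])  # prints 3
```

Because a PWM scores each position independently, it cannot capture dependencies between positions or distal sequence context; the deep-learning models the review discusses are designed to integrate such longer-range information.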
Affiliation(s)
- Charlotte Capitanchik
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- Oscar G Wilkins
- The Francis Crick Institute, London, UK
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
- Jernej Ule
- The Francis Crick Institute, London, UK.
- UK Dementia Research Institute at King's College London, London, UK.
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK.
- National Institute of Chemistry, Ljubljana, Slovenia.
6
Wang S, Xiao L. Progress in AAV-Mediated In Vivo Gene Therapy and Its Applications in Central Nervous System Diseases. Int J Mol Sci 2025; 26:2213. [PMID: 40076831 PMCID: PMC11899905 DOI: 10.3390/ijms26052213] [Received: 01/12/2025] [Revised: 02/25/2025] [Accepted: 02/27/2025] [Indexed: 03/14/2025]
Abstract
As the blood-brain barrier (BBB) prevents molecules from accessing the central nervous system (CNS), the traditional systemic delivery of chemical drugs limits the development of neurological drugs. However, in recent years, innovative therapeutic strategies have tried to bypass the restriction of traditional drug delivery methods. In vivo gene therapy refers to emerging biopharma vectors that carry the specific genes and target and infect specific tissues; these infected cells and tissues then undergo fundamental changes at the genetic level and produce therapeutic proteins or substances, thus providing therapeutic benefits. Clinical and preclinical trials mainly utilize adeno-associated viruses (AAVs), lentiviruses (LVs), and other viruses as gene vectors for disease investigation. Although LVs have a higher gene-carrying capacity, the vector of choice for many neurological diseases is the AAV vector due to its safety and long-term transgene expression in neurons. Here, we review the basic biology of AAVs and summarize some key issues in recombinant AAV (rAAV) engineering in gene therapy research; then, we summarize recent clinical trials using rAAV treatment for neurological diseases and provide translational perspectives and future challenges on target selection.
Affiliation(s)
- Shuming Wang
- Institute for Brain Research and Rehabilitation, Guangdong Key Laboratory of Mental Health and Cognitive Science, Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631, China
- Key Laboratory of Brain, Cognition and Education Sciences of Ministry of Education, South China Normal University, Guangzhou 510631, China
- Lin Xiao
- Institute for Brain Research and Rehabilitation, Guangdong Key Laboratory of Mental Health and Cognitive Science, Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631, China
- Key Laboratory of Brain, Cognition and Education Sciences of Ministry of Education, South China Normal University, Guangzhou 510631, China
7
He AY, Palamuttam NP, Danko CG. Training deep learning models on personalized genomic sequences improves variant effect prediction. bioRxiv 2025:2024.10.15.618510. [PMID: 39463940 PMCID: PMC11507713 DOI: 10.1101/2024.10.15.618510] [Indexed: 10/29/2024]
Abstract
Sequence-to-function models have broad applications in interpreting the molecular impact of genetic variation, yet have been criticized for poor performance in this task. Here we show that training models on functional genomic data with matched personal genomes improves their performance at variant effect prediction. Variant effect representations are retained even when fine tuning models to unseen cellular contexts and experimental readouts. Our results have implications for interpreting trait-associated genetic variation.
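At its simplest, a "personal genome" training sequence is the reference with an individual's alleles substituted in. The sketch below shows that step for phased single-nucleotide variants only. The `(position, ref, alt, genotype)` tuple format, the 0-based coordinates and the function names are hypothetical simplifications rather than a real VCF parser, indels are deliberately out of scope, and this is not the preprint's pipeline.

```python
def personalize(reference, variants, offset=0):
    """Return a copy of a reference window with SNVs substituted.

    variants: iterable of (position, ref_base, alt_base) with 0-based genomic
    positions; offset is the genomic coordinate of reference[0].
    """
    seq = list(reference)
    for pos, ref, alt in variants:
        i = pos - offset
        if not 0 <= i < len(seq):
            continue  # variant lies outside this window
        if seq[i].upper() != ref.upper():
            raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[i]}")
        seq[i] = alt
    return "".join(seq)


def haplotypes(reference, phased_variants, offset=0):
    """Build both haplotype sequences from phased genotypes such as '0|1'.

    Allele index 1 on a haplotype means the alt base is carried there.
    """
    haps = []
    for h in (0, 1):
        carried = [(p, r, a) for p, r, a, gt in phased_variants
                   if gt.split("|")[h] == "1"]
        haps.append(personalize(reference, carried, offset))
    return tuple(haps)


calls = [(101, "C", "T", "0|1"), (104, "A", "G", "1|1")]
print(haplotypes("ACGTACGT", calls, offset=100))  # ('ACGTGCGT', 'ATGTGCGT')
```

Producing both haplotypes, rather than a single consensus sequence, keeps heterozygous alleles visible to a model trained on the paired functional genomic readout.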
8
Murphy AE, Askarova A, Lenhard B, Skene NG, Marzi S. Predicting gene expression from histone marks using chromatin deep learning models depends on histone mark function, regulatory distance and cellular states. Nucleic Acids Res 2025; 53:gkae1212. [PMID: 39660643 PMCID: PMC11879020 DOI: 10.1093/nar/gkae1212] [Received: 03/29/2024] [Revised: 10/12/2024] [Accepted: 12/09/2024] [Indexed: 12/12/2024]
Abstract
To understand the complex relationship between histone mark activity and gene expression, recent advances have used in silico predictions based on large-scale machine learning models. However, these approaches have omitted key contributing factors like cell state, histone mark function or distal effects, which impact the relationship, limiting their findings. Moreover, downstream use of these models for new biological insight is lacking. Here, we present the most comprehensive study of this relationship to date - investigating seven histone marks in eleven cell types across a diverse range of cell states. We used convolutional and attention-based models to predict transcription from histone mark activity at promoters and distal regulatory elements. Our work shows that histone mark function, genomic distance and cellular states collectively influence a histone mark's relationship with transcription. We found that no individual histone mark is consistently the strongest predictor of gene expression across all genomic and cellular contexts. This highlights the need to consider all three factors when determining the effect of histone mark activity on transcriptional state. Furthermore, we conducted in silico histone mark perturbation assays, uncovering functional and disease related loci and highlighting frameworks for the use of chromatin deep learning models to uncover new biological insight.
Affiliation(s)
- Alan E Murphy
- UK Dementia Research Institute at Imperial College London, 86 Wood Lane, London W12 0BZ, UK
- Department of Brain Sciences, Imperial College London, 86 Wood Lane, London W12 0BZ, UK
- Aydan Askarova
- UK Dementia Research Institute at Imperial College London, 86 Wood Lane, London W12 0BZ, UK
- Department of Brain Sciences, Imperial College London, 86 Wood Lane, London W12 0BZ, UK
- Boris Lenhard
- MRC London Institute of Medical Sciences, Imperial College London, Du Cane Road, London W12 0HS, UK
- Nathan G Skene
- UK Dementia Research Institute at Imperial College London, 86 Wood Lane, London W12 0BZ, UK
- Department of Brain Sciences, Imperial College London, 86 Wood Lane, London W12 0BZ, UK
- Sarah J Marzi
- Department of Brain Sciences, Imperial College London, 86 Wood Lane, London W12 0BZ, UK
- UK Dementia Research Institute at King’s College London, 338 Euston Road, London SE5 9RT, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King’s College London, 16 De Crespigny Park, London SE5 9RT, UK
9
Li J, Zhang P, Xi X, Liu L, Wei L, Wang X. Modeling and designing enhancers by introducing and harnessing transcription factor binding units. Nat Commun 2025; 16:1469. [PMID: 39922842 PMCID: PMC11807178 DOI: 10.1038/s41467-025-56749-2] [Received: 08/16/2024] [Accepted: 01/24/2025] [Indexed: 02/10/2025]
Abstract
Enhancers serve as pivotal regulators of gene expression throughout various biological processes by interacting with transcription factors (TFs). While transcription factor binding sites (TFBSs) are widely acknowledged as key determinants of TF binding and enhancer activity, the significant role of their surrounding context sequences remains to be quantitatively characterized. Here we propose the concept of transcription factor binding unit (TFBU) to modularly model enhancers by quantifying the impact of context sequences surrounding TFBSs using deep learning models. Based on this concept, we develop DeepTFBU, a comprehensive toolkit for enhancer design. We demonstrate that designing TFBS context sequences can significantly modulate enhancer activities and produce cell type-specific responses. DeepTFBU is also highly efficient in the de novo design of enhancers containing multiple TFBSs. Furthermore, DeepTFBU enables flexible decoupling and optimization of generalized enhancers. We prove that TFBU is a crucial concept, and DeepTFBU is highly effective for rational enhancer design.
Affiliation(s)
- Jiaqi Li
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
- Pengcheng Zhang
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
- Xi Xi
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
- Liyang Liu
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
- Lei Wei
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
- Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China.
10
Vriend J, Delwel R, Pastoors D. Mechanisms of enhancer-driven oncogene activation. Int J Cancer 2025. [PMID: 39853740 DOI: 10.1002/ijc.35330] [Received: 12/03/2024] [Revised: 12/23/2024] [Accepted: 01/07/2025] [Indexed: 01/26/2025]
Abstract
An aggressive subtype of acute myeloid leukemia (AML) is caused by enhancer hijacking resulting in MECOM overexpression. Several chromosomal rearrangements can lead to this: the most common (inv(3)/t(3;3)) results in a hijacked GATA2 enhancer, and there are several atypical MECOM rearrangements involving enhancers from other hematopoietic genes. The set of enhancers which can be hijacked by MECOM can also be hijacked by BCL11B. Enhancer deregulation is also a driver of oncogenesis in a range of other malignancies. The mechanisms of enhancer deregulation observed in other cancer types, including TAD boundary disruptions and the creation of de novo (super-) enhancers, may explain overexpression of MECOM or other oncogenes in AML without enhancer hijacking upon translocation. Gaining mechanistic insight in both enhancer deregulation and super-enhancer activity is critical to pave the way for new treatments for AML and other cancers that are the result of enhancer deregulation.
Affiliation(s)
- Joyce Vriend
- Department of Hematology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Ruud Delwel
- Department of Hematology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Dorien Pastoors
- Department of Hematology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
11
Shen Y, Kudla G, Oyarzún DA. Improving the generalization of protein expression models with mechanistic sequence information. Nucleic Acids Res 2025; 53:gkaf020. [PMID: 39873269 PMCID: PMC11773361 DOI: 10.1093/nar/gkaf020] [Received: 09/05/2024] [Revised: 12/12/2024] [Accepted: 01/08/2025] [Indexed: 01/30/2025]
Abstract
The growing demand for biological products drives many efforts to maximize expression of heterologous proteins. Advances in high-throughput sequencing can produce data suitable for building sequence-to-expression models with machine learning. The most accurate models have been trained on one-hot encodings, a mechanism-agnostic representation of nucleotide sequences. Moreover, studies have consistently shown that training on mechanistic sequence features leads to much poorer predictions, even with features that are known to correlate with expression, such as DNA sequence motifs, codon usage, or properties of mRNA secondary structures. However, despite their excellent local accuracy, current sequence-to-expression models can fail to generalize predictions far away from the training data. Through a comparative study across datasets in Escherichia coli and Saccharomyces cerevisiae, here we show that mechanistic sequence features can improve model generalization, and thus the utility of these models for predictive sequence design. We explore several strategies to integrate one-hot encodings and mechanistic features into a single predictive model, including feature stacking, ensemble model stacking, and geometric stacking, a novel architecture based on graph convolutional neural networks. Our work casts new light on mechanistic sequence features, underscoring the importance of domain knowledge and feature engineering for accurate prediction of protein expression levels.
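The abstract contrasts one-hot encodings with mechanistic sequence features and names feature stacking as one way to combine them. The minimal Python sketch below illustrates that idea under stated assumptions: the mechanistic features (GC content and the count of a hypothetical `TATA` motif) and all function names are illustrative choices, not the features or code used by Shen et al.; feature stacking is shown simply as concatenation of the two representations.

```python
# Illustrative sketch only: one-hot encoding plus two hypothetical "mechanistic"
# features (GC content, count of an assumed TATA motif), combined by feature
# stacking, i.e. plain concatenation into a single feature vector.

NUCLEOTIDES = "ACGT"


def one_hot(seq: str) -> list[float]:
    """Flattened one-hot encoding: 4 values per position, in A, C, G, T order."""
    encoded: list[float] = []
    for base in seq.upper():
        encoded.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return encoded


def gc_content(seq: str) -> float:
    """Fraction of G or C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def motif_count(seq: str, motif: str) -> int:
    """Number of (possibly overlapping) occurrences of a motif."""
    seq, motif = seq.upper(), motif.upper()
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def stacked_features(seq: str, motif: str = "TATA") -> list[float]:
    """Feature stacking: one-hot vector followed by the mechanistic features."""
    mechanistic = [gc_content(seq), float(motif_count(seq, motif))]
    return one_hot(seq) + mechanistic


if __name__ == "__main__":
    x = stacked_features("GCTATAAT")
    print(len(x))   # 8 positions * 4 + 2 mechanistic features = 34
    print(x[-2:])   # [0.25, 1.0]
```

Concatenation keeps the per-position detail of the one-hot vector while appending a few global descriptors; the ensemble and graph-based stacking strategies mentioned in the abstract are not shown.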
Affiliation(s)
- Yuxin Shen
- School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JH, United Kingdom
- Grzegorz Kudla
- Institute for Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
- Diego A Oyarzún
- School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JH, United Kingdom
- School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, United Kingdom
12
Catta-Preta R, Lindtner S, Ypsilanti A, Seban N, Price JD, Abnousi A, Su-Feher L, Wang Y, Cichewicz K, Boerma SA, Juric I, Jones IR, Akiyama JA, Hu M, Shen Y, Visel A, Pennacchio LA, Dickel DE, Rubenstein JLR, Nord AS. Combinatorial transcription factor binding encodes cis-regulatory wiring of mouse forebrain GABAergic neurogenesis. Dev Cell 2025; 60:288-304.e6. [PMID: 39481376 PMCID: PMC11753952 DOI: 10.1016/j.devcel.2024.10.004] [Received: 06/23/2023] [Revised: 06/17/2024] [Accepted: 10/03/2024] [Indexed: 11/02/2024]
Abstract
Transcription factors (TFs) bind combinatorially to cis-regulatory elements, orchestrating transcriptional programs. Although studies of chromatin state and chromosomal interactions have demonstrated dynamic neurodevelopmental cis-regulatory landscapes, parallel understanding of TF interactions lags. To elucidate combinatorial TF binding driving mouse basal ganglia development, we integrated chromatin immunoprecipitation sequencing (ChIP-seq) for twelve TFs, H3K4me3-associated enhancer-promoter interactions, chromatin and gene expression data, and functional enhancer assays. We identified sets of putative regulatory elements with shared TF binding (TF-pRE modules) that orchestrate distinct processes of GABAergic neurogenesis and suppress other cell fates. The majority of pREs were bound by one or two TFs; however, a small proportion were extensively bound. These sequences had exceptional evolutionary conservation and motif density, complex chromosomal interactions, and activity as in vivo enhancers. Our results provide insights into the combinatorial TF-pRE interactions that activate and repress expression programs during telencephalon neurogenesis and demonstrate the value of TF binding toward modeling developmental transcriptional wiring.
Affiliation(s)
- Rinaldo Catta-Preta
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- Susan Lindtner
- Nina Ireland Laboratory of Developmental Neurobiology, Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Athena Ypsilanti
- Nina Ireland Laboratory of Developmental Neurobiology, Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Nicolas Seban
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- James D Price
- Nina Ireland Laboratory of Developmental Neurobiology, Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Armen Abnousi
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44106, USA
- Linda Su-Feher
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- Yurong Wang
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- Karol Cichewicz
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- Sally A Boerma
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
- Ivan Juric
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44106, USA
- Ian R Jones
- Institute for Human Genetics, Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA
- Jennifer A Akiyama
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44106, USA
- Yin Shen
- Institute for Human Genetics, Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA
- Axel Visel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; U.S. Department of Energy Joint Genome Institute, Walnut Creek, CA 94598, USA; School of Natural Sciences, University of California, Merced, Merced, CA 95343, USA
- Len A Pennacchio
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; U.S. Department of Energy Joint Genome Institute, Walnut Creek, CA 94598, USA; Comparative Biochemistry Program, University of California, Berkeley, Berkeley, CA 94720, USA
- Diane E Dickel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- John L R Rubenstein
- Nina Ireland Laboratory of Developmental Neurobiology, Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Alex S Nord
- Department of Neurobiology, Physiology and Behavior, and Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, CA 95618, USA
13
Song W, Ovcharenko I. Abundant repressor binding sites in human enhancers are associated with the fine-tuning of gene regulation. iScience 2025; 28:111658. [PMID: 39868043 PMCID: PMC11761325 DOI: 10.1016/j.isci.2024.111658] [Received: 02/29/2024] [Revised: 08/04/2024] [Accepted: 11/25/2024] [Indexed: 01/28/2025]
Abstract
The regulation of gene expression relies on the coordinated action of transcription factors (TFs) at enhancers, including both activator and repressor TFs. We employed deep learning (DL) to dissect HepG2 enhancers into positive (PAR), negative (NAR), and neutral activity regions. Sharpr-MPRA and STARR-seq highlight the opposing effects of NARs and PARs, which modulate and catalyze enhancer activity, respectively. Approximately 22% of HepG2 enhancers, termed "repressive impact enhancers" (RIEs), are predominantly populated by NARs and transcriptional repression motifs. Genes flanking RIEs exhibit a stage-specific decline in expression during late development, suggesting that RIEs trim enhancer activity. About 16.7% of human NARs emerged from neutral rhesus macaque DNA. This gain of repressor binding sites in RIEs is associated with a 30% decrease in the average expression of flanking genes in humans compared with rhesus macaque. Our work links the evolutionary dynamics of TF binding sites to modulated enhancer activity and adaptable gene regulation.
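The abstract does not specify how enhancers were partitioned into PARs, NARs and neutral regions. As a purely illustrative sketch, assuming per-base contribution scores from a trained model are already available, regions could be labeled by thresholding windowed score sums, and an enhancer flagged as repressive-impact when NAR windows outnumber PAR windows. The window size, threshold and dominance rule are hypothetical and do not reproduce the study's procedure or its 22% figure.

```python
# Hypothetical sketch: split an enhancer's per-base contribution scores into
# fixed windows and label each as a positive (PAR), negative (NAR) or neutral
# activity region. Window size, threshold and the "NARs outnumber PARs" rule
# are illustrative assumptions, not the published method.


def label_windows(scores: list[float], window: int = 4, threshold: float = 0.5) -> list[str]:
    """Label consecutive windows by the sign and size of their summed score."""
    labels = []
    for start in range(0, len(scores), window):
        total = sum(scores[start:start + window])
        if total > threshold:
            labels.append("PAR")
        elif total < -threshold:
            labels.append("NAR")
        else:
            labels.append("neutral")
    return labels


def is_repressive_impact(labels: list[str]) -> bool:
    """Flag an enhancer whose NAR windows outnumber its PAR windows."""
    return labels.count("NAR") > labels.count("PAR")


if __name__ == "__main__":
    scores = [0.3, 0.3, 0.1, 0.0, -0.4, -0.3, -0.1, 0.0, 0.0, 0.1, -0.1, 0.0]
    labels = label_windows(scores)
    print(labels)                        # ['PAR', 'NAR', 'neutral']
    print(is_repressive_impact(labels))  # False
```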
Affiliation(s)
- Wei Song
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Ivan Ovcharenko
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
14
Friedman RZ, Ramu A, Lichtarge S, Wu Y, Tripp L, Lyon D, Myers CA, Granas DM, Gause M, Corbo JC, Cohen BA, White MA. Active learning of enhancers and silencers in the developing neural retina. Cell Syst 2025; 16:101163. [PMID: 39778579 PMCID: PMC11827711 DOI: 10.1016/j.cels.2024.12.004] [Received: 03/26/2024] [Revised: 10/17/2024] [Accepted: 12/06/2024] [Indexed: 01/11/2025]
Abstract
Deep learning is a promising strategy for modeling cis-regulatory elements. However, models trained on genomic sequences often fail to explain why the same transcription factor can activate or repress transcription in different contexts. To address this limitation, we developed an active learning approach to train models that distinguish between enhancers and silencers composed of binding sites for the photoreceptor transcription factor cone-rod homeobox (CRX). After training the model on nearly all bound CRX sites from the genome, we coupled synthetic biology with uncertainty sampling to generate additional rounds of informative training data. This allowed us to iteratively train models on data from multiple rounds of massively parallel reporter assays. The ability of the resulting models to discriminate between CRX sites with identical sequence but opposite functions establishes active learning as an effective strategy to train models of regulatory DNA. A record of this paper's transparent peer review process is included in the supplemental information.
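Uncertainty sampling, as named in the abstract, selects the candidates a current model is least sure about for the next round of experiments. The sketch below assumes a binary enhancer-versus-silencer classifier that outputs P(enhancer) and ranks candidates by predictive entropy; the sequence IDs, probabilities and function names are hypothetical, and the study's actual models, classes and acquisition function may differ.

```python
import math

# Illustrative uncertainty sampling: rank candidate sequences by the entropy of
# a (hypothetical) binary classifier's prediction and pick the top k for the
# next round of reporter assays.


def binary_entropy(p: float) -> float:
    """Shannon entropy (bits) of a Bernoulli prediction; maximal at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_most_uncertain(predictions: dict[str, float], k: int) -> list[str]:
    """Return the k candidate IDs whose predictions are most uncertain."""
    ranked = sorted(
        predictions,
        key=lambda seq_id: binary_entropy(predictions[seq_id]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical model outputs: P(enhancer) per candidate (silencer otherwise)
    preds = {"seq1": 0.95, "seq2": 0.52, "seq3": 0.10, "seq4": 0.45}
    print(select_most_uncertain(preds, k=2))  # ['seq2', 'seq4']
```

The selected sequences would then be measured, added to the training set, and the model retrained, repeating the loop for several rounds.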
Affiliation(s)
- Ryan Z Friedman
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Avinash Ramu
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Sara Lichtarge
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Yawei Wu
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Lloyd Tripp
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Daniel Lyon
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Connie A Myers
- Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- David M Granas
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Maria Gause
- Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Joseph C Corbo
- Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Barak A Cohen
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
- Michael A White
- The Edison Family Center for Genome Sciences & Systems Biology, Saint Louis, MO 63110, USA; Department of Genetics, Saint Louis, MO 63110, USA
15
Pampari A, Shcherbina A, Kvon EZ, Kosicki M, Nair S, Kundu S, Kathiria AS, Risca VI, Kuningas K, Alasoo K, Greenleaf WJ, Pennacchio LA, Kundaje A. ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants. bioRxiv 2025:2024.12.25.630221. [PMID: 39829783 PMCID: PMC11741299 DOI: 10.1101/2024.12.25.630221] [Indexed: 01/22/2025]
Abstract
Despite extensive mapping of cis-regulatory elements (cREs) across cellular contexts with chromatin accessibility assays, the sequence syntax and genetic variants that regulate transcription factor (TF) binding and chromatin accessibility at context-specific cREs remain elusive. We introduce ChromBPNet, a deep learning DNA sequence model of base-resolution accessibility profiles that detects, learns and deconvolves assay-specific enzyme biases from regulatory sequence determinants of accessibility, enabling robust discovery of compact TF motif lexicons, cooperative motif syntax and precision footprints across assays and sequencing depths. Extensive benchmarks show that ChromBPNet, despite its lightweight design, is competitive with much larger contemporary models at predicting variant effects on chromatin accessibility, pioneer TF binding and reporter activity across assays, cell contexts and ancestry, while providing interpretation of disrupted regulatory syntax. ChromBPNet also helps prioritize and interpret regulatory variants that influence complex traits and rare diseases, thereby providing a powerful lens to decode regulatory DNA and genetic variation.
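The abstract says ChromBPNet deconvolves assay-specific enzyme biases from regulatory sequence determinants. In BPNet-style models, each output has a per-base profile (logits) and a total-count head (log counts). A common way to factorize bias, and the scheme ChromBPNet is understood to use, is to add the two profile-logit tracks and combine the log counts with log-sum-exp. The plain-Python sketch below illustrates only that combination step; the function names are hypothetical, no network is trained, and it is not the published implementation.

```python
import math

# Illustrative bias-factorized combination of a sequence (TF) model and an
# enzyme-bias model, both assumed to output profile logits and log total counts.


def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def combine_bias_factorized(tf_profile_logits, bias_profile_logits, tf_log_counts, bias_log_counts):
    """Add profile logits per base; combine log counts so expected counts sum."""
    profile_logits = [t + b for t, b in zip(tf_profile_logits, bias_profile_logits)]
    log_counts = logsumexp([tf_log_counts, bias_log_counts])
    return profile_logits, log_counts


def expected_profile(profile_logits, log_counts):
    """Convert profile logits and log total counts into expected per-base counts."""
    total = math.exp(log_counts)
    m = max(profile_logits)
    weights = [math.exp(x - m) for x in profile_logits]  # softmax numerators
    z = sum(weights)
    return [total * w / z for w in weights]


if __name__ == "__main__":
    logits, log_counts = combine_bias_factorized(
        [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], math.log(30.0), math.log(10.0)
    )
    print(expected_profile(logits, log_counts))  # approximately [10.0, 10.0, 10.0, 10.0]
```

In this scheme, dropping the bias term leaves the sequence-model outputs alone, which is what makes a bias-corrected view of footprints and variant effects possible.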
Affiliation(s)
- Anusri Pampari
- Department of Computer Science, Stanford University, Stanford, CA 94305
- Anna Shcherbina
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305
- Evgeny Z. Kvon
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
- Michael Kosicki
- Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Surag Nair
- Department of Computer Science, Stanford University, Stanford, CA 94305
- Soumya Kundu
- Department of Computer Science, Stanford University, Stanford, CA 94305
- Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- William James Greenleaf
- Department of Genetics, Stanford University, Stanford, CA 94305
- Department of Applied Physics, Stanford University, Stanford, California 94305, USA
- Len A. Pennacchio
- Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA 94305
- Department of Genetics, Stanford University, Stanford, CA 94305
16
Hecker N, Kempynck N, Mauduit D, Abaffyová D, Vandepoel R, Dieltiens S, Borm L, Sarropoulos I, González-Blas CB, De Man J, Davie K, Leysen E, Vandensteen J, Moors R, Hulselmans G, Lim L, De Wit J, Christiaens V, Poovathingal S, Aerts S. Enhancer-driven cell type comparison reveals similarities between the mammalian and bird pallium. Science 2025; 387:eadp3957. [PMID: 39946451 DOI: 10.1126/science.adp3957] [Received: 03/27/2024] [Accepted: 11/26/2024] [Indexed: 04/23/2025]
Abstract
Combinations of transcription factors govern the identity of cell types, which is reflected by genomic enhancer codes. We used deep learning to characterize these enhancer codes and devised three metrics to compare cell types in the telencephalon across amniotes. To this end, we generated single-cell multiome and spatially resolved transcriptomics data of the chicken telencephalon. Enhancer codes of orthologous nonneuronal and γ-aminobutyric acid-mediated (GABAergic) cell types show a high degree of similarity across amniotes, whereas excitatory neurons of the mammalian neocortex and avian pallium exhibit varying degrees of similarity. Enhancer codes of avian mesopallial neurons are most similar to those of mammalian deep-layer neurons. With this study, we present generally applicable deep learning approaches to characterize and compare cell types on the basis of genomic regulatory sequences.
Affiliation(s)
- Nikolai Hecker
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Niklas Kempynck
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- David Mauduit
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Darina Abaffyová
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Roel Vandepoel
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Sam Dieltiens
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Lars Borm
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Ioannis Sarropoulos
- Center for Molecular Biology of Heidelberg University, Heidelberg University, Heidelberg, Germany
- Carmen Bravo González-Blas
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Julie De Man
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Kristofer Davie
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Neurosciences, KU Leuven, Leuven, Belgium
- Elke Leysen
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Neurosciences, KU Leuven, Leuven, Belgium
- Jeroen Vandensteen
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Neurosciences, KU Leuven, Leuven, Belgium
- Rani Moors
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Neurosciences, KU Leuven, Leuven, Belgium
- Gert Hulselmans
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Lynette Lim
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Neurosciences, KU Leuven, Leuven, Belgium
- Joris De Wit
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Neurosciences, KU Leuven, Leuven, Belgium
- Valerie Christiaens
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Stein Aerts
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology, Leuven, Belgium
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
17
Kaplan HS, Horvath PM, Rahman MM, Dulac C. The neurobiology of parenting and infant-evoked aggression. Physiol Rev 2025; 105:315-381. [PMID: 39146250 DOI: 10.1152/physrev.00036.2023] [Received: 09/21/2023] [Revised: 07/19/2024] [Accepted: 08/09/2024] [Indexed: 08/17/2024]
Abstract
Parenting behavior comprises a variety of adult-infant and adult-adult interactions across multiple timescales. The state transition from nonparent to parent requires an extensive reorganization of individual priorities and physiology and is facilitated by combinatorial hormone action on specific cell types that are integrated throughout interconnected and brainwide neuronal circuits. In this review, we take a comprehensive approach to integrate historical and current literature on each of these topics across multiple species, with a focus on rodents. New and emerging molecular, circuit-based, and computational technologies have recently been used to address outstanding gaps in our current framework of knowledge on infant-directed behavior. This work is raising fundamental questions about the interplay between instinctive and learned components of parenting and the mutual regulation of affiliative versus agonistic infant-directed behaviors in health and disease. Whenever possible, we point to how these technologies have helped gain novel insights and opened new avenues of research into the neurobiology of parenting. We hope this review will serve as an introduction for those new to the field, a comprehensive resource for those already studying parenting, and a guidepost for designing future studies.
Affiliation(s)
- Harris S Kaplan
- Department of Molecular and Cellular Biology, Howard Hughes Medical Institute, Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States
- Patricia M Horvath
- Department of Molecular and Cellular Biology, Howard Hughes Medical Institute, Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States
- Mohammed Mostafizur Rahman
- Department of Molecular and Cellular Biology, Howard Hughes Medical Institute, Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States
- Catherine Dulac
- Department of Molecular and Cellular Biology, Howard Hughes Medical Institute, Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States
18
Shireen H, Batool F, Khatoon H, Parveen N, Sehar NU, Hussain I, Ali S, Abbasi AA. Predicting genome-wide tissue-specific enhancers via combinatorial transcription factor genomic occupancy analysis. FEBS Lett 2025; 599:100-119. [PMID: 39367524 DOI: 10.1002/1873-3468.15030] [Received: 07/14/2024] [Revised: 08/27/2024] [Accepted: 09/13/2024] [Indexed: 10/06/2024]
Abstract
Enhancers are non-coding cis-regulatory elements crucial for transcriptional regulation. Mutations in enhancers can disrupt gene regulation, leading to disease phenotypes. Identifying enhancers and their tissue-specific activity is challenging because they lack stereotyped sequences. This study presents a sequence-based computational model that uses combinatorial transcription factor (TF) genomic occupancy to predict tissue-specific enhancers. Trained on diverse datasets, including ENCODE and VISTA Enhancer Browser data, the model predicted 25 000 forebrain-specific cis-regulatory modules (CRMs) in the human genome. Validation using biochemical features, disease-associated SNPs, and in vivo zebrafish analysis confirmed its effectiveness. This model aids in predicting enhancers lacking well-characterized chromatin features, complementing experimental approaches in tissue-specific enhancer discovery.
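The abstract describes predicting tissue-specific enhancers from combinatorial TF genomic occupancy but does not give the model's details. The sketch below shows only the simplest form of the idea, under hypothetical assumptions: candidate regions are scored by how many distinct TFs have peaks overlapping them, and regions bound by at least a fixed number of TFs are called CRMs. The TF names, coordinates and threshold are invented for illustration; the published model, trained on ENCODE and VISTA data, is not reproduced here.

```python
from typing import NamedTuple

# Hypothetical combinatorial-occupancy rule: a candidate region is called a CRM
# when peaks from at least `min_tfs` distinct TFs overlap it.


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def tf_occupancy(region: Interval, peaks_by_tf: dict[str, list[Interval]]) -> set[str]:
    """Set of TFs with at least one peak overlapping the region."""
    return {tf for tf, peaks in peaks_by_tf.items() if any(overlaps(region, p) for p in peaks)}


def predict_crms(candidates: list[Interval], peaks_by_tf: dict[str, list[Interval]],
                 min_tfs: int = 3) -> list[Interval]:
    """Keep candidates bound by at least `min_tfs` distinct TFs."""
    return [r for r in candidates if len(tf_occupancy(r, peaks_by_tf)) >= min_tfs]


if __name__ == "__main__":
    peaks = {  # invented example peaks for hypothetical TFs
        "TF_A": [Interval("chr1", 100, 200)],
        "TF_B": [Interval("chr1", 150, 250)],
        "TF_C": [Interval("chr1", 180, 220), Interval("chr2", 0, 50)],
        "TF_D": [Interval("chr2", 10, 40)],
    }
    candidates = [Interval("chr1", 170, 210), Interval("chr2", 0, 60)]
    print(predict_crms(candidates, peaks))  # [Interval(chrom='chr1', start=170, end=210)]
```

A real tissue-specific model would weight TFs and learn the decision rule from labeled enhancers rather than use a fixed count.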
Affiliation(s)
- Huma Shireen
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Fatima Batool
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Hizran Khatoon
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Nazia Parveen
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Noor Us Sehar
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Irfan Hussain
- Centre for Regenerative Medicine and Stem Cells Research, Aga Khan University Hospital, Karachi, Pakistan
- Shahid Ali
- Department of Organismal Biology and Anatomy, The University of Chicago, Chicago, IL, USA
- Amir Ali Abbasi
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
19
Bongrand P. Should Artificial Intelligence Play a Durable Role in Biomedical Research and Practice? Int J Mol Sci 2024; 25:13371. [PMID: 39769135 PMCID: PMC11676049 DOI: 10.3390/ijms252413371] [Received: 10/09/2024] [Revised: 11/26/2024] [Accepted: 12/09/2024] [Indexed: 01/11/2025]
Abstract
During the last decade, artificial intelligence (AI) has been applied to nearly all domains of human activity, including scientific research. It is thus warranted to ask whether AI thinking should be durably involved in biomedical research. This problem was addressed by examining three complementary questions. (i) What are the major barriers currently met by biomedical investigators? It is suggested that during the last two decades there was a shift towards a growing need to elucidate complex systems, and that this need was not sufficiently fulfilled by previously successful methods such as theoretical modeling or computer simulation. (ii) What is the potential of AI to meet the aforementioned need? It is suggested that recent AI methods are well suited to perform classification and prediction tasks on multivariate systems, and possibly to help in data interpretation, provided their efficiency is properly validated. (iii) Recent representative results obtained with machine learning suggest that AI efficiency may be comparable to that displayed by human operators. It is concluded that AI should durably play an important role in biomedical practice. Also, as already suggested in other scientific domains such as physics, combining AI with conventional methods might generate further progress and new applications, involving heuristics and data interpretation.
Affiliation(s)
- Pierre Bongrand
- Laboratory Adhesion and Inflammation (LAI), Inserm UMR 1067, CNRS UMR 7333, Aix-Marseille Université UM 61, 13009 Marseille, France
20
Baniulyte G, McCann AA, Woodstock DL, Sammons MA. Crosstalk between paralogs and isoforms influences p63-dependent regulatory element activity. Nucleic Acids Res 2024; 52:13812-13831. [PMID: 39565223 DOI: 10.1093/nar/gkae1143] [Received: 05/17/2024] [Revised: 10/04/2024] [Accepted: 11/01/2024] [Indexed: 11/21/2024]
Abstract
The p53 family of transcription factors (p53, p63 and p73) regulates diverse organismal processes, including tumor suppression, maintenance of genome integrity and the development of skin and limbs. Crosstalk between transcription factors with highly similar DNA binding profiles, like those in the p53 family, can dramatically alter gene regulation. While p53 is primarily associated with transcriptional activation, p63 mediates both activation and repression. The specific mechanisms controlling p63-dependent gene regulatory activity are not well understood. Here, we use massively parallel reporter assays (MPRA) to investigate how local DNA sequence context influences p63-dependent transcriptional activity. Most regulatory elements with a p63 response element motif (p63RE) activate transcription, although binding of the p63 paralog, p53, drives a substantial proportion of that activity. p63RE sequence content and co-enrichment with other known activating and repressing transcription factors, including lineage-specific factors, correlate with differential p63RE-mediated activities. p63 isoforms dramatically alter transcriptional behavior, primarily shifting inactive regulatory elements towards high p63-dependent activity. Our analysis provides novel insight into how local sequence and cellular context influence p63-dependent behaviors and highlights the key, yet still understudied, role of transcription factor paralogs and isoforms in controlling gene regulatory element activity.
Affiliation(s)
- Gabriele Baniulyte
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
- Abby A McCann
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
- Dana L Woodstock
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
- Morgan A Sammons
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222, USA
21
Li Z, Zhang Y, Peng B, Qin S, Zhang Q, Chen Y, Chen C, Bao Y, Zhu Y, Hong Y, Liu B, Liu Q, Xu L, Chen X, Ma X, Wang H, Xie L, Yao Y, Deng B, Li J, De B, Chen Y, Wang J, Li T, Liu R, Tang Z, Cao J, Zuo E, Mei C, Zhu F, Shao C, Wang G, Sun T, Wang N, Liu G, Ni JQ, Liu Y. A novel interpretable deep learning-based computational framework designed synthetic enhancers with broad cross-species activity. Nucleic Acids Res 2024; 52:13447-13468. [PMID: 39420601 PMCID: PMC11602155 DOI: 10.1093/nar/gkae912] [Received: 02/21/2024] [Revised: 09/25/2024] [Accepted: 10/03/2024] [Indexed: 10/19/2024]
Abstract
Enhancers play a critical role in dynamically regulating spatial-temporal gene expression and establishing cell identity, underscoring the significance of designing them with specific properties for applications in biosynthetic engineering and gene therapy. Despite numerous high-throughput methods facilitating genome-wide enhancer identification, deciphering the sequence determinants of their activity remains challenging. Here, we present the DREAM (DNA cis-Regulatory Elements with controllable Activity design platforM) framework, a novel deep learning-based approach for synthetic enhancer design. Proficient in uncovering subtle and intricate patterns within extensive enhancer screening data, DREAM achieves cutting-edge sequence-based enhancer activity prediction and highlights critical sequence features associated with strong enhancer activity. Leveraging DREAM, we have engineered enhancers that surpass the potency of the strongest enhancer within the Drosophila genome by approximately 3.6-fold. Remarkably, these synthetic enhancers exhibited conserved functionality across species that diverged more than a billion years ago, indicating that DREAM was able to learn highly conserved enhancer regulatory grammar. Additionally, we designed silencers and cell line-specific enhancers using DREAM, demonstrating its versatility. Overall, our study not only introduces an interpretable approach for enhancer design but also lays out a general framework applicable to the design of other types of cis-regulatory elements.
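Sequence-based activity predictors of the kind described above generally take DNA as a one-hot matrix (one row per base, one column per nucleotide). The sketch below shows that encoding in plain Python. It is an illustrative assumption about the input format, not code from DREAM, and the helper names `one_hot` and `reverse_complement_one_hot` are hypothetical.

```python
# Hypothetical helpers illustrating one-hot DNA input; not taken from DREAM.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot(seq):
    """Encode a DNA string as a len(seq) x 4 matrix with columns A, C, G, T.

    Ambiguous bases such as N become all-zero rows.
    """
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASE_INDEX.get(base)
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    return matrix


def reverse_complement_one_hot(matrix):
    """Reverse-complement a one-hot matrix.

    With columns ordered A, C, G, T, complementary bases sit at mirrored
    positions (A<->T, C<->G), so reversing both the row order and each
    row's column order yields the reverse complement.
    """
    return [row[::-1] for row in matrix[::-1]]
```

For example, `reverse_complement_one_hot(one_hot("AACG"))` equals `one_hot("CGTT")`. The A, C, G, T column order is chosen so that reverse complementation needs no lookup table.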
Affiliation(s)
- Zhaohong Li
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Yuanyuan Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Bo Peng
- Gene Regulatory Lab, School of Basic Medical Sciences, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- State Key Laboratory of Molecular Oncology, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- Shenghua Qin
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Qian Zhang
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, NO.1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Yun Chen
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Choulin Chen
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Yongzhou Bao
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Yuqi Zhu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, NO. 7 Pengfei Road, Dapeng District, Shenzhen 518124, China
- Yi Hong
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, NO. 7 Pengfei Road, Dapeng District, Shenzhen 518124, China
- Binghua Liu
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
- Qian Liu
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
- Lingna Xu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Xi Chen
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Xinhao Ma
- College of Grassland Agriculture, National Beef Cattle Improvement Center, College of Animal Science and Technology, Northwest A&F University, NO. 3 Taicheng Road, Yangling District, Yangling, Shaanxi 712100, China
- Hongyan Wang
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
- Long Xie
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Yilong Yao
- Green Healthy Aquaculture Research Center, Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Building 26 Lihe Technology Park, Auxiliary Road of Xinxi Avenue South, Nanhai District, Foshan 528226, China
- Biao Deng
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Jiaying Li
- Department of Ophthalmology, Beijing Institute of Ophthalmology, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Dongjiaomin lane No1, Dongcheng District, Beijing 100101, China
- Baojun De
- College of Life Sciences, Inner Mongolia Autonomous Region Key Laboratory of Biomanufacturing, Inner Mongolia Agricultural University, NO. 306 Zhaowuda Road, Saihan District, Hohhot 010018, China
- Yuting Chen
- College of Life Sciences, Inner Mongolia Autonomous Region Key Laboratory of Biomanufacturing, Inner Mongolia Agricultural University, NO. 306 Zhaowuda Road, Saihan District, Hohhot 010018, China
- Jing Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Tian Li
- College of JUNCAO Science and Ecology, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University (FAFU), NO.15 Shangxiadian Road, Cangshan District, Fuzhou 0350002, China
- Ranran Liu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road NO. 2, Haidian District, Beijing 100193, China
- Zhonglin Tang
- Green Healthy Aquaculture Research Center, Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Building 26 Lihe Technology Park, Auxiliary Road of Xinxi Avenue South, Nanhai District, Foshan 528226, China
- Junwei Cao
- College of Life Sciences, Inner Mongolia Autonomous Region Key Laboratory of Biomanufacturing, Inner Mongolia Agricultural University, NO. 306 Zhaowuda Road, Saihan District, Hohhot 010018, China
- Erwei Zuo
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Chugang Mei
- College of Grassland Agriculture, National Beef Cattle Improvement Center, College of Animal Science and Technology, Northwest A&F University, NO. 3 Taicheng Road, Yangling District, Yangling, Shaanxi 712100, China
- Fangjie Zhu
- College of JUNCAO Science and Ecology, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University (FAFU), NO.15 Shangxiadian Road, Cangshan District, Fuzhou 0350002, China
- Changwei Shao
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
- Guirong Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Tongjun Sun
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, NO. 7 Pengfei Road, Dapeng District, Shenzhen 518124, China
- Ningli Wang
- Department of Ophthalmology, Beijing Institute of Ophthalmology, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Dongjiaomin lane No1, Dongcheng District, Beijing 100101, China
- Gang Liu
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, NO.1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Jian-Quan Ni
- Gene Regulatory Lab, School of Basic Medical Sciences, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- State Key Laboratory of Molecular Oncology, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- SXMU-Tsinghua Collaborative Innovation Center for Frontier Medicine, Shanxi Medical University, NO. 56 Xinjian South Road, Yingze District, Taiyuan 030001, China
- Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Green Healthy Aquaculture Research Center, Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Building 26 Lihe Technology Park, Auxiliary Road of Xinxi Avenue South, Nanhai District, Foshan 528226, China
22
Zhou J, Rizzo K, Tang Z, Koo PK. Uncertainty-aware genomic deep learning with knowledge distillation. bioRxiv 2024:2024.11.13.623485. [PMID: 39605624 PMCID: PMC11601481 DOI: 10.1101/2024.11.13.623485] [Indexed: 11/29/2024]
Abstract
Deep neural networks (DNNs) have advanced predictive modeling for regulatory genomics, but challenges remain in ensuring the reliability of their predictions and understanding the key factors behind their decision making. Here we introduce DEGU (Distilling Ensembles for Genomic Uncertainty-aware models), a method that integrates ensemble learning and knowledge distillation to improve the robustness and explainability of DNN predictions. DEGU distills the predictions of an ensemble of DNNs into a single model, capturing both the average of the ensemble's predictions and the variability across them, with the latter representing epistemic (or model-based) uncertainty. DEGU also includes an optional auxiliary task to estimate aleatoric, or data-based, uncertainty by modeling variability across experimental replicates. By applying DEGU across various functional genomic prediction tasks, we demonstrate that DEGU-trained models inherit the performance benefits of ensembles in a single model, with improved generalization to out-of-distribution sequences and more consistent explanations of cis-regulatory mechanisms through attribution analysis. Moreover, DEGU-trained models provide calibrated uncertainty estimates, with conformal prediction offering coverage guarantees under minimal assumptions. Overall, DEGU paves the way for robust and trustworthy applications of deep learning in genomics research.
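The two ingredients named in this abstract, ensemble-derived distillation targets and conformal prediction intervals, can be sketched as follows. This is a minimal illustration under stated assumptions, not DEGU's implementation: the student's targets are assumed to be the per-sequence ensemble mean and standard deviation, and the conformal step is assumed to be split conformal prediction with residuals scaled by the predicted standard deviation. All function names are hypothetical.

```python
import math
import statistics


def distillation_targets(ensemble_preds):
    """Per-sequence (mean, std) of ensemble predictions (assumed target form).

    ensemble_preds[m][i] is member m's prediction for sequence i. The mean
    serves as the student's main target and the spread across members as an
    epistemic-uncertainty target.
    """
    n_seqs = len(ensemble_preds[0])
    targets = []
    for i in range(n_seqs):
        values = [member[i] for member in ensemble_preds]
        targets.append((statistics.fmean(values), statistics.pstdev(values)))
    return targets


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def conformal_intervals(cal_mean, cal_std, cal_y, test_mean, test_std,
                        alpha=0.1, eps=1e-8):
    """Intervals from held-out residuals scaled by the predicted uncertainty.

    eps keeps the scaling finite when the ensemble members agree exactly.
    """
    scores = [abs(y - m) / (s + eps) for y, m, s in zip(cal_y, cal_mean, cal_std)]
    q = conformal_quantile(scores, alpha)
    return [(m - q * (s + eps), m + q * (s + eps))
            for m, s in zip(test_mean, test_std)]
```

The coverage guarantee of split conformal prediction (at least 1 - alpha on average) assumes the calibration and test sequences are exchangeable; it does not automatically extend to out-of-distribution sequences.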
Affiliation(s)
- Jessica Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
- Kaeli Rizzo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
- Ziqi Tang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
- Currently at InstaDeep, Cambridge, MA, USA
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
23
Yu Z, Zhang Y. Foundation model for comprehensive transcriptional regulation analysis. Natl Sci Rev 2024; 11:nwae355. [PMID: 39555104 PMCID: PMC11565239 DOI: 10.1093/nsr/nwae355] [Received: 06/13/2024] [Revised: 09/22/2024] [Accepted: 10/11/2024] [Indexed: 11/19/2024]
Affiliation(s)
- Zhaowei Yu
- State Key Laboratory of Cardiovascular Diseases and Medical Innovation Center, Institute for Regenerative Medicine, Department of Neurosurgery, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Sciences and Technology, Tongji University, China
- Yong Zhang
- State Key Laboratory of Cardiovascular Diseases and Medical Innovation Center, Institute for Regenerative Medicine, Department of Neurosurgery, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Sciences and Technology, Tongji University, China
24
Oesinghaus L, Castillo-Hair S, Ludwig N, Keller A, Seelig G. Quantitative design of cell type-specific mRNA stability from microRNA expression data. bioRxiv 2024:2024.10.28.620728. [PMID: 39554011 PMCID: PMC11565874 DOI: 10.1101/2024.10.28.620728] [Indexed: 11/19/2024]
Abstract
Limiting expression to target cell types is a longstanding goal in gene therapy, which could be met by sensing endogenous microRNA. However, an unclear association between microRNA expression and activity currently hampers such an approach. Here, we probe this relationship by measuring the stability of synthetic microRNA-responsive 3'UTRs across 10 cell lines in a library format. By systematically addressing biases in microRNA expression data and confounding factors such as microRNA crosstalk, we demonstrate that a straightforward model can quantitatively predict reporter stability purely from expression data. We use this model to design constructs with previously unattainable response patterns across our cell lines. The rules we derive for microRNA expression data selection and processing should apply to microRNA-responsive devices for any environment with available expression data.
25
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
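A common way to turn MPRA sequencing counts into an activity score is the log ratio of RNA (transcribed barcode) counts to DNA (plasmid barcode) counts for each element. The sketch below assumes per-barcode count tuples, library-size normalization to counts per million, and a pseudocount of 1; these choices are illustrative, and published pipelines differ in normalization, barcode filtering, and replicate handling. The function name `mpra_activity` is hypothetical.

```python
import math
from collections import defaultdict


def mpra_activity(records, pseudocount=1.0):
    """log2(RNA/DNA) activity per element from per-barcode counts.

    records: iterable of (element_id, rna_count, dna_count), one per barcode.
    Counts are summed per element and scaled to counts per million (CPM)
    of each library before taking the ratio.
    """
    records = list(records)
    rna_total = sum(rna for _, rna, _ in records)
    dna_total = sum(dna for _, _, dna in records)
    rna_by_element = defaultdict(float)
    dna_by_element = defaultdict(float)
    for element, rna, dna in records:
        rna_by_element[element] += rna
        dna_by_element[element] += dna
    activity = {}
    for element in rna_by_element:
        rna_cpm = rna_by_element[element] / rna_total * 1e6
        dna_cpm = dna_by_element[element] / dna_total * 1e6
        # Pseudocount avoids division by zero for elements with no reads.
        activity[element] = math.log2((rna_cpm + pseudocount) / (dna_cpm + pseudocount))
    return activity
```

Positive scores mark elements that produce more transcript than their plasmid representation predicts; summing barcodes before the ratio trades per-barcode variance estimates for simplicity.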
Affiliation(s)
- Alyssa La Fleur
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
26
Lal A, Garfield D, Biancalani T, Eraslan G. Designing realistic regulatory DNA with autoregressive language models. Genome Res 2024; 34:1411-1420. [PMID: 39322281 PMCID: PMC11529870 DOI: 10.1101/gr.279142.124] [Received: 02/15/2024] [Accepted: 08/19/2024] [Indexed: 09/27/2024]
Abstract
Cis-regulatory elements (CREs), such as promoters and enhancers, are DNA sequences that regulate the expression of genes. The activity of a CRE is influenced by the order, composition, and spacing of sequence motifs that are bound by proteins called transcription factors (TFs). Synthetic CREs with specific properties are needed for biomanufacturing as well as for many therapeutic applications including cell and gene therapy. Here, we present regLM, a framework to design synthetic CREs with desired properties, such as high, low, or cell type-specific activity, using autoregressive language models in conjunction with supervised sequence-to-function models. We used our framework to design synthetic yeast promoters and cell type-specific human enhancers. We demonstrate that the synthetic CREs generated by our approach are not only predicted to have the desired functionality but also contain biological features similar to experimentally validated CREs. regLM thus facilitates the design of realistic regulatory DNA elements while providing insights into the cis-regulatory code.
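The general pattern described in this abstract, generating sequences conditioned on a desired activity label and then screening them with a separate sequence-to-function model, can be sketched with toy components. This is not regLM's code: a label-conditioned Markov chain stands in for the autoregressive language model, and a GC-content function stands in for the supervised sequence-to-function model. `LabelConditionedMarkov`, `gc_fraction`, and `generate_and_filter` are hypothetical names.

```python
import random
from collections import Counter, defaultdict

BASES = "ACGT"


class LabelConditionedMarkov:
    """Toy stand-in for a label-conditioned autoregressive model.

    Estimates P(next base | label, previous k bases) from labeled training
    sequences; "^" pads the start so the first bases have a context (k >= 1).
    """

    def __init__(self, k=2):
        self.k = k
        self.counts = defaultdict(Counter)

    def fit(self, labeled_seqs):
        for label, seq in labeled_seqs:
            padded = "^" * self.k + seq
            for i in range(self.k, len(padded)):
                self.counts[(label, padded[i - self.k:i])][padded[i]] += 1
        return self

    def sample(self, label, length, rng):
        seq = "^" * self.k
        for _ in range(length):
            next_counts = self.counts.get((label, seq[-self.k:]))
            if next_counts:
                bases = list(next_counts)
                weights = [next_counts[b] for b in bases]
                seq += rng.choices(bases, weights=weights)[0]
            else:
                seq += rng.choice(BASES)  # unseen context: sample uniformly
        return seq[self.k:]


def gc_fraction(seq):
    """Hypothetical oracle standing in for a sequence-to-function model."""
    return sum(base in "GC" for base in seq) / len(seq)


def generate_and_filter(model, oracle, label, n, length, threshold, rng):
    """Sample n label-conditioned sequences; keep those the oracle scores highly."""
    samples = [model.sample(label, length, rng) for _ in range(n)]
    return [s for s in samples if oracle(s) >= threshold]
```

Usage: `LabelConditionedMarkov(k=2).fit([("H", "GCGCGCGC"), ("L", "ATATATAT")])` learns one deterministic chain per label, so sampling with label `"H"` reproduces the GC-rich pattern and the oracle keeps it. Separating the generator from the scoring model lets the filter be swapped without retraining the generator.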
Affiliation(s)
- Avantika Lal
- Biology Research|AI Development, gRED Computational Sciences, Genentech, South San Francisco, California 94080, USA
- David Garfield
- OMNI Bioinformatics and Department of Regenerative Medicine, Genentech, South San Francisco, California 94080, USA
- Tommaso Biancalani
- Biology Research|AI Development, gRED Computational Sciences, Genentech, South San Francisco, California 94080, USA
- Gokcen Eraslan
- Biology Research|AI Development, gRED Computational Sciences, Genentech, South San Francisco, California 94080, USA
27
Mantena S, Pillai PP, Petros BA, Welch NL, Myhrvold C, Sabeti PC, Metsky HC. Model-directed generation of artificial CRISPR-Cas13a guide RNA sequences improves nucleic acid detection. Nat Biotechnol 2024:10.1038/s41587-024-02422-w. [PMID: 39394482 DOI: 10.1038/s41587-024-02422-w] [Received: 09/19/2023] [Accepted: 09/04/2024] [Indexed: 10/13/2024]
Abstract
CRISPR guide RNA sequences deriving exactly from natural sequences may not perform optimally in every application. Here we implement and evaluate algorithms for designing maximally fit, artificial CRISPR-Cas13a guides with multiple mismatches to natural sequences that are tailored for diagnostic applications. These guides offer more sensitive detection of diverse pathogens and discrimination of pathogen variants compared with guides derived directly from natural sequences and illuminate design principles that broaden Cas13a targeting.
Affiliation(s)
- Sreekar Mantena
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Brittany A Petros
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Health Sciences and Technology, Harvard Medical School and Massachusetts Institute of Technology, Cambridge, MA, USA
- MD-PhD Program, Harvard/Massachusetts Institute of Technology, Boston, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Cameron Myhrvold
- Department of Molecular Biology, Princeton University, Princeton, NJ, USA
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
- Omenn-Darling Bioengineering Institute, Princeton University, Princeton, NJ, USA
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Pardis C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
28
Pfenning AR. AI-designed DNA sequences regulate cell-type-specific gene expression. Nature 2024; 634:1059-1061. [PMID: 39443764 DOI: 10.1038/d41586-024-03170-2] [Indexed: 10/25/2024]
29
Gosai SJ, Castro RI, Fuentes N, Butts JC, Mouri K, Alasoadura M, Kales S, Nguyen TTL, Noche RR, Rao AS, Joy MT, Sabeti PC, Reilly SK, Tewhey R. Machine-guided design of cell-type-targeting cis-regulatory elements. Nature 2024; 634:1211-1220. [PMID: 39443793 PMCID: PMC11525185 DOI: 10.1038/s41586-024-08070-z] [Received: 07/26/2023] [Accepted: 09/18/2024] [Indexed: 10/25/2024]
Abstract
Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing and stimulus responses, which collectively define the thousands of unique cell types in the body [1-3]. While there is great potential for strategically incorporating CREs in therapeutic or biotechnology applications that require tissue specificity, there is no guarantee that an optimal CRE for these intended purposes has arisen naturally. Here we present a platform to engineer and validate synthetic CREs capable of driving gene expression with programmed cell-type specificity. We take advantage of innovations in deep neural network modelling of CRE activity across three cell types, efficient in silico optimization and massively parallel reporter assays to design and empirically test thousands of CREs [4-8]. Through large-scale in vitro validation, we show that synthetic sequences are more effective at driving cell-type-specific expression in three cell lines compared with natural sequences from the human genome and achieve specificity in analogous tissues when tested in vivo. Synthetic sequences exhibit distinct motif vocabulary associated with activity in the on-target cell type and a simultaneous reduction in the activity of off-target cells. Together, we provide a generalizable framework to prospectively engineer CREs from massively parallel reporter assay models and demonstrate the required literacy to write fit-for-purpose regulatory code.
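The abstract describes optimizing sequences in silico against a neural-network model of activity in several cell types. As a hedged illustration of that loop only, the sketch below uses greedy single-base hill climbing, a simpler stand-in for the optimizers used in such work, with the objective assumed to be on-target activity minus the strongest off-target activity. `toy_predict` and the cell-type names `cellA`, `cellB`, `cellC` are hypothetical placeholders for a trained model.

```python
BASES = "ACGT"


def specificity(preds, target):
    """On-target prediction minus the highest off-target prediction."""
    off_target = max(v for cell, v in preds.items() if cell != target)
    return preds[target] - off_target


def greedy_optimize(seq, predict, target, max_rounds=100):
    """Greedy single-substitution hill climbing on cell-type specificity.

    predict(seq) must return a dict mapping cell type -> predicted activity.
    Each round scans every single-base substitution and accepts the best one
    if it improves the score; the search stops at a local optimum.
    """
    best, best_score = seq, specificity(predict(seq), target)
    for _ in range(max_rounds):
        round_best, round_score = best, best_score
        for i in range(len(best)):
            for base in BASES:
                if base == best[i]:
                    continue
                candidate = best[:i] + base + best[i + 1:]
                score = specificity(predict(candidate), target)
                if score > round_score:
                    round_best, round_score = candidate, score
        if round_best == best:
            break  # no substitution improved the objective
        best, best_score = round_best, round_score
    return best, best_score


def toy_predict(seq):
    """Hypothetical predictor: each cell type 'responds' to one base."""
    return {"cellA": seq.count("G"), "cellB": seq.count("A"), "cellC": seq.count("T")}
```

With this toy predictor, `greedy_optimize("AAAA", toy_predict, "cellA")` converges to `("GGGG", 4)`. Greedy search stops at the first local optimum; stochastic methods such as simulated annealing are a common way to escape one, at the cost of more model evaluations.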
Affiliation(s)
- Sager J Gosai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Graduate Program in Biological and Biomedical Science, Boston, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Natalia Fuentes
- The Jackson Laboratory, Bar Harbor, ME, USA
- Harvard College, Harvard University, Cambridge, MA, USA
- John C Butts
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Ramil R Noche
- Department of Comparative Medicine, Yale School of Medicine, New Haven, CT, USA
- Yale Zebrafish Research Core, Yale School of Medicine, New Haven, CT, USA
- Arya S Rao
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Mary T Joy
- The Jackson Laboratory, Bar Harbor, ME, USA
- Pardis C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Steven K Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
- Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
30
Russo M, Chen M, Mariella E, Peng H, Rehman SK, Sancho E, Sogari A, Toh TS, Balaban NQ, Batlle E, Bernards R, Garnett MJ, Hangauer M, Leucci E, Marine JC, O'Brien CA, Oren Y, Patton EE, Robert C, Rosenberg SM, Shen S, Bardelli A. Cancer drug-tolerant persister cells: from biological questions to clinical opportunities. Nat Rev Cancer 2024; 24:694-717. [PMID: 39223250 DOI: 10.1038/s41568-024-00737-z] [Accepted: 07/29/2024] [Indexed: 09/04/2024]
Abstract
The emergence of drug resistance is the most substantial challenge to the effectiveness of anticancer therapies. Orthogonal approaches have revealed that a subset of cells, known as drug-tolerant 'persister' (DTP) cells, have a prominent role in drug resistance. Although long recognized in bacterial populations which have acquired resistance to antibiotics, the presence of DTPs in various cancer types has come to light only in the past two decades, yet several aspects of their biology remain enigmatic. Here, we delve into the biological characteristics of DTPs and explore potential strategies for tracking and targeting them. Recent findings suggest that DTPs exhibit remarkable plasticity, being capable of transitioning between different cellular states, resulting in distinct DTP phenotypes within a single tumour. However, defining the biological features of DTPs has been challenging, partly due to the complex interplay between clonal dynamics and tissue-specific factors influencing their phenotype. Moreover, the interactions between DTPs and the tumour microenvironment, including their potential to evade immune surveillance, remain to be discovered. Finally, the mechanisms underlying DTP-derived drug resistance and their correlation with clinical outcomes remain poorly understood. This Roadmap aims to provide a comprehensive overview of the field of DTPs, encompassing past achievements and current endeavours in elucidating their biology. We also discuss the prospect of future advancements in technologies in helping to unveil the features of DTPs and propose novel therapeutic strategies that could lead to their eradication.
Affiliation(s)
- Mariangela Russo
- Department of Oncology, Molecular Biotechnology Center, University of Torino, Torino, Italy
- IFOM ETS, The AIRC Institute of Molecular Oncology, Milano, Italy
- Mengnuo Chen
- Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Elisa Mariella
- Department of Oncology, Molecular Biotechnology Center, University of Torino, Torino, Italy
- IFOM ETS, The AIRC Institute of Molecular Oncology, Milano, Italy
- Haoning Peng
- Institute of Thoracic Oncology and National Clinical Research Center for Geriatrics, West China Hospital of Sichuan University, Chengdu, China
- Sumaiyah K Rehman
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Elena Sancho
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), Barcelona, Spain
- Alberto Sogari
- Department of Oncology, Molecular Biotechnology Center, University of Torino, Torino, Italy
- IFOM ETS, The AIRC Institute of Molecular Oncology, Milano, Italy
- Tzen S Toh
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
- Nathalie Q Balaban
- Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Eduard Batlle
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Rene Bernards
- Division of Molecular Carcinogenesis, Oncode Institute, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Matthew Hangauer
- Department of Dermatology, University of California San Diego, San Diego, CA, USA
- Jean-Christophe Marine
- Department of Oncology, KU Leuven, Leuven, Belgium
- Center for Cancer Biology, VIB, Leuven, Belgium
- Catherine A O'Brien
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Surgery, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Yaara Oren
- Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- E Elizabeth Patton
- MRC Human Genetics Unit, and CRUK Scotland Centre and Edinburgh Cancer Research, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
- Caroline Robert
- Oncology Department, Dermatology Unit, Villejuif, France
- Oncology Department and INSERM U981, Villejuif, France
- Paris Saclay University, Villejuif, France
- Susan M Rosenberg
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Shensi Shen
- Institute of Thoracic Oncology and National Clinical Research Center for Geriatrics, West China Hospital of Sichuan University, Chengdu, China
- Alberto Bardelli
- Department of Oncology, Molecular Biotechnology Center, University of Torino, Torino, Italy
- IFOM ETS, The AIRC Institute of Molecular Oncology, Milano, Italy
31
Ben-Simon Y, Hooper M, Narayan S, Daigle T, Dwivedi D, Way SW, Oster A, Stafford DA, Mich JK, Taormina MJ, Martinez RA, Opitz-Araya X, Roth JR, Allen S, Ayala A, Bakken TE, Barcelli T, Barta S, Bendrick J, Bertagnolli D, Bowlus J, Boyer G, Brouner K, Casian B, Casper T, Chakka AB, Chakrabarty R, Chance RK, Chavan S, Departee M, Donadio N, Dotson N, Egdorf T, Gabitto M, Garcia J, Gary A, Gasperini M, Goldy J, Gore BB, Graybuck L, Greisman N, Haeseleer F, Halterman C, Helback O, Hockemeyer D, Huang C, Huff S, Hunker A, Johansen N, Juneau Z, Kalmbach B, Khem S, Kussick E, Kutsal R, Larsen R, Lee C, Lee AY, Leibly M, Lenz GH, Liang E, Lusk N, Malone J, Mollenkopf T, Morin E, Newman D, Ng L, Ngo K, Omstead V, Oyama A, Pham T, Pom CA, Potekhina L, Ransford S, Rette D, Rimorin C, Rocha D, Ruiz A, Sanchez RE, Sedeno-Cortes A, Sevigny JP, Shapovalova N, Shulga L, Sigler AR, Siverts LA, Somasundaram S, Stewart K, Szelenyi E, Tieu M, Trader C, van Velthoven CT, Walker M, Weed N, Wirthlin M, Wood T, Wynalda B, Yao Z, Zhou T, Ariza J, Dee N, Reding M, Ronellenfitch K, Mufti S, Sunkin SM, Smith KA, Esposito L, Waters J, Thyagarajan B, Yao S, Lein ES, Zeng H, Levi BP, Ngai J, Ting J, Tasic B. A suite of enhancer AAVs and transgenic mouse lines for genetic access to cortical cell types. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.10.597244. [PMID: 38915722 PMCID: PMC11195086 DOI: 10.1101/2024.06.10.597244] [Indexed: 06/26/2024]
Abstract
The mammalian cortex is composed of cells classified into types according to shared properties. Defining the contribution of each cell type to the processes guided by the cortex is essential for understanding its function in health and disease. We used transcriptomic and epigenomic cortical cell type taxonomies from mouse and human to define marker genes and putative enhancers and created a large toolkit of transgenic lines and enhancer AAVs for selective targeting of cortical cell populations. We report the evaluation of fifteen new transgenic driver lines, two new reporter lines, and >800 different enhancer AAVs covering most subclasses of cortical cells. The tools reported here, as well as the scaled process of tool creation and modification, enable diverse experimental strategies towards understanding mammalian cortex and brain function.
Affiliation(s)
- Yoav Ben-Simon
- Allen Institute for Brain Science, Seattle, WA 98109
- Equivalent contribution
- Marcus Hooper
- Allen Institute for Brain Science, Seattle, WA 98109
- Equivalent contribution
- Sujatha Narayan
- Allen Institute for Brain Science, Seattle, WA 98109
- Equivalent contribution
- Tanya Daigle
- Allen Institute for Brain Science, Seattle, WA 98109
- Equivalent contribution
- Sharon W. Way
- Allen Institute for Brain Science, Seattle, WA 98109
- Aaron Oster
- Allen Institute for Brain Science, Seattle, WA 98109
- John K. Mich
- Allen Institute for Brain Science, Seattle, WA 98109
- Jada R. Roth
- Allen Institute for Brain Science, Seattle, WA 98109
- Shona Allen
- University of California, Berkeley, Berkeley, CA 94720
- Angela Ayala
- Allen Institute for Brain Science, Seattle, WA 98109
- Stuard Barta
- Allen Institute for Brain Science, Seattle, WA 98109
- Tamara Casper
- Allen Institute for Brain Science, Seattle, WA 98109
- Sakshi Chavan
- Allen Institute for Brain Science, Seattle, WA 98109
- Tom Egdorf
- Allen Institute for Brain Science, Seattle, WA 98109
- Jazmin Garcia
- Allen Institute for Brain Science, Seattle, WA 98109
- Amanda Gary
- Allen Institute for Brain Science, Seattle, WA 98109
- Jeff Goldy
- Allen Institute for Brain Science, Seattle, WA 98109
- Bryan B. Gore
- Allen Institute for Brain Science, Seattle, WA 98109
- Noah Greisman
- Allen Institute for Brain Science, Seattle, WA 98109
- Cindy Huang
- Allen Institute for Brain Science, Seattle, WA 98109
- Sydney Huff
- Allen Institute for Brain Science, Seattle, WA 98109
- Avery Hunker
- Allen Institute for Brain Science, Seattle, WA 98109
- Zoe Juneau
- Allen Institute for Brain Science, Seattle, WA 98109
- Shannon Khem
- Allen Institute for Brain Science, Seattle, WA 98109
- Emily Kussick
- Allen Institute for Brain Science, Seattle, WA 98109
- Rana Kutsal
- Allen Institute for Brain Science, Seattle, WA 98109
- Changkyu Lee
- Allen Institute for Brain Science, Seattle, WA 98109
- Angus Y. Lee
- University of California, Berkeley, Berkeley, CA 94720
- Nicholas Lusk
- Allen Institute for Brain Science, Seattle, WA 98109
- Elyse Morin
- Allen Institute for Brain Science, Seattle, WA 98109
- Dakota Newman
- Allen Institute for Brain Science, Seattle, WA 98109
- Lydia Ng
- Allen Institute for Brain Science, Seattle, WA 98109
- Kiet Ngo
- Allen Institute for Brain Science, Seattle, WA 98109
- Alana Oyama
- Allen Institute for Brain Science, Seattle, WA 98109
- Shea Ransford
- Allen Institute for Brain Science, Seattle, WA 98109
- Dean Rette
- Allen Institute for Brain Science, Seattle, WA 98109
- Dana Rocha
- Allen Institute for Brain Science, Seattle, WA 98109
- Augustin Ruiz
- Allen Institute for Brain Science, Seattle, WA 98109
- Ana R. Sigler
- Allen Institute for Brain Science, Seattle, WA 98109
- Kaiya Stewart
- Allen Institute for Brain Science, Seattle, WA 98109
- Eric Szelenyi
- Allen Institute for Brain Science, Seattle, WA 98109
- Michael Tieu
- Allen Institute for Brain Science, Seattle, WA 98109
- Natalie Weed
- Allen Institute for Brain Science, Seattle, WA 98109
- Toren Wood
- Allen Institute for Brain Science, Seattle, WA 98109
- Zizhen Yao
- Allen Institute for Brain Science, Seattle, WA 98109
- Thomas Zhou
- Allen Institute for Brain Science, Seattle, WA 98109
- Nick Dee
- Allen Institute for Brain Science, Seattle, WA 98109
- Shoaib Mufti
- Allen Institute for Brain Science, Seattle, WA 98109
- Luke Esposito
- Allen Institute for Brain Science, Seattle, WA 98109
- Jack Waters
- Allen Institute for Brain Science, Seattle, WA 98109
- Shenqin Yao
- Allen Institute for Brain Science, Seattle, WA 98109
- Ed S. Lein
- Allen Institute for Brain Science, Seattle, WA 98109
- Hongkui Zeng
- Allen Institute for Brain Science, Seattle, WA 98109
- Boaz P. Levi
- Allen Institute for Brain Science, Seattle, WA 98109
- John Ngai
- University of California, Berkeley, Berkeley, CA 94720
- Present affiliation: National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892
- Jonathan Ting
- Allen Institute for Brain Science, Seattle, WA 98109
- Bosiljka Tasic
- Allen Institute for Brain Science, Seattle, WA 98109
- Lead contact
32
Mulet-Lazaro R, Delwel R. Oncogenic Enhancers in Leukemia. Blood Cancer Discov 2024; 5:303-317. [PMID: 39093124 PMCID: PMC11369600 DOI: 10.1158/2643-3230.bcd-23-0211] [Received: 03/01/2024] [Revised: 06/06/2024] [Accepted: 07/17/2024] [Indexed: 08/04/2024]
Abstract
Although the study of leukemogenesis has traditionally focused on protein-coding genes, the role of enhancer dysregulation is becoming increasingly recognized. The advent of high-throughput sequencing, together with a better understanding of enhancer biology, has revealed how various genetic and epigenetic lesions produce oncogenic enhancers that drive transformation. These aberrations include translocations that lead to enhancer hijacking, point mutations that modulate enhancer activity, and copy number alterations that modify enhancer dosage. In this review, we describe these mechanisms in the context of leukemia and discuss potential therapeutic avenues to target these regulatory elements.
Significance: Large-scale sequencing projects have uncovered recurrent gene mutations in leukemia, but the picture remains incomplete: some patients harbor no such aberrations, whereas others carry only a few that are insufficient to bring about transformation on their own. One of the missing pieces is enhancer dysfunction, which only recently has emerged as a critical driver of leukemogenesis. Knowledge of the various mechanisms of enhancer dysregulation is thus key for a complete understanding of leukemia and its causes, as well as the development of targeted therapies in the era of precision medicine.
Affiliation(s)
- Roger Mulet-Lazaro
- Department of Hematology, Erasmus MC Cancer Institute, Rotterdam, the Netherlands
- Oncode Institute, Utrecht, the Netherlands
- Ruud Delwel
- Department of Hematology, Erasmus MC Cancer Institute, Rotterdam, the Netherlands
- Oncode Institute, Utrecht, the Netherlands
33
Xie W, Yao Z, Yuan Y, Too J, Li F, Wang H, Zhan Y, Wu X, Wang Z, Zhang G. W2V-repeated index: Prediction of enhancers and their strength based on repeated fragments. Genomics 2024; 116:110906. [PMID: 39084477 DOI: 10.1016/j.ygeno.2024.110906] [Received: 04/11/2024] [Revised: 07/10/2024] [Accepted: 07/24/2024] [Indexed: 08/02/2024]
Abstract
Enhancers are crucial in gene expression regulation, dictating the specificity and timing of transcriptional activity, which highlights the importance of their identification for unravelling the intricacies of genetic regulation. Therefore, it is critical to identify enhancers and their strengths. Repeated sequences in the genome are repeats of the same or symmetrical fragments. There is a great deal of evidence that repetitive sequences contain enormous amounts of genetic information. Thus, we introduce the W2V-Repeated Index, designed to identify enhancer sequence fragments and evaluate their strength through the analysis of repeated K-mer sequences in enhancer regions. Utilizing the word2vector algorithm for numerical conversion and Manta Ray Foraging Optimization for feature selection, this method effectively captures the frequency and distribution of K-mer sequences. By concentrating on repeated K-mer sequences, it minimizes computational complexity and facilitates the analysis of larger K values. Experiments indicate that our method performs better than all other advanced methods on almost all indicators.
Affiliation(s)
- Weiming Xie
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
- Zhaomin Yao
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
- Yizhe Yuan
- China Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200240, China
- Jingwei Too
- Faculty of Electrical Engineering, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, Durian Tunggal, 76100 Melaka, Malaysia
- Fei Li
- College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Hongyu Wang
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
- Ying Zhan
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
- Xiaodan Wu
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
- Zhiguo Wang
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
- Guoxu Zhang
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
34
Liu J, Castillo-Hair SM, Du LY, Wang Y, Carte AN, Colomer-Rosell M, Yin C, Seelig G, Schier AF. Dissecting the regulatory logic of specification and differentiation during vertebrate embryogenesis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.27.609971. [PMID: 39253514 PMCID: PMC11383055 DOI: 10.1101/2024.08.27.609971] [Indexed: 09/11/2024]
Abstract
The interplay between transcription factors and chromatin accessibility regulates cell type diversification during vertebrate embryogenesis. To systematically decipher the gene regulatory logic guiding this process, we generated a single-cell multi-omics atlas of RNA expression and chromatin accessibility during early zebrafish embryogenesis. We developed a deep learning model to predict chromatin accessibility based on DNA sequence and found that a small number of transcription factors underlie cell-type-specific chromatin landscapes. While Nanog is well-established in promoting pluripotency, we discovered a new function in priming the enhancer accessibility of mesendodermal genes. In addition to the classical stepwise mode of differentiation, we describe instant differentiation, where pluripotent cells skip intermediate fate transitions and terminally differentiate. Reconstruction of gene regulatory interactions reveals that this process is driven by a shallow network in which maternally deposited regulators activate a small set of transcription factors that co-regulate hundreds of differentiation genes. Notably, misexpression of these transcription factors in pluripotent cells is sufficient to ectopically activate their targets. This study provides a rich resource for analyzing embryonic gene regulation and reveals the regulatory logic of instant differentiation.
Affiliation(s)
- Jialin Liu
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, University of Washington, Seattle, WA, 98195, USA
- Lucia Y. Du
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, University of Washington, Seattle, WA, 98195, USA
- Yiqun Wang
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, UCSD, La Jolla, CA, 92037, USA
- Adam N. Carte
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, 02115, USA
- Mariona Colomer-Rosell
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, University of Washington, Seattle, WA, 98195, USA
- Christopher Yin
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA, 98195, USA
- Georg Seelig
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA, 98195, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, 98195, USA
- Alexander F. Schier
- Biozentrum, University of Basel, Basel, 4056, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, University of Washington, Seattle, WA, 98195, USA
35
Kowalski MH, Wessels HH, Linder J, Dalgarno C, Mascio I, Choudhary S, Hartman A, Hao Y, Kundaje A, Satija R. Multiplexed single-cell characterization of alternative polyadenylation regulators. Cell 2024; 187:4408-4425.e23. [PMID: 38925112 DOI: 10.1016/j.cell.2024.06.005] [Received: 02/09/2023] [Revised: 03/12/2024] [Accepted: 06/05/2024] [Indexed: 06/28/2024]
Abstract
Most mammalian genes have multiple polyA sites, representing a substantial source of transcript diversity regulated by the cleavage and polyadenylation (CPA) machinery. To better understand how these proteins govern polyA site choice, we introduce CPA-Perturb-seq, a multiplexed perturbation screen dataset of 42 CPA regulators with a 3' scRNA-seq readout that enables transcriptome-wide inference of polyA site usage. We develop a framework to detect perturbation-dependent changes in polyadenylation and characterize modules of co-regulated polyA sites. We find groups of intronic polyA sites regulated by distinct components of the nuclear RNA life cycle, including elongation, splicing, termination, and surveillance. We train and validate a deep neural network (APARENT-Perturb) for tandem polyA site usage, delineating a cis-regulatory code that predicts perturbation response and reveals interactions between regulatory complexes. Our work highlights the potential for multiplexed single-cell perturbation screens to further our understanding of post-transcriptional regulation.
Affiliation(s)
- Madeline H Kowalski
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York University Grossman School of Medicine, New York, NY, USA
- Hans-Hermann Wessels
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Johannes Linder
- Department of Genetics, Stanford University, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA
- Isabella Mascio
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Saket Choudhary
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Yuhan Hao
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA
- Rahul Satija
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York University Grossman School of Medicine, New York, NY, USA
36
Xu L, Liu Y. Identification, Design, and Application of Noncoding Cis-Regulatory Elements. Biomolecules 2024; 14:945. [PMID: 39199333 PMCID: PMC11352686 DOI: 10.3390/biom14080945] [Received: 05/26/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 09/01/2024]
Abstract
Cis-regulatory elements (CREs) play a pivotal role in orchestrating interactions with trans-regulatory factors such as transcription factors, RNA-binding proteins, and noncoding RNAs. These interactions are fundamental to the molecular architecture underpinning complex and diverse biological functions in living organisms, facilitating a myriad of sophisticated and dynamic processes. The rapid advancement in the identification and characterization of these regulatory elements has been marked by initiatives such as the Encyclopedia of DNA Elements (ENCODE) project, which represents a significant milestone in the field. Concurrently, the development of CRE detection technologies, exemplified by massively parallel reporter assays, has progressed at an impressive pace, providing powerful tools for CRE discovery. The exponential growth of multimodal functional genomic data has necessitated the application of advanced analytical methods. Deep learning algorithms, particularly large language models, have emerged as invaluable tools for deconstructing the intricate nucleotide sequences governing CRE function. These advancements facilitate precise predictions of CRE activity and enable the de novo design of CREs. A deeper understanding of CRE operational dynamics is crucial for harnessing their versatile regulatory properties. Such insights are instrumental in refining gene therapy techniques, enhancing the efficacy of selective breeding programs, pushing the boundaries of genetic innovation, and opening new possibilities in microbial synthetic biology.
Affiliation(s)
- Lingna Xu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China
37
Chen W, Choi J, Li X, Nathans JF, Martin B, Yang W, Hamazaki N, Qiu C, Lalanne JB, Regalado S, Kim H, Agarwal V, Nichols E, Leith A, Lee C, Shendure J. Symbolic recording of signalling and cis-regulatory element activity to DNA. Nature 2024; 632:1073-1081. [PMID: 39020177 PMCID: PMC11357993 DOI: 10.1038/s41586-024-07706-4] [Received: 11/14/2021] [Accepted: 06/12/2024] [Indexed: 07/19/2024]
Abstract
Measurements of gene expression or signal transduction activity are conventionally performed using methods that require either the destruction or live imaging of a biological sample within the timeframe of interest. Here we demonstrate an alternative paradigm in which such biological activities are stably recorded to the genome. Enhancer-driven genomic recording of transcriptional activity in multiplex (ENGRAM) is based on the signal-dependent production of prime editing guide RNAs that mediate the insertion of signal-specific barcodes (symbols) into a genomically encoded recording unit. We show how this strategy can be used for multiplex recording of the cell-type-specific activities of dozens to hundreds of cis-regulatory elements with high fidelity, sensitivity and reproducibility. Leveraging signal transduction pathway-responsive cis-regulatory elements, we also demonstrate time- and concentration-dependent genomic recording of WNT, NF-κB and Tet-On activities. By coupling ENGRAM to sequential genome editing via DNA Typewriter, we stably record information about the temporal dynamics of two orthogonal signalling pathways to genomic DNA. Finally, we apply ENGRAM to integratively record the transient activity of nearly 100 transcription factor consensus motifs across daily windows spanning the differentiation of mouse embryonic stem cells into gastruloids, an in vitro model of early mammalian development. Although these are proof-of-concept experiments and much work remains to fully realize the possibilities, the symbolic recording of biological signals or states within cells, to the genome and over time, has broad potential to complement contemporary paradigms for how we make measurements in biological systems.
Affiliation(s)
- Wei Chen
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA
- Junhong Choi
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Xiaoyi Li
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Jenny F Nathans
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, USA
- Beth Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Wei Yang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Nobuhiko Hamazaki
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Department of Obstetrics & Gynecology, University of Washington, Seattle, WA, USA
- Institute for Stem Cell & Regenerative Medicine, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Chengxiang Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Samuel Regalado
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Haedong Kim
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Vikram Agarwal
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Eva Nichols
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Anh Leith
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
38
Borowsky AT, Bailey-Serres J. Rewiring gene circuitry for plant improvement. Nat Genet 2024; 56:1574-1582. [PMID: 39075207 DOI: 10.1038/s41588-024-01806-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Accepted: 05/17/2024] [Indexed: 07/31/2024]
Abstract
Aspirations for high crop growth and yield, nutritional quality and bioproduction of materials are challenged by climate change and limited adoption of new technologies. Here, we review recent advances in approaches to profile and model gene regulatory activity over developmental and response time in specific cells, which have revealed the basis of variation in plant phenotypes: both redeployment of key regulators to new contexts and their repurposing to control different slates of genes. New synthetic biology tools allow tunable, spatiotemporal regulation of transgenes, while recent gene-editing technologies enable manipulation of the regulation of native genes. Ultimately, understanding how gene circuitry is wired to control form and function across varied plant species, combined with advanced technology to rewire that circuitry, will unlock solutions to our greatest challenges in agriculture, energy and the environment.
Affiliation(s)
- Alexander T Borowsky
- Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California, Riverside, Riverside, CA, USA
- Julia Bailey-Serres
- Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California, Riverside, Riverside, CA, USA
39
Ding N, Yuan Z, Ma Z, Wu Y, Yin L. AI-Assisted Rational Design and Activity Prediction of Biological Elements for Optimizing Transcription-Factor-Based Biosensors. Molecules 2024; 29:3512. [PMID: 39124917 PMCID: PMC11313831 DOI: 10.3390/molecules29153512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/12/2024]
Abstract
The rational design, activity prediction, and adaptive application of biological elements (bio-elements) are crucial research fields in synthetic biology. Currently, a major challenge in the field is efficiently designing desired bio-elements and accurately predicting their activity using vast datasets. The advancement of artificial intelligence (AI) technology has enabled machine learning and deep learning algorithms to excel in uncovering patterns in bio-element data and predicting their performance. This review explores the application of AI algorithms in the rational design of bio-elements, activity prediction, and the regulation of transcription-factor-based biosensor response performance using AI-designed elements. We discuss the advantages, adaptability, and biological challenges addressed by the AI algorithms in various applications, highlighting their powerful potential in analyzing biological data. Furthermore, we propose innovative solutions to the challenges faced by AI algorithms in the field and suggest future research directions. By consolidating current research and demonstrating the practical applications and future potential of AI in synthetic biology, this review provides valuable insights for advancing both academic research and practical applications in biotechnology.
Affiliation(s)
- Nana Ding
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou 311300, China
- Zhejiang Provincial Key Laboratory of Resources Protection and Innovation of Traditional Chinese Medicine, Zhejiang A&F University, Hangzhou 311300, China
- Zenan Yuan
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou 311300, China
- Zhejiang Provincial Key Laboratory of Resources Protection and Innovation of Traditional Chinese Medicine, Zhejiang A&F University, Hangzhou 311300, China
- Zheng Ma
- Zhejiang Provincial Key Laboratory of Biometrology and Inspection & Quarantine, College of Life Sciences, China Jiliang University, Hangzhou 310018, China
- Yefei Wu
- Zhejiang Qianjiang Biochemical Co., Ltd., Haining 314400, China
- Lianghong Yin
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou 311300, China
- Zhejiang Provincial Key Laboratory of Resources Protection and Innovation of Traditional Chinese Medicine, Zhejiang A&F University, Hangzhou 311300, China
40
Liberali P, Schier AF. The evolution of developmental biology through conceptual and technological revolutions. Cell 2024; 187:3461-3495. [PMID: 38906136 DOI: 10.1016/j.cell.2024.05.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/28/2024] [Accepted: 05/29/2024] [Indexed: 06/23/2024]
Abstract
Developmental biology, the study of the processes by which cells, tissues, and organisms develop and change over time, has entered a new golden age. After the molecular genetics revolution of the 1980s and 1990s and the diversification of the field in the early 21st century, we have entered a phase in which powerful technologies provide new approaches and open unexplored avenues. Progress in the field has been accelerated by advances in genomics, imaging, engineering, and computational biology and by emerging model systems ranging from tardigrades to organoids. We summarize how revolutionary technologies have led to remarkable progress in understanding animal development. We describe how classic questions in gene regulation, pattern formation, morphogenesis, organogenesis, and stem cell biology are being revisited. We discuss the connections of development with evolution, self-organization, metabolism, time, and ecology. We speculate on how developmental biology might evolve in an era of synthetic biology, artificial intelligence, and human engineering.
Affiliation(s)
- Prisca Liberali
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- University of Basel, Basel, Switzerland
41
Yin C, Hair SC, Byeon GW, Bromley P, Meuleman W, Seelig G. Iterative deep learning-design of human enhancers exploits condensed sequence grammar to achieve cell type-specificity. bioRxiv 2024:2024.06.14.599076. [PMID: 38915713 PMCID: PMC11195158 DOI: 10.1101/2024.06.14.599076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
An important and largely unsolved problem in synthetic biology is how to target gene expression to specific cell types. Here, we apply iterative deep learning to design synthetic enhancers with strong differential activity between two human cell lines. We initially train models on published datasets of enhancer activity and chromatin accessibility and use them to guide the design of synthetic enhancers that maximize predicted specificity. We experimentally validate these sequences, use the measurements to re-optimize the predictor, and design a second generation of enhancers with improved specificity. Enhancers produced by our design methods embed relevant transcription factor binding site (TFBS) motifs at higher frequencies than comparable endogenous enhancers while using a more selective motif vocabulary, and we show that enhancer activity is correlated with transcription factor expression at the single-cell level. Finally, we characterize causal features of top enhancers via perturbation experiments and show that enhancers as short as 50 bp can maintain specificity.
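The abstract summarizes designed enhancers by two motif statistics: how densely TFBS motifs occur and how many distinct motifs are used (the "vocabulary"). A minimal sketch of both, assuming exact-match consensus strings on both strands rather than the position weight matrices a real analysis would more likely use; the motif names, consensus strings and sequence are hypothetical placeholders, not motifs from the study.

```python
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def count_overlapping(seq: str, motif: str) -> int:
    """Exact, possibly overlapping occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def motif_counts(seq: str, motifs: Dict[str, str]) -> Dict[str, int]:
    """Hits per motif on both strands; a reverse-complement palindrome is counted once per site."""
    seq = seq.upper()
    counts = {}
    for name, consensus in motifs.items():
        rc = reverse_complement(consensus)
        hits = count_overlapping(seq, consensus)
        if rc != consensus:
            hits += count_overlapping(seq, rc)
        counts[name] = hits
    return counts

def motif_density(seq: str, motifs: Dict[str, str], per_bp: int = 100) -> float:
    """Total motif hits per `per_bp` bases of sequence."""
    return per_bp * sum(motif_counts(seq, motifs).values()) / len(seq)

def vocabulary_size(seq: str, motifs: Dict[str, str]) -> int:
    """Number of distinct motifs with at least one hit: a crude measure of motif vocabulary."""
    return sum(1 for c in motif_counts(seq, motifs).values() if c > 0)

# Hypothetical consensus strings and sequence, for illustration only.
motifs = {"motif_A": "TGACTCA", "motif_B": "GGAA", "motif_C": "CACGTG"}
designed = "TGACTCAGGAATTCCTGAGTCA"
print(motif_counts(designed, motifs))
print(f"{motif_density(designed, motifs):.1f} hits per 100 bp, vocabulary {vocabulary_size(designed, motifs)}")
```

Comparing these two numbers between a set of designed sequences and a matched set of endogenous enhancers would give the kind of contrast the abstract describes, with the caveat that exact matching misses weaker, degenerate sites.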
Affiliation(s)
- Christopher Yin
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
- Gun Woo Byeon
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
- Peter Bromley
- Altius Institute for Biomedical Sciences, Seattle, WA
- Wouter Meuleman
- Altius Institute for Biomedical Sciences, Seattle, WA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
- Georg Seelig
- Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
42
Lalanne JB, Regalado SG, Domcke S, Calderon D, Martin BK, Li X, Li T, Suiter CC, Lee C, Trapnell C, Shendure J. Multiplex profiling of developmental cis-regulatory elements with quantitative single-cell expression reporters. Nat Methods 2024; 21:983-993. [PMID: 38724692 PMCID: PMC11166576 DOI: 10.1038/s41592-024-02260-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 03/22/2024] [Indexed: 06/13/2024]
Abstract
The inability to scalably and precisely measure the activity of developmental cis-regulatory elements (CREs) in multicellular systems is a bottleneck in genomics. Here we develop a dual RNA cassette that decouples the detection and quantification tasks inherent to multiplex single-cell reporter assays. The resulting measurement of reporter expression is accurate over multiple orders of magnitude, with a precision approaching the limit set by Poisson counting noise. Together with RNA barcode stabilization via circularization, these scalable single-cell quantitative expression reporters provide high-contrast readouts, analogous to classic in situ assays but entirely from sequencing. Screening >200 regions of accessible chromatin in a multicellular in vitro model of early mammalian development, we identify 13 (8 previously uncharacterized) autonomous and cell-type-specific developmental CREs. We further demonstrate that chimeric CRE pairs generate cognate two-cell-type activity profiles and assess gain- and loss-of-function multicellular expression phenotypes from CRE variants with perturbed transcription factor binding sites. Single-cell quantitative expression reporters can be applied in developmental and multicellular systems to quantitatively characterize native, perturbed and synthetic CREs at scale, with high sensitivity and at single-cell resolution.
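The abstract benchmarks reporter precision against the limit set by Poisson counting noise. As a minimal sketch of that limit (not the authors' pipeline): if a reporter's sequenced count is Poisson-distributed with mean λ, its variance is also λ, so counting noise alone contributes a coefficient of variation of √λ/λ = 1/√λ, and deeper counting lowers the floor. The function names below are hypothetical, and the simulation uses Knuth's textbook Poisson sampler rather than any method from the paper.

```python
import math
import random

def poisson_draw(lam: float, rng: random.Random) -> int:
    """One Poisson(lam) draw via Knuth's product-of-uniforms method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def poisson_cv_floor(mean_count: float) -> float:
    """CV of a pure Poisson count: sd / mean = sqrt(lam) / lam = 1 / sqrt(lam)."""
    return 1.0 / math.sqrt(mean_count)

def empirical_cv(mean_count: float, n: int = 10_000, seed: int = 0) -> float:
    """CV of n simulated counts; approaches poisson_cv_floor as n grows."""
    rng = random.Random(seed)
    draws = [poisson_draw(mean_count, rng) for _ in range(n)]
    mu = sum(draws) / n
    var = sum((x - mu) ** 2 for x in draws) / (n - 1)
    return math.sqrt(var) / mu

for lam in (1, 10, 100):
    print(f"mean count {lam:>3}: Poisson CV floor {poisson_cv_floor(lam):.3f}, "
          f"simulated {empirical_cv(lam):.3f}")
```

A measurement whose replicate CV sits close to 1/√λ at a given count depth has little noise left beyond sampling, which is the sense in which precision can "approach the Poisson limit".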
Affiliation(s)
- Samuel G Regalado
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Silvia Domcke
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Diego Calderon
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Xiaoyi Li
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Tony Li
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Chase C Suiter
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
43
McCann AA, Baniulyte G, Woodstock DL, Sammons MA. Context dependent activity of p63-bound gene regulatory elements. bioRxiv 2024:2024.05.09.593326. [PMID: 38766006 PMCID: PMC11100809 DOI: 10.1101/2024.05.09.593326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
The p53 family of transcription factors regulates numerous organismal processes, including the development of skin and limbs, ciliogenesis, preservation of genetic integrity, and tumor suppression. p53 family members control these processes and gene expression networks through engagement with DNA sequences within gene regulatory elements. Whereas p53 binding to its cognate recognition sequence is strongly associated with transcriptional activation, p63 can mediate both activation and repression. How the DNA sequence of p63-bound gene regulatory elements is linked to these varied activities is not yet understood. Here, we use massively parallel reporter assays (MPRA) in a range of cellular and genetic contexts to investigate the influence of DNA sequence on p63-mediated transcription. Most regulatory elements with a p63 response element motif (p63RE) activate transcription, with those sites bound by p63 more frequently or adhering more closely to canonical p53 family response element sequences driving higher transcriptional output. The most active regulatory elements are those also capable of binding p53. Elements uniquely bound by p63 have varied activity, with p63RE-mediated repression associated with lower overall GC content in flanking sequences. Comparison of activity across cell lines suggests that differential activity of elements may be regulated by p63 abundance, context-specific cofactors, or a combination of the two. Finally, changes in p63 isoform expression dramatically alter regulatory element activity, primarily shifting inactive elements towards strong p63-dependent activity. Our analysis of p63-bound gene regulatory elements provides new insight into how sequence, cellular context, and other transcription factors influence p63-dependent transcription. These studies provide a framework for understanding how p63 genomic binding locally regulates transcription.
Additionally, these results can be extended to investigate the influence of sequence content, genomic context, and chromatin structure on the interplay between p63 isoforms and p53 family paralogs.
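The abstract links p63RE-mediated repression to lower GC content in the sequences flanking the motif. A minimal sketch of how flank GC content can be measured, assuming a known half-open site position and a fixed flank width on each side; the flank width, coordinates and example element are hypothetical, not values from the study.

```python
import math

def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases; N and other IUPAC codes are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return math.nan
    return sum(b in "GC" for b in bases) / len(bases)

def flank_gc(seq: str, site_start: int, site_end: int, width: int) -> float:
    """GC fraction of up to `width` bases on each side of the half-open site [site_start, site_end)."""
    left = seq[max(0, site_start - width):site_start]
    right = seq[site_end:site_end + width]
    return gc_fraction(left + right)

# Hypothetical element: a 10-bp placeholder site (Ns) between two 10-bp flanks.
element = "ATATATATAT" + "N" * 10 + "GCGCATATAT"
print(f"flank GC: {flank_gc(element, 10, 20, 10):.2f}")
```

Excluding the motif itself from the calculation keeps the statistic about the surrounding sequence, which is what the reported association concerns.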
Affiliation(s)
- Abby A. McCann
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222
- Gabriele Baniulyte
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222
- Dana L. Woodstock
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222
- Morgan A. Sammons
- Department of Biological Sciences and The RNA Institute, University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222
44
Khetan S, Bulyk ML. Overlapping binding sites underlie TF genomic occupancy. bioRxiv 2024:2024.03.05.583629. [PMID: 38496549 PMCID: PMC10942454 DOI: 10.1101/2024.03.05.583629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Sequence-specific DNA binding by transcription factors (TFs) is a crucial step in gene regulation. However, current high-throughput in vitro approaches cannot reliably detect lower-affinity TF-DNA interactions, which play key roles in gene regulation. Here, we developed PADIT-seq (protein affinity to DNA by in vitro transcription and RNA sequencing) to assay TF binding preferences to all 10-bp DNA sequences at far greater sensitivity than prior approaches. The expanded catalogs of low-affinity DNA binding sites for the human TFs HOXD13 and EGR1 revealed that nucleotides flanking high-affinity DNA binding sites create overlapping lower-affinity sites that together modulate TF genomic occupancy in vivo. Formation of such extended recognition sequences stems from an inherent propensity of TF binding sites to interweave with one another, and it expands the genomic sequence space for identifying noncoding variants that directly alter TF binding. One-sentence summary: Overlapping DNA binding sites underlie TF genomic occupancy through their inherent propensity to interweave with one another.
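Two quantities in the abstract can be made concrete: the size of the "all 10-bp DNA sequences" space, and what it means for sites to overlap. There are 4^10 = 1,048,576 distinct 10-mers; if a site and its reverse complement are treated as the same double-stranded site (an assumption for illustration; the abstract does not say how PADIT-seq handles strand), 1,024 of them are their own reverse complement and the count drops to 524,800. A sliding 10-bp window also shows why flanking bases matter: each interior base sits inside up to ten overlapping windows, so a flanking change can create or destroy several candidate sites at once. Function names and the example sequence are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def all_kmers(k: int) -> Iterator[str]:
    """Yield every DNA sequence of length k (4**k in total)."""
    return ("".join(p) for p in product("ACGT", repeat=k))

def n_canonical_kmers(k: int) -> int:
    """Distinct k-mers once each is merged with its reverse complement.
    Only even k admits reverse-complement palindromes (4**(k//2) of them)."""
    palindromes = 4 ** (k // 2) if k % 2 == 0 else 0
    return (4 ** k + palindromes) // 2

def overlapping_windows(seq: str, k: int) -> List[Tuple[int, str]]:
    """All k-length windows with their offsets; neighbouring windows share k - 1 bases."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]

k = 10
print("all 10-mers:", 4 ** k)
print("strand-collapsed 10-mers:", n_canonical_kmers(k))

# Brute-force check of the counting formula on a small k.
small = {min(km, reverse_complement(km)) for km in all_kmers(4)}
assert len(small) == n_canonical_kmers(4)

# A hypothetical 14-bp sequence yields five overlapping 10-bp windows.
for offset, window in overlapping_windows("ACGTTGACTCAGGA", 10):
    print(offset, window)
```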
45
Company C, Schmitt MJ, Dramaretska Y, Serresi M, Kertalli S, Jiang B, Yin JA, Aguzzi A, Barozzi I, Gargiulo G. Logical design of synthetic cis-regulatory DNA for genetic tracing of cell identities and state changes. Nat Commun 2024; 15:897. [PMID: 38316783 PMCID: PMC10844330 DOI: 10.1038/s41467-024-45069-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 01/12/2024] [Indexed: 02/07/2024]
Abstract
Descriptive data are rapidly expanding in biomedical research. In contrast, functional validation methods with sufficient complexity remain underdeveloped. Transcriptional reporters allow experimental characterization and manipulation of developmental and disease cell states, but their design lacks flexibility. Here, we report logical design of synthetic cis-regulatory DNA (LSD), a computational framework that takes phenotypic biomarkers and trans-regulatory networks as input to design reporters marking the activity of selected cellular states and pathways. LSD uses bulk or single-cell biomarkers and a reference genome or custom cis-regulatory DNA datasets with user-defined boundary regions. By benchmarking validated reporters, we integrate LSD with a computational ranking of the phenotypic specificity of putative cis-regulatory DNA. Experimentally, LSD-designed reporters targeting a wide range of cell states are functional without minimal promoters. Applied to broadly expressed genes from human and mouse tissues, LSD generates functional housekeeper-like sLCRs compatible with the size constraints of AAV vectors for gene therapy applications. A mesenchymal glioblastoma reporter designed by LSD outperforms previously validated ones and canonical cell surface markers. In genome-scale CRISPRa screens, LSD facilitates the discovery of known and novel bona fide cell-state drivers. Thus, LSD captures core principles of cis-regulation and is broadly applicable to studying complex cell states and mechanisms of transcriptional regulation.
Affiliation(s)
- Carlos Company
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13092, Berlin, Germany
- Matthias Jürgen Schmitt
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13092, Berlin, Germany
- Yuliia Dramaretska
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13092, Berlin, Germany
- Michela Serresi
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13092, Berlin, Germany
- Sonia Kertalli
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13092, Berlin, Germany
- Ben Jiang
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13092, Berlin, Germany
- Jiang-An Yin
- Institute of Neuropathology, University Hospital Zurich, University of Zurich, 8091, Zurich, Switzerland
- Adriano Aguzzi
- Institute of Neuropathology, University Hospital Zurich, University of Zurich, 8091, Zurich, Switzerland
- Iros Barozzi
- Center for Cancer Research, Medical University of Vienna, Borschkegasse 8a, 1090, Vienna, Austria
- Department of Surgery and Cancer, Imperial College London, London, UK
- Gaetano Gargiulo
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13092, Berlin, Germany
46
DaSilva LF, Senan S, Patel ZM, Janardhan Reddy A, Gabbita S, Nussbaum Z, Valdez Córdova CM, Wenteler A, Weber N, Tunjic TM, Ahmad Khan T, Li Z, Smith C, Bejan M, Karmel Louis L, Cornejo P, Connell W, Wong ES, Meuleman W, Pinello L. DNA-Diffusion: Leveraging Generative Models for Controlling Chromatin Accessibility and Gene Expression via Synthetic Regulatory Elements. bioRxiv 2024:2024.02.01.578352. [PMID: 38352499 PMCID: PMC10862870 DOI: 10.1101/2024.02.01.578352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2024]
Abstract
The challenge of systematically modifying and optimizing regulatory elements for precise gene expression control is central to modern genomics and synthetic biology. Advancements in generative AI have paved the way for designing synthetic sequences with the aim of safely and accurately modulating gene expression. We leverage diffusion models to design context-specific DNA regulatory sequences, which hold significant potential toward enabling novel therapeutic applications requiring precise modulation of gene expression. Our framework uses a cell type-specific diffusion model to generate synthetic 200 bp regulatory elements based on chromatin accessibility across different cell types. We evaluate the generated sequences on key metrics to ensure they retain properties of endogenous sequences: transcription factor binding site composition, potential for cell type-specific chromatin accessibility, and the capacity of DNA-Diffusion-generated sequences to activate gene expression in different cell contexts, as assessed with state-of-the-art prediction models. Our results demonstrate the ability to robustly generate DNA sequences with cell type-specific regulatory potential. DNA-Diffusion paves the way for a regulatory-modulation approach to mammalian synthetic biology and precision gene therapy.
Affiliation(s)
- Lucas Ferreira DaSilva
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Simon Senan
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Zain Munir Patel
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Aniketh Janardhan Reddy
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Sameer Gabbita
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Johns Hopkins University, Baltimore, MD, USA
- Zelun Li
- Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Cameron Smith
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Lithin Karmel Louis
- Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Paola Cornejo
- Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Emily S. Wong
- Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Wouter Meuleman
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Luca Pinello
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
47
de Almeida BP, Schaub C, Pagani M, Secchia S, Furlong EEM, Stark A. Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature 2024; 626:207-211. [PMID: 38086418 PMCID: PMC10830412 DOI: 10.1038/s41586-023-06905-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 06/19/2023] [Accepted: 11/28/2023] [Indexed: 01/19/2024]
Abstract
Enhancers control gene expression and have crucial roles in development and homeostasis. However, the targeted de novo design of enhancers with tissue-specific activities has remained challenging. Here we combine deep learning and transfer learning to design tissue-specific enhancers for five tissues in the Drosophila melanogaster embryo: the central nervous system, epidermis, gut, muscle and brain. We first train convolutional neural networks using genome-wide single-cell assay for transposase-accessible chromatin with sequencing (ATAC-seq) datasets and then fine-tune the convolutional neural networks with smaller-scale data from in vivo enhancer activity assays, yielding models with 13% to 76% positive predictive value according to cross-validation. We designed and experimentally assessed 40 synthetic enhancers (8 per tissue) in vivo, of which 31 (78%) were active and 27 (68%) functioned in the target tissue (100% for central nervous system and muscle). The strategy of combining genome-wide and small-scale functional datasets by transfer learning is generally applicable and should enable the design of tissue-, cell type- and cell state-specific enhancers in any system.
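The percentages in this abstract are simple ratios: 31/40 = 77.5% and 27/40 = 67.5%, reported rounded to 78% and 68%. Positive predictive value, the metric quoted for the models, is TP / (TP + FP). The sketch below recomputes these from the abstract's counts. Two labelled assumptions: that the 27 on-target enhancers are a subset of the 31 active ones, and, purely as an illustration, that every design can be scored as a predicted positive for its target tissue; the paper's cross-validated PPVs are computed on different data.

```python
def fraction(hits: int, total: int) -> float:
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total

def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """PPV = TP / (TP + FP): the share of predicted positives that are truly positive."""
    return fraction(true_pos, true_pos + false_pos)

designed, active, on_target = 40, 31, 27  # counts reported in the abstract

print(f"active in vivo:       {fraction(active, designed):.1%}")
print(f"active in target:     {fraction(on_target, designed):.1%}")
# Assumption: the 27 on-target designs are a subset of the 31 active ones.
print(f"on target | active:   {fraction(on_target, active):.1%}")
# Illustrative only: scoring every design as a predicted positive for its target tissue.
print(f"in vivo PPV analogue: {positive_predictive_value(on_target, designed - on_target):.1%}")
```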
Affiliation(s)
- Bernardo P de Almeida
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- InstaDeep, Paris, France
- Christoph Schaub
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Michaela Pagani
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Stefano Secchia
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Eileen E M Furlong
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria
48
Martyn GE, Montgomery MT, Jones H, Guo K, Doughty BR, Linder J, Chen Z, Cochran K, Lawrence KA, Munson G, Pampari A, Fulco CP, Kelley DR, Lander ES, Kundaje A, Engreitz JM. Rewriting regulatory DNA to dissect and reprogram gene expression. bioRxiv 2023:2023.12.20.572268. [PMID: 38187584 PMCID: PMC10769263 DOI: 10.1101/2023.12.20.572268] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 01/09/2024]
Abstract
Regulatory DNA sequences within enhancers and promoters bind transcription factors to encode cell type-specific patterns of gene expression. However, the regulatory effects and programmability of such DNA sequences remain difficult to map or predict because we have lacked scalable methods to precisely edit regulatory DNA and quantify the effects in an endogenous genomic context. Here we present an approach to measure the quantitative effects of hundreds of designed DNA sequence variants on gene expression, by combining pooled CRISPR prime editing with RNA fluorescence in situ hybridization and cell sorting (Variant-FlowFISH). We apply this method to mutagenize and rewrite regulatory DNA sequences in an enhancer and the promoter of PPIF in two immune cell lines. Of 672 variant-cell type pairs, we identify 497 that affect PPIF expression. These variants appear to act through a variety of mechanisms including disruption or optimization of existing transcription factor binding sites, as well as creation of de novo sites. Disrupting a single endogenous transcription factor binding site often led to large changes in expression (up to -40% in the enhancer, and -50% in the promoter). The same variant often had different effects across cell types and states, demonstrating a highly tunable regulatory landscape. We use these data to benchmark performance of sequence-based predictive models of gene regulation, and find that certain types of variants are not accurately predicted by existing models. Finally, we computationally design 185 small sequence variants (≤10 bp) and optimize them for specific effects on expression in silico. 84% of these rationally designed edits showed the intended direction of effect, and some had dramatic effects on expression (-100% to +202%). 
Variant-FlowFISH thus provides a powerful tool to map the effects of variants and transcription factor binding sites on gene expression, test and improve computational models of gene regulation, and reprogram regulatory DNA.
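The effect sizes above are percent changes in PPIF expression. Assuming they are expressed relative to the unedited reference allele (the abstract does not state the baseline explicitly), -100% means expression is abolished and +202% means about 3.02 times reference expression. The helper names below are hypothetical; the only data used are the counts and percentages quoted in the abstract.

```python
def percent_change(reference: float, edited: float) -> float:
    """Percent change in expression of an edited allele relative to the unedited reference."""
    if reference <= 0:
        raise ValueError("reference expression must be positive")
    return (edited / reference - 1.0) * 100.0

def fold_from_percent(pct: float) -> float:
    """Invert percent_change: +202% -> 3.02x the reference, -100% -> 0x."""
    return 1.0 + pct / 100.0

print(f"{497 / 672:.1%} of variant-cell type pairs affected PPIF expression")
for pct in (-100, -50, -40, 202):
    print(f"{pct:+d}% -> {fold_from_percent(pct):.2f}x reference expression")
```

Reading the range as fold changes makes the asymmetry visible: decreases are bounded at -100%, while increases have no upper bound.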
Collapse
Affiliation(s)
- Gabriella E Martyn
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
| | - Michael T Montgomery
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
| | - Hank Jones
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
| | - Katherine Guo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
| | - Benjamin R Doughty
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | | | - Ziwei Chen
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Kelly Cochran
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Kathryn A Lawrence
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Glen Munson
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anusri Pampari
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Charles P Fulco
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Present Address: Sanofi, Cambridge, MA, USA
- Eric S Lander
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Anshul Kundaje
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Jesse M Engreitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanford Cardiovascular Institute, Stanford University, Stanford, CA, USA
49
Fornes O, Av-Shalom TV, Korecki AJ, Farkas R, Arenillas D, Mathelier A, Simpson E, Wasserman W. OnTarget: in silico design of MiniPromoters for targeted delivery of expression. Nucleic Acids Res 2023; 51:W379-W386. [PMID: 37166953 PMCID: PMC10320062 DOI: 10.1093/nar/gkad375] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Revised: 04/24/2023] [Accepted: 04/27/2023] [Indexed: 05/12/2023]
Abstract
MiniPromoters, or compact promoters, are short DNA sequences that can drive expression in specific cells and tissues. While broadly useful, they are of high relevance to gene therapy due to their role in enabling precise control of where a therapeutic gene will be expressed. Here, we present OnTarget (http://ontarget.cmmt.ubc.ca), a webserver that streamlines the MiniPromoter design process. Users only need to specify a gene of interest or custom genomic coordinates on which to focus the identification of promoters and enhancers, and can also provide relevant cell-type-specific genomic evidence (e.g. accessible chromatin regions, histone modifications, etc.). OnTarget combines the provided data with internal data to identify candidate promoters and enhancers and design MiniPromoters. To illustrate the utility of OnTarget, we designed and characterized two MiniPromoters targeting different cell populations relevant to Parkinson Disease.
Affiliation(s)
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, Canada
- Tamar V Av-Shalom
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, Canada
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Andrea J Korecki
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, Canada
- Rachelle A Farkas
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, Canada
- David J Arenillas
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, Canada
- Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, Oslo, Norway
- Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
- Elizabeth M Simpson
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, Canada
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, Canada