1
Shen Y, Liu Y, Guo M, Mao S, Chen R, Wang M, Li Z, Li Y, Chen W, Chen F, Wu B, Wang C, Chen W, Cui H, Yuan K, Huang H. DEK-nucleosome structure shows DEK modulates H3K27me3 and stem cell fate. Nat Struct Mol Biol 2025:10.1038/s41594-025-01559-9. [PMID: 40379883 DOI: 10.1038/s41594-025-01559-9] [Received: 01/28/2024] [Accepted: 04/11/2025] [Indexed: 05/19/2025]
Abstract
DEK is a highly conserved chromatin-associated oncoprotein that has important roles in regulating chromatin dynamics and stem cell fate. Dysregulation of DEK is associated with stem cell dysfunction and cancers, including acute myeloid leukemia. Despite its importance in chromatin regulation, the structural mechanisms underlying DEK's interaction with chromatin and its influence on gene regulation remain poorly understood. Here we combined cryogenic electron microscopy (cryo-EM), biochemical and cellular approaches to investigate the molecular mechanisms and functional importance of DEK's interaction with chromatin. Our cryo-EM structures reveal the structural basis of the DEK-nucleosome interaction. Biochemical and cellular results demonstrate that this interaction is crucial for DEK deposition onto chromatin. Furthermore, our results reveal that DEK safeguards mouse embryonic stem cells from acquiring primitive endoderm fates by modulating the repressive histone mark H3K27me3. Together, our study provides crucial molecular insights into the structure and function of DEK, establishing a framework for understanding its roles in chromatin biology and cell fate determination.
Affiliation(s)
- Yunfan Shen
- Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Hunan Key Laboratory of Molecular Precision Medicine, Department of Oncology, Xiangya Hospital, Central South University, Changsha, China
- Yanhong Liu
- Institute for Biological Electron Microscopy, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Key Laboratory of Molecular Design for Plant Cell Factory of Guangdong Higher Education Institutes, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Maochao Guo
- Institute for Biological Electron Microscopy, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Key Laboratory of Molecular Design for Plant Cell Factory of Guangdong Higher Education Institutes, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Song Mao
- Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Hunan Key Laboratory of Molecular Precision Medicine, Department of Oncology, Xiangya Hospital, Central South University, Changsha, China
- Rui Chen
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Mengran Wang
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Zhengbo Li
- Institute for Biological Electron Microscopy, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Key Laboratory of Molecular Design for Plant Cell Factory of Guangdong Higher Education Institutes, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Yue Li
- Institute for Biological Electron Microscopy, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Key Laboratory of Molecular Design for Plant Cell Factory of Guangdong Higher Education Institutes, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Wan Chen
- Institute for Biological Electron Microscopy, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Key Laboratory of Molecular Design for Plant Cell Factory of Guangdong Higher Education Institutes, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Fang Chen
- Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Hunan Key Laboratory of Molecular Precision Medicine, Department of Oncology, Xiangya Hospital, Central South University, Changsha, China
- Baixing Wu
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China
- Guangdong-Hong Kong Joint Laboratory for RNA Medicine, Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China
- Chongyuan Wang
- Center for Human Tissues and Organs Degeneration, Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Wei Chen
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Department of Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Huanhuan Cui
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.
- Department of Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.
- Kai Yuan
- Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China.
- Hunan Key Laboratory of Molecular Precision Medicine, Department of Oncology, Xiangya Hospital, Central South University, Changsha, China.
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China.
- Hongda Huang
- Institute for Biological Electron Microscopy, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.
- Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.
- Key Laboratory of Molecular Design for Plant Cell Factory of Guangdong Higher Education Institutes, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.
2
Dincer TU, Ernst J. ChromActivity: integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types. Genome Biol 2025; 26:123. [PMID: 40346707 PMCID: PMC12063466 DOI: 10.1186/s13059-025-03579-6] [Received: 07/13/2023] [Accepted: 04/15/2025] [Indexed: 05/11/2025] [Open Access]
Abstract
We introduce ChromActivity, a computational framework for predicting and annotating regulatory activity across the genome through integration of multiple epigenomic maps and various functional characterization datasets. ChromActivity generates genome-wide predictions of regulatory activity associated with each functional characterization dataset across many cell types based on available epigenomic data. For each cell type, it then produces ChromScoreHMM genome annotations, based on the combinatorial and spatial patterns within these predictions, and ChromScore tracks of overall predicted regulatory activity. ChromActivity provides a resource for analyzing and interpreting the human regulatory genome across diverse cell types.
Affiliation(s)
- Tevfik Umut Dincer
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Jason Ernst
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Computer Science Department, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
3
Yu T, Cheng L, Khalitov R, Yang Z. A sparse and wide neural network model for DNA sequences. Neural Netw 2025; 184:107040. [PMID: 39709643 DOI: 10.1016/j.neunet.2024.107040] [Received: 04/02/2024] [Revised: 10/23/2024] [Accepted: 12/07/2024] [Indexed: 12/24/2024]
Abstract
Accurate modeling of DNA sequences requires capturing distant semantic relationships between the nucleotide bases. Most existing deep neural network models face two challenges: (1) they are limited to short DNA fragments and cannot capture long-range interactions, and (2) they require many supervised labels, which is often expensive in practice. We propose a new neural network model called SwanDNA to address the above challenges. By using a sparse and wide network architecture, our model enables inferences over very long DNA sequences. By incorporating the neural network into a self-supervised learning framework, our method can give accurate predictions while using fewer supervised labels. We evaluate SwanDNA in three DNA sequence inference tasks: human variant effect, open chromatin regions detection in plant genes, and GenomicBenchmarks. SwanDNA outperforms all competitors in the first two tasks and achieves state-of-the-art results in seven of eight datasets in GenomicBenchmarks. Our code is available at https://github.com/wiedersehne/SwanDNA.
Affiliation(s)
- Tong Yu
- Norwegian University of Science and Technology, Trondheim, Norway.
- Lei Cheng
- Norwegian University of Science and Technology, Trondheim, Norway
- Ruslan Khalitov
- Norwegian University of Science and Technology, Trondheim, Norway
- Zhirong Yang
- Norwegian University of Science and Technology, Trondheim, Norway; Jinhua Institute of Zhejiang University, Hangzhou, China
4
Yu T, Cheng L, Khalitov R, Olsson EB, Yang Z. Self-distillation improves self-supervised learning for DNA sequence inference. Neural Netw 2025; 183:106978. [PMID: 39667220 DOI: 10.1016/j.neunet.2024.106978] [Received: 05/17/2024] [Revised: 10/28/2024] [Accepted: 11/26/2024] [Indexed: 12/14/2024]
Abstract
Self-supervised Learning (SSL) has been recognized as a method to enhance prediction accuracy in various downstream tasks. However, its efficacy for DNA sequences remains somewhat constrained. This limitation stems primarily from the fact that most existing SSL approaches in genomics focus on masked language modeling of individual sequences, neglecting the crucial aspect of encoding statistics across multiple sequences. To overcome this challenge, we introduce an innovative deep neural network model, which incorporates collaborative learning between a 'student' and a 'teacher' subnetwork. In this model, the student subnetwork employs masked learning on nucleotides and progressively adapts its parameters to the teacher subnetwork through an exponential moving average approach. Concurrently, both subnetworks engage in contrastive learning, deriving insights from two augmented representations of the input sequences. This self-distillation process enables our model to effectively assimilate both contextual information from individual sequences and distributional data across the sequence population. We validated our approach with preliminary pretraining using the human reference genome, followed by applying it to 20 downstream inference tasks. The empirical results from these experiments demonstrate that our novel method significantly boosts inference performance across the majority of these tasks. Our code is available at https://github.com/wiedersehne/FinDNA.
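The abstract does not spell out the exponential moving average rule that couples the two subnetworks. As a rough illustration only, the sketch below shows the generic EMA blend often used in self-distillation setups, where one set of parameters tracks a smoothed copy of another. The function name `ema_update`, the `decay` value, and the toy parameter lists are all assumptions for illustration and are not taken from the FinDNA code.

```python
def ema_update(tracked, source, decay=0.99):
    """Blend `source` parameters into `tracked` with an exponential moving average."""
    if len(tracked) != len(source):
        raise ValueError("parameter lists must have the same length")
    # new = decay * old + (1 - decay) * incoming, applied element-wise
    return [decay * t + (1.0 - decay) * s for t, s in zip(tracked, source)]


# Toy parameters standing in for network weights (hypothetical values).
tracked_params = [0.0, 1.0]
source_params = [1.0, 3.0]
tracked_params = ema_update(tracked_params, source_params, decay=0.9)
print(tracked_params)
```

A decay close to 1 makes the tracked parameters change slowly, which is the usual reason for using an EMA here: the smoothed network provides a more stable target than the rapidly changing one.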
Affiliation(s)
- Tong Yu
- Norwegian University of Science and Technology, Trondheim, Norway.
- Lei Cheng
- Norwegian University of Science and Technology, Trondheim, Norway
- Ruslan Khalitov
- Norwegian University of Science and Technology, Trondheim, Norway
- Erland B Olsson
- Norwegian University of Science and Technology, Trondheim, Norway
- Zhirong Yang
- Norwegian University of Science and Technology, Trondheim, Norway
5
Paul NB, Wolber JC, Sahrhage ML, Beißbarth T, Haubrock M. Prediction of gene expression using histone modification patterns extracted by Particle Swarm Optimization. Bioinformatics 2025; 41:btaf033. [PMID: 39878927 PMCID: PMC11802466 DOI: 10.1093/bioinformatics/btaf033] [Received: 05/23/2024] [Revised: 11/21/2024] [Accepted: 01/27/2025] [Indexed: 01/31/2025] [Open Access]
Abstract
MOTIVATION: Histone modifications play an important role in transcription regulation. Although the general importance of some histone modifications for transcription regulation has been previously established, the relevance of others and their interaction is subject to ongoing research. By training machine learning models to predict a gene's expression and explaining their decision-making process, we can get hints on how histone modifications affect transcription. In previous studies, trained models were either hardly explainable or the models were trained solely on the abundance of histone modifications. Based on other studies, which used histone modification patterns, rather than their abundance, to identify potential regulatory elements, we hypothesize the histone modification pattern in a gene's promoter to be more predictive for gene expression. We used an optimization algorithm to extract predictive histone modification profiles.
RESULTS: Our algorithm, called PatternChrome, achieved an average area under curve (AUC) score of 0.9029 over 56 samples for binary classification, outperforming all previous algorithms for the same task. We explained the models' decisions to deduce the effect of specific features, certain histone modifications or promoter positions on transcription regulation. Although the predictive histone modification patterns were extracted for each sample separately, they can be used to predict gene expression in other samples, implying that the created patterns are largely generalizable. Interestingly, the impact of histone modifications on gene regulation appears predominantly indifferent to cellular specificity. Through explanation of the classifier's decisions, we substantiate established literature knowledge while concurrently revealing novel insights into the intricate landscape of transcriptional regulation via histone modification.
AVAILABILITY AND IMPLEMENTATION: The code for the PatternChrome algorithm, the scripts for the analyses and the required data can be found at https://gitlab.gwdg.de/MedBioinf/generegulation/patternchrome.
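For readers unfamiliar with the AUC metric quoted in the results, the sketch below computes it for a binary classifier, assuming the reported figure is the area under the ROC curve (the abstract does not say otherwise). It uses the pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function `roc_auc` and the toy labels and scores are hypothetical and are not part of PatternChrome.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = highly expressed gene, 0 = lowly expressed gene.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The quadratic loop is chosen for clarity; for large gene sets a rank-based implementation from an established library would be the more practical choice.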
Affiliation(s)
- Niels Benjamin Paul
- Department of Medical Bioinformatics, University Medical Center Göttingen, Göttingen 37099, Germany
- Clinic of Cardiology and Pneumology, University Medical Center Göttingen, Göttingen 37099, Germany
- Jonas Chanrithy Wolber
- Joint Research Center for Computational Biomedicine, RWTH Aachen University, Aachen 52074, Germany
- Malte Lennart Sahrhage
- Department of Medical Bioinformatics, University Medical Center Göttingen, Göttingen 37099, Germany
- Tim Beißbarth
- Department of Medical Bioinformatics, University Medical Center Göttingen, Göttingen 37099, Germany
- Martin Haubrock
- Department of Medical Bioinformatics, University Medical Center Göttingen, Göttingen 37099, Germany
6
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Sci China Life Sci 2025; 68:5-102. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
7
Hudaiberdiev S, Ovcharenko I. Functional characteristics and computational model of abundant hyperactive loci in the human genome. eLife 2024; 13:RP95170. [PMID: 39535534 PMCID: PMC11560132 DOI: 10.7554/elife.95170] [Indexed: 11/16/2024] [Open Access]
Abstract
Enhancers and promoters are classically considered to be bound by a small set of transcription factors (TFs) in a sequence-specific manner. This assumption has come under increasing skepticism as the datasets of ChIP-seq assays of TFs have expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs with often no detectable correlation between ChIP-seq peaks and DNA-binding motif presence. Here, we used a set of 1003 TF ChIP-seq datasets (HepG2, K562, H1) to analyze the patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci forming at the promoter (53%) and enhancer (47%) regions. HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in tissue-specific process regulation. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of these loci being located in ultraconserved regions. Sequence-based classification analysis of HOT loci suggested that their formation is driven by the sequence features, and the density of mapped ChIP-seq peaks across TF-bound loci correlates with sequence features and the expression level of flanking genes. Based on the affinities to bind to promoters and enhancers, we detected five distinct clusters of TFs that form the core of the HOT loci. We report an abundance of HOT loci in the human genome and a commitment of 51% of all TF ChIP-seq binding events to HOT locus formation, thus challenging the classical model of enhancer activity, and propose a model of HOT locus formation based on the existence of large transcriptional condensates.
Affiliation(s)
- Sanjarbek Hudaiberdiev
- National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
- Ivan Ovcharenko
- National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
8
Cheng S, Miao B, Li T, Zhao G, Zhang B. Review and Evaluate the Bioinformatics Analysis Strategies of ATAC-seq and CUT&Tag Data. Genomics Proteomics Bioinformatics 2024; 22:qzae054. [PMID: 39255248 PMCID: PMC11464419 DOI: 10.1093/gpbjnl/qzae054] [Received: 03/08/2023] [Revised: 05/28/2024] [Accepted: 07/18/2024] [Indexed: 09/12/2024]
Abstract
Efficient and reliable profiling methods are essential to study epigenetics. Tn5, one of the first identified prokaryotic transposases with high DNA-binding and tagmentation efficiency, is widely adopted in different genomic and epigenomic protocols for high-throughput exploration of the genome and epigenome. Based on Tn5, the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) and the Cleavage Under Targets and Tagmentation (CUT&Tag) were developed to measure chromatin accessibility and detect DNA-protein interactions. These methodologies can be applied to large amounts of biological samples with low-input levels, such as rare tissues, embryos, and sorted single cells. However, fast and proper processing of these epigenomic data has become a bottleneck because massive data production continues to increase quickly. Furthermore, inappropriate data analysis can generate biased or misleading conclusions. Therefore, it is essential to evaluate the performance of Tn5-based ATAC-seq and CUT&Tag data processing bioinformatics tools, many of which were developed mostly for analyzing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data. Here, we conducted a comprehensive benchmarking analysis to evaluate the performance of eight popular software tools for processing ATAC-seq and CUT&Tag data. We compared the sensitivity, specificity, and peak width distribution for both narrow-type and broad-type peak calling. We also tested the influence of the availability of control IgG input in CUT&Tag data analysis. Finally, we evaluated the differential analysis strategies commonly used for analyzing the CUT&Tag data. Our study provided comprehensive guidance for selecting bioinformatics tools and recommended analysis strategies, which were implemented into Docker/Singularity images for streamlined data analysis.
Affiliation(s)
- Siyuan Cheng
- Department of Developmental Biology, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO 63108, USA
- Benpeng Miao
- Department of Developmental Biology, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
- Tiandao Li
- Department of Developmental Biology, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO 63108, USA
- Guoyan Zhao
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Neurology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Bo Zhang
- Department of Developmental Biology, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO 63108, USA
9
Xu L, Liu Y. Identification, Design, and Application of Noncoding Cis-Regulatory Elements. Biomolecules 2024; 14:945. [PMID: 39199333 PMCID: PMC11352686 DOI: 10.3390/biom14080945] [Received: 05/26/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 09/01/2024] [Open Access]
Abstract
Cis-regulatory elements (CREs) play a pivotal role in orchestrating interactions with trans-regulatory factors such as transcription factors, RNA-binding proteins, and noncoding RNAs. These interactions are fundamental to the molecular architecture underpinning complex and diverse biological functions in living organisms, facilitating a myriad of sophisticated and dynamic processes. The rapid advancement in the identification and characterization of these regulatory elements has been marked by initiatives such as the Encyclopedia of DNA Elements (ENCODE) project, which represents a significant milestone in the field. Concurrently, the development of CRE detection technologies, exemplified by massively parallel reporter assays, has progressed at an impressive pace, providing powerful tools for CRE discovery. The exponential growth of multimodal functional genomic data has necessitated the application of advanced analytical methods. Deep learning algorithms, particularly large language models, have emerged as invaluable tools for deconstructing the intricate nucleotide sequences governing CRE function. These advancements facilitate precise predictions of CRE activity and enable the de novo design of CREs. A deeper understanding of CRE operational dynamics is crucial for harnessing their versatile regulatory properties. Such insights are instrumental in refining gene therapy techniques, enhancing the efficacy of selective breeding programs, pushing the boundaries of genetic innovation, and opening new possibilities in microbial synthetic biology.
Affiliation(s)
- Lingna Xu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China
|
10
|
Hudaiberdiev S, Ovcharenko I. Functional characteristics and computational model of abundant hyperactive loci in the human genome. bioRxiv 2024:2023.02.05.527203. [PMID: 36945558 PMCID: PMC10028745 DOI: 10.1101/2023.02.05.527203] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
Enhancers and promoters are classically considered to be bound by a small set of TFs in a sequence-specific manner. This assumption has come under increasing skepticism as the datasets of ChIP-seq assays of TFs have expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs, often with no detectable correlation between ChIP-seq peaks and DNA-binding motif presence. Here, we used a set of 1,003 TF ChIP-seq datasets (HepG2, K562, H1) to analyze the patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci forming at promoter (53%) and enhancer (47%) regions. HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in tissue-specific process regulation. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of these loci being located in ultraconserved regions. Sequence-based classification analysis of HOT loci suggested that their formation is driven by sequence features, and the density of mapped ChIP-seq peaks across TF-bound loci correlates with sequence features and the expression level of flanking genes. Based on their affinities for promoters and enhancers, we detected five distinct clusters of TFs that form the core of the HOT loci. We report an abundance of HOT loci in the human genome and a commitment of 51% of all TF ChIP-seq binding events to HOT locus formation, thus challenging the classical model of enhancer activity, and propose a model of HOT locus formation based on the existence of large transcriptional condensates.
Affiliation(s)
- Sanjarbek Hudaiberdiev
- National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
- Ivan Ovcharenko
- National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
|
11
|
Abnizova I, Stapel C, Boekhorst RT, Lee JTH, Hemberg M. Integrative analysis of transcriptomic and epigenomic data reveals distinct patterns for developmental and housekeeping gene regulation. BMC Biol 2024; 22:78. [PMID: 38600550 PMCID: PMC11005181 DOI: 10.1186/s12915-024-01869-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 03/14/2024] [Indexed: 04/12/2024] Open
Abstract
BACKGROUND: Regulation of transcription is central to the emergence of new cell types during development, and it often involves activation of genes via proximal and distal regulatory regions. The activity of regulatory elements is determined by transcription factors (TFs) and epigenetic marks, but despite extensive mapping of such patterns, the extraction of regulatory principles remains challenging. RESULTS: Here we study differentially and similarly expressed genes along with their associated epigenomic profiles, chromatin accessibility and DNA methylation, during lineage specification at gastrulation in mice. Comparison of the three lineages allows us to identify genomic and epigenomic features that distinguish the two classes of genes. We show that differentially expressed genes are primarily regulated by distal elements, while similarly expressed genes are controlled by proximal housekeeping regulatory programs. Differentially expressed genes are relatively isolated within topologically associated domains, while similarly expressed genes tend to be located in gene clusters. Transcription of differentially expressed genes is associated with differentially open chromatin at distal elements including enhancers, while that of similarly expressed genes is associated with ubiquitously accessible chromatin at promoters. CONCLUSION: Based on these associations of (linearly) distal genes' transcription start sites (TSSs) and putative enhancers for developmental genes, our findings allow us to link putative enhancers to their target promoters and to infer lineage-specific repertoires of putative driver transcription factors, within which we define subgroups of pioneers and co-operators.
Affiliation(s)
- Irina Abnizova
- Epigenetics Programme, Babraham Institute, Cambridge, UK
- Wellcome Sanger Institute, Hinxton, UK
- Carine Stapel
- Epigenetics Programme, Babraham Institute, Cambridge, UK
- Martin Hemberg
- Wellcome Sanger Institute, Hinxton, UK
- The Gene Lay Institute of Immunology and Inflammation, Brigham & Women's Hospital and Harvard Medical School, Boston, USA
|
12
|
Camellato BR, Brosh R, Ashe HJ, Maurano MT, Boeke JD. Synthetic reversed sequences reveal default genomic states. Nature 2024; 628:373-380. [PMID: 38448583 PMCID: PMC11006607 DOI: 10.1038/s41586-024-07128-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/27/2022] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Pervasive transcriptional activity is observed across diverse species. The genomes of extant organisms have undergone billions of years of evolution, making it unclear whether these genomic activities represent effects of selection or 'noise' [1-4]. Characterizing default genome states could help understand whether pervasive transcriptional activity has biological meaning. Here we addressed this question by introducing a synthetic 101-kb locus into the genomes of Saccharomyces cerevisiae and Mus musculus and characterizing genomic activity. The locus was designed by reversing but not complementing human HPRT1, including its flanking regions, thus retaining basic features of the natural sequence but ablating evolved coding or regulatory information. We observed widespread activity of both reversed and native HPRT1 loci in yeast, despite the lack of evolved yeast promoters. By contrast, the reversed locus displayed no activity at all in mouse embryonic stem cells, and instead exhibited repressive chromatin signatures. The repressive signature was alleviated in a locus variant lacking CpG dinucleotides; nevertheless, this variant was also transcriptionally inactive. These results show that synthetic genomic sequences that lack coding information are active in yeast, but inactive in mouse embryonic stem cells, consistent with a major difference in 'default genomic states' between these two divergent eukaryotic cell types, with implications for understanding pervasive transcription, horizontal transfer of genetic information and the birth of new genes.
Affiliation(s)
- Ran Brosh
- Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Hannah J Ashe
- Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Matthew T Maurano
- Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Department of Pathology, NYU Langone Health, New York, NY, USA
- Jef D Boeke
- Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY, USA
- Department of Biomedical Engineering, NYU Tandon School of Engineering, New York, NY, USA
|
13
|
Huang L, Zhang J, Songyang Z, Xiong Y. Identification and Validation of eRNA as a Prognostic Indicator for Cervical Cancer. Biology 2024; 13:227. [PMID: 38666838 PMCID: PMC11048606 DOI: 10.3390/biology13040227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/22/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024]
Abstract
The survival of patients with cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) is closely related to the expression of enhancer RNA (eRNA). In this work, we downloaded eRNA expression, clinical, and gene expression data from the TCeA and TCGA portals. A total of 7936 differentially expressed eRNAs were discovered by limma analysis, and the relationship between these eRNAs and survival was analyzed by univariate Cox hazard analysis, LASSO regression, and multivariate Cox hazard analysis to obtain an 8-eRNA model. Risk score heat maps, KM curves, ROC analysis, robustness analysis, and nomograms further indicate that this 8-eRNA model is a novel indicator with high prognostic performance independent of clinicopathological classification. Using the model, we divided patients into high-risk and low-risk groups, compared pathway diversity between the two groups through GSEA, and proposed potential therapeutic agents for high-risk patients.
Affiliation(s)
- Lijing Huang
- MOE Key Laboratory of Gene Function and Regulation, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Jingkai Zhang
- MOE Key Laboratory of Gene Function and Regulation, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Zhou Songyang
- MOE Key Laboratory of Gene Function and Regulation, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510060, China
- Yuanyan Xiong
- MOE Key Laboratory of Gene Function and Regulation, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
|
14
|
Gaynor-Gillett SC, Cheng L, Shi M, Liu J, Wang G, Spector M, Flaherty M, Wall M, Hwang A, Gu M, Chen Z, Chen Y, Consortium P, Moran JR, Zhang J, Lee D, Gerstein M, Geschwind D, White KP. Validation of Enhancer Regions in Primary Human Neural Progenitor Cells using Capture STARR-seq. bioRxiv 2024:2024.03.14.585066. [PMID: 38562832 PMCID: PMC10983874 DOI: 10.1101/2024.03.14.585066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Genome-wide association studies (GWAS) and expression analyses implicate noncoding regulatory regions as harboring risk factors for psychiatric disease, but functional characterization of these regions remains limited. We performed capture STARR-sequencing of over 78,000 candidate regions to identify active enhancers in primary human neural progenitor cells (phNPCs). We selected candidate regions by integrating data from NPCs, prefrontal cortex, developmental timepoints, and GWAS. Over 8,000 regions demonstrated enhancer activity in the phNPCs, and we linked these regions to over 2,200 predicted target genes. These genes are involved in neuronal and psychiatric disease-associated pathways, including dopaminergic synapse, axon guidance, and schizophrenia. We functionally validated a subset of these enhancers using mutation STARR-sequencing and CRISPR deletions, demonstrating the effects of genetic variation on enhancer activity and enhancer deletion on gene expression. Overall, we identified thousands of highly active enhancers and functionally validated a subset of these enhancers, improving our understanding of regulatory networks underlying brain function and disease.
Affiliation(s)
- Sophia C. Gaynor-Gillett
- Tempus Labs, Inc.; Chicago, IL, 60654, USA
- Department of Biology, Cornell College; Mount Vernon, IA, 52314, USA
- Manman Shi
- Tempus Labs, Inc.; Chicago, IL, 60654, USA
- Jason Liu
- Computational Biology and Bioinformatics Program, Yale University; New Haven, CT, 06511, USA
- Gaoyuan Wang
- Computational Biology and Bioinformatics Program, Yale University; New Haven, CT, 06511, USA
- Ahyeon Hwang
- Department of Computer Science, University of California Irvine; Irvine, CA, 92697, USA
- Mengting Gu
- Computational Biology and Bioinformatics Program, Yale University; New Haven, CT, 06511, USA
- Zhanlin Chen
- Computational Biology and Bioinformatics Program, Yale University; New Haven, CT, 06511, USA
- Yuhang Chen
- Computational Biology and Bioinformatics Program, Yale University; New Haven, CT, 06511, USA
- Jing Zhang
- Department of Computer Science, University of California Irvine; Irvine, CA, 92697, USA
- Donghoon Lee
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai; New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai; New York, NY, 10029, USA
- Mark Gerstein
- Computational Biology and Bioinformatics Program, Yale University; New Haven, CT, 06511, USA
- Department of Statistics and Data Science, Yale University; New Haven, CT, 06511, USA
- Department of Molecular Biophysics and Biochemistry, Yale University; New Haven, CT, 06511, USA
- Department of Computer Science, Yale University; New Haven, CT, 06511, USA
- Daniel Geschwind
- Department of Neurology, David Geffen School of Medicine, University of California Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry and Semel Institute, David Geffen School of Medicine, University of California Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles; Los Angeles, CA, 90095, USA
- Kevin P. White
- Yong Loo Lin School of Medicine, National University of Singapore; Singapore, 117597
|
15
|
Cheng L, Yu T, Khalitov R, Yang Z. Self-supervised Learning for DNA sequences with circular dilated convolutional networks. Neural Netw 2024; 171:466-473. [PMID: 38150872 DOI: 10.1016/j.neunet.2023.12.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 10/11/2023] [Accepted: 12/01/2023] [Indexed: 12/29/2023]
Abstract
DNA molecules commonly exhibit wide interactions between the nucleobases. Modeling the interactions is important for obtaining accurate sequence-based inference. Although many deep learning methods have recently been developed for modeling DNA sequences, they still suffer from two major issues: 1) most existing methods can handle only short DNA fragments and fail to capture long-range information; 2) current methods always require massive supervised labels, which are hard to obtain in practice. We propose a new method to address both issues. Our neural network employs circular dilated convolutions as building blocks in the backbone. As a result, our network can take long DNA sequences as input without any condensation. We also incorporate the neural network into a self-supervised learning framework to capture inherent information in DNA without expensive supervised labeling. We have tested our model in two DNA inference tasks, the human variant effect and the open chromatin region of plants, where the experimental results show that our method outperforms five other deep learning models. Our code is available at https://github.com/wiedersehne/cdilDNA.
Affiliation(s)
- Lei Cheng
- Department of Computer Science, Norwegian University of Science and Technology, Norway
- Tong Yu
- Department of Computer Science, Norwegian University of Science and Technology, Norway
- Ruslan Khalitov
- Department of Computer Science, Norwegian University of Science and Technology, Norway
- Zhirong Yang
- Department of Computer Science, Norwegian University of Science and Technology, Norway; Jinhua Institute of Zhejiang University, China
|
16
|
Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Matsui H, Silva NS, Joshua IN, Luchessi AD, Greenwald WWY, D'Antonio M, Pera MF, Frazer KA. Complex regulatory networks influence pluripotent cell state transitions in human iPSCs. Nat Commun 2024; 15:1664. [PMID: 38395976 PMCID: PMC10891157 DOI: 10.1038/s41467-024-45506-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 05/05/2023] [Accepted: 01/26/2024] [Indexed: 02/25/2024] Open
Abstract
Stem cells exist in vitro in a spectrum of interconvertible pluripotent states. Analyzing hundreds of hiPSCs derived from different individuals, we show that the proportions of these pluripotent states vary considerably across lines. We discover 13 gene network modules (GNMs) and 13 regulatory network modules (RNMs), which are highly correlated with each other, suggesting that the coordinated co-accessibility of regulatory elements in the RNMs likely underlies the coordinated expression of genes in the GNMs. Epigenetic analyses reveal that regulatory networks underlying self-renewal and pluripotency are more complex than previously realized. Genetic analyses identify thousands of regulatory variants that overlap predicted transcription factor binding sites and are associated with chromatin accessibility in the hiPSCs. We show that the master regulator of pluripotency, the NANOG-OCT4 complex, and its associated network are significantly enriched for regulatory variants with large effects, suggesting that they play a role in the varying cellular proportions of pluripotency states between hiPSCs. Our work bins tens of thousands of regulatory elements in hiPSCs into discrete regulatory networks, shows that pluripotency and self-renewal processes have a surprising level of regulatory complexity, and suggests that genetic factors may contribute to cell state transitions in human iPSC lines.
Affiliation(s)
- Timothy D Arthur
- Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Jennifer P Nguyen
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
- Hiroko Matsui
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Nayara S Silva
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal, Brazil
- Isaac N Joshua
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- André D Luchessi
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal, Brazil
- Department of Clinical and Toxicological Analysis, Federal University of Rio Grande do Norte, Natal, Brazil
- William W Young Greenwald
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
- Matteo D'Antonio
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Kelly A Frazer
- Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
|
17
|
Liu X, Gillis N, Jiang C, McCofie A, Shaw TI, Tan AC, Zhao B, Wan L, Duckett DR, Teng M. An Epigenomic fingerprint of human cancers by landscape interrogation of super enhancers at the constituent level. PLoS Comput Biol 2024; 20:e1011873. [PMID: 38335222 PMCID: PMC10883583 DOI: 10.1371/journal.pcbi.1011873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 02/22/2024] [Accepted: 01/30/2024] [Indexed: 02/12/2024] Open
Abstract
Super enhancers (SEs), large genomic elements that activate transcription and drive cell identity, have been linked to cancer-specific gene regulation in human cancers. Recent studies reported the importance of understanding the cooperation and function of SE internal components, i.e., the constituent enhancers (CEs). However, there are no pan-cancer studies to identify cancer-specific SE signatures at the constituent level. Here, by revisiting pan-cancer SE activities with H3K27Ac ChIP-seq datasets, we report fingerprint SE signatures for 28 cancer types in the NCI-60 cell panel. We implement a mixture model to discriminate active CEs from inactive CEs by taking into consideration ChIP-seq variabilities between cancer samples and across CEs. We demonstrate that the model-based estimation of CE states provides improved functional interpretation of SE-associated regulation. We identify cancer-specific CEs by balancing their active prevalence with their capability of encoding cancer type identities. We further demonstrate that cancer-specific CEs have the strongest per-base enhancer activities in independent enhancer sequencing assays, suggesting their importance in understanding critical SE signatures. We summarize fingerprint SEs based on the cancer-specific statuses of their component CEs and build an easy-to-use R package to facilitate the query, exploration, and visualization of fingerprint SEs across cancers.
Affiliation(s)
- Xiang Liu
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America
- Nancy Gillis
- Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, Florida, United States of America
- Chang Jiang
- Department of Molecular Oncology, Moffitt Cancer Center, Tampa, Florida, United States of America
- Anthony McCofie
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America
- Timothy I Shaw
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America
- Aik-Choon Tan
- Department of Oncological Sciences, Huntsman Cancer Institute, The University of Utah, Salt Lake City, Utah, United States of America
- Bo Zhao
- Division of Infectious Disease, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Lixin Wan
- Department of Molecular Oncology, Moffitt Cancer Center, Tampa, Florida, United States of America
- Derek R Duckett
- Department of Drug Discovery, Moffitt Cancer Center, Tampa, Florida, United States of America
- Mingxiang Teng
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America
|
18
|
Wang Y, Jin W, Pan X, Liao W, Shen Q, Cai J, Gong W, Tian Y, Xu D, Li Y, Li J, Gong J, Zhang Z, Yuan X. Pig-eRNAdb: a comprehensive enhancer and eRNA dataset of pigs. Sci Data 2024; 11:157. [PMID: 38302497 PMCID: PMC10834423 DOI: 10.1038/s41597-024-02960-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 01/11/2024] [Indexed: 02/03/2024] Open
Abstract
Enhancers and enhancer RNAs (eRNAs) have been strongly implicated in the regulation of transcription. Based on multi-omics data (ATAC-seq, ChIP-seq and RNA-seq) from public databases, Pig-eRNAdb is a dataset that comprehensively integrates enhancers and eRNAs for pigs using a machine learning strategy, incorporating 82,399 enhancers and 37,803 eRNAs from 607 samples across 15 tissues. This user-friendly dataset provides comprehensive annotation of enhancers and eRNAs for pigs. The coordinates of enhancers and the expression patterns of eRNAs are downloadable. In addition, thousands of regulators of eRNAs, the target genes of eRNAs, the tissue-specific eRNAs, and the housekeeping eRNAs are accessible, as is the sequence similarity of eRNAs with humans. Moreover, tissue-specific eRNA-trait associations encompassing 652 traits are provided. As a reference dataset, Pig-eRNAdb will facilitate investigations of enhancers and eRNAs in pigs.
Affiliation(s)
- Yifei Wang
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Weiwei Jin
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Xiangchun Pan
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Weili Liao
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Qingpeng Shen
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Jiali Cai
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Wentao Gong
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Yuhan Tian
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Dantong Xu
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Yipeng Li
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Jiaqi Li
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Jing Gong
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Zhe Zhang
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China.
| | - Xiaolong Yuan
- Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science, South China Agricultural University, Guangzhou, 510642, China.
| |
|
19
|
Mancheno-Ferris A, Immarigeon C, Rivero A, Depierre D, Schickele N, Fosseprez O, Chanard N, Aughey G, Lhoumaud P, Anglade J, Southall T, Plaza S, Payre F, Cuvier O, Polesello C. Crosstalk between chromatin and Shavenbaby defines transcriptional output along the Drosophila intestinal stem cell lineage. iScience 2024; 27:108624. [PMID: 38174321 PMCID: PMC10762455 DOI: 10.1016/j.isci.2023.108624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 07/05/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024] Open
Abstract
The transcription factor Shavenbaby (Svb), the only member of the OvoL family in Drosophila, controls the fate of various epithelial embryonic cells and adult stem cells. Post-translational modification of Svb produces two protein isoforms, Svb-ACT and Svb-REP, which promote adult intestinal stem cell renewal or differentiation, respectively. To define Svb's mode of action, we used engineered cell lines and developed an unbiased method to identify Svb target genes across different contexts. Within a given cell type, Svb-ACT and Svb-REP antagonistically regulate the expression of a set of target genes, binding specific enhancers whose accessibility is constrained by the chromatin landscape. Reciprocally, Svb-REP can influence local chromatin marks of active enhancers to help repress target genes. Along the intestinal lineage, the set of Svb target genes progressively changes, together with chromatin accessibility. We propose that the Svb-ACT-to-REP transition promotes enterocyte differentiation of intestinal stem cells through direct gene regulation and chromatin remodeling.
Affiliation(s)
- Alexandra Mancheno-Ferris
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Clément Immarigeon
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Alexia Rivero
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - David Depierre
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Naomi Schickele
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Olivier Fosseprez
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Nicolas Chanard
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Gabriel Aughey
- Imperial College London, Sir Ernst Chain Building, South Kensington Campus, London SW7 2AZ, UK
| | - Priscilla Lhoumaud
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
- Institut Jacques Monod, Université Paris Cité/CNRS, 15 rue Hélène Brion, 75205 Paris Cedex 13, France
| | - Julien Anglade
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Tony Southall
- Imperial College London, Sir Ernst Chain Building, South Kensington Campus, London SW7 2AZ, UK
| | - Serge Plaza
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Laboratoire de Recherche en Sciences Végétales, CNRS/UPS/INPT, 31320 Auzeville-Tolosane, France
| | - François Payre
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Olivier Cuvier
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Cédric Polesello
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
| |
|
20
|
Mulet-Lazaro R, Delwel R. From Genotype to Phenotype: How Enhancers Control Gene Expression and Cell Identity in Hematopoiesis. Hemasphere 2023; 7:e969. [PMID: 37953829 PMCID: PMC10635615 DOI: 10.1097/hs9.0000000000000969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 09/11/2023] [Indexed: 11/14/2023] Open
Abstract
Blood comprises a wide array of specialized cells, all of which share the same genetic information and ultimately derive from the same precursor, the hematopoietic stem cell (HSC). This diversity of phenotypes is underpinned by unique transcriptional programs gradually acquired in the process known as hematopoiesis. Spatiotemporal regulation of gene expression depends on many factors, but critical among them are enhancers: sequences of DNA that bind transcription factors and increase transcription of genes under their control. Thus, hematopoiesis involves the activation of specific enhancer repertoires in HSCs and their progeny, driving the expression of sets of genes that collectively determine morphology and function. Disruption of this tightly regulated process can have catastrophic consequences: in hematopoietic malignancies, dysregulation of transcriptional control by enhancers leads to misexpression of oncogenes that ultimately drive transformation. This review attempts to provide a basic understanding of enhancers and their role in transcriptional regulation, with a focus on normal and malignant hematopoiesis. We present examples of enhancers controlling master regulators of hematopoiesis and discuss the main mechanisms leading to enhancer dysregulation in leukemia and lymphoma.
Affiliation(s)
- Roger Mulet-Lazaro
- Department of Hematology, Erasmus MC Cancer Institute, Rotterdam, the Netherlands
- Oncode Institute, Utrecht, the Netherlands
| | - Ruud Delwel
- Department of Hematology, Erasmus MC Cancer Institute, Rotterdam, the Netherlands
- Oncode Institute, Utrecht, the Netherlands
| |
|
21
|
Arnold M, Stengel KR. Emerging insights into enhancer biology and function. Transcription 2023; 14:68-87. [PMID: 37312570 PMCID: PMC10353330 DOI: 10.1080/21541264.2023.2222032] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/20/2023] [Revised: 05/30/2023] [Accepted: 06/01/2023] [Indexed: 06/15/2023] Open
Abstract
Cell type-specific gene expression is coordinated by DNA-encoded enhancers and the transcription factors (TFs) that bind to them in a sequence-specific manner. As such, these enhancers and TFs are critical mediators of normal development and altered enhancer or TF function is associated with the development of diseases such as cancer. While initially defined by their ability to activate gene transcription in reporter assays, putative enhancer elements are now frequently defined by their unique chromatin features including DNase hypersensitivity and transposase accessibility, bidirectional enhancer RNA (eRNA) transcription, CpG hypomethylation, high H3K27ac and H3K4me1, sequence-specific transcription factor binding, and co-factor recruitment. Identification of these chromatin features through sequencing-based assays has revolutionized our ability to identify enhancer elements on a genome-wide scale, and genome-wide functional assays are now capitalizing on this information to greatly expand our understanding of how enhancers function to provide spatiotemporal coordination of gene expression programs. Here, we highlight recent technological advances that are providing new insights into the molecular mechanisms by which these critical cis-regulatory elements function in gene control. We pay particular attention to advances in our understanding of enhancer transcription, enhancer-promoter syntax, 3D organization and biomolecular condensates, transcription factor and co-factor dependencies, and the development of genome-wide functional enhancer screens.
Affiliation(s)
- Mirjam Arnold
- Department of Cell Biology, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Kristy R. Stengel
- Department of Cell Biology, Albert Einstein College of Medicine, Bronx, NY, USA
- Montefiore Einstein Cancer Center, Albert Einstein College of Medicine-Montefiore Health System, Bronx, NY, USA
- Ruth L. and David S. Gottesman Institute for Stem Cell and Regenerative Medicine Research, Albert Einstein College of Medicine, Bronx, NY, USA
| |
|
22
|
Xu D, Forbes AN, Cohen S, Palladino A, Karadimitriou T, Khurana E. Recapitulation of patient-specific 3D chromatin conformation using machine learning. CELL REPORTS METHODS 2023; 3:100578. [PMID: 37673071 PMCID: PMC10545938 DOI: 10.1016/j.crmeth.2023.100578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 04/05/2023] [Accepted: 08/10/2023] [Indexed: 09/08/2023]
Abstract
Regulatory networks containing enhancer-gene edges define cellular states. Multiple efforts have revealed these networks for reference tissues and cell lines by integrating multi-omics data. However, the methods developed cannot be applied for large patient cohorts due to the infeasibility of chromatin immunoprecipitation sequencing (ChIP-seq) for limited biopsy material. We trained machine-learning models using chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) and high-throughput chromosome conformation capture combined with chromatin immunoprecipitation (HiChIP) data that can predict connections using only assay for transposase-accessible chromatin using sequencing (ATAC-seq) and RNA-seq data as input, which can be generated from biopsies. Our method overcomes limitations of correlation-based approaches that cannot distinguish between distinct target genes of given enhancers or between active vs. poised states in different samples, a hallmark of network rewiring in cancer. Application of our model on 371 samples across 22 cancer types revealed 1,780 enhancer-gene connections for 602 cancer genes. Using CRISPR interference (CRISPRi), we validated enhancers predicted to regulate ESR1 in estrogen receptor (ER)+ breast cancer and A1CF in liver hepatocellular carcinoma.
Affiliation(s)
- Duo Xu
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA; Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, USA; Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Andre Neil Forbes
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA; Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, USA; Weill Cornell Graduate School of Medical Sciences, Weill Cornell Medicine, New York, NY, USA
| | - Sandra Cohen
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - Ann Palladino
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA; Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, USA
| | | | - Ekta Khurana
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA; Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, USA; Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA.
| |
|
23
|
Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Matsui H, Silva NS, Joshua IN, Luchessi AD, Young Greenwald WW, D'Antonio M, Pera MF, Frazer KA. Analysis of regulatory network modules in hundreds of human stem cell lines reveals complex epigenetic and genetic factors contribute to pluripotency state differences between subpopulations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.20.541447. [PMID: 37292794 PMCID: PMC10245835 DOI: 10.1101/2023.05.20.541447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Stem cells exist in vitro in a spectrum of interconvertible pluripotent states. Analyzing hundreds of hiPSCs derived from different individuals, we show that the proportions of these pluripotent states vary considerably across lines. We discovered 13 gene network modules (GNMs) and 13 regulatory network modules (RNMs), which were highly correlated with each other, suggesting that the coordinated co-accessibility of regulatory elements in the RNMs likely underlies the coordinated expression of genes in the GNMs. Epigenetic analyses revealed that regulatory networks underlying self-renewal and pluripotency have a surprising level of complexity. Genetic analyses identified thousands of regulatory variants that overlapped predicted transcription factor binding sites and were associated with chromatin accessibility in the hiPSCs. We show that the master regulator of pluripotency, the NANOG-OCT4 Complex, and its associated network were significantly enriched for regulatory variants with large effects, suggesting that they may play a role in the varying cellular proportions of pluripotency states between hiPSCs. Our work captures the coordinated activity of tens of thousands of regulatory elements in hiPSCs and bins these elements into discrete functionally characterized regulatory networks, shows that regulatory elements in pluripotency networks harbor variants with large effects, and provides a rich resource for future pluripotent stem cell research.
|
24
|
Wang J, Zhang H, Chen N, Zeng T, Ai X, Wu K. PorcineAI-Enhancer: Prediction of Pig Enhancer Sequences Using Convolutional Neural Networks. Animals (Basel) 2023; 13:2935. [PMID: 37760334 PMCID: PMC10526013 DOI: 10.3390/ani13182935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 08/21/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] Open
Abstract
Understanding the mechanisms of gene expression regulation is crucial in animal breeding. Cis-regulatory DNA sequences, such as enhancers, play a key role in regulating gene expression. Identifying enhancers is challenging, despite the use of experimental techniques and computational methods. Enhancer prediction in the pig genome is particularly significant due to the costliness of high-throughput experimental techniques. The study constructed a high-quality database of pig enhancers by integrating information from multiple sources. A deep learning prediction framework called PorcineAI-enhancer was developed for the prediction of pig enhancers. This framework employs convolutional neural networks for feature extraction and classification. PorcineAI-enhancer showed excellent performance in predicting pig enhancers, validated on an independent test dataset. The model demonstrated reliable prediction capability for unknown enhancer sequences and performed remarkably well on tissue-specific enhancer sequences. The study developed a deep learning prediction framework, PorcineAI-enhancer, for predicting pig enhancers. The model demonstrated significant predictive performance and potential for tissue-specific enhancers. This research provides valuable resources for future studies on gene expression regulation in pigs.
Affiliation(s)
- Ji Wang
- College of Animal Science and Technology, China Agricultural University, Beijing 100193, China; (J.W.); (H.Z.); (T.Z.); (X.A.)
| | - Han Zhang
- College of Animal Science and Technology, China Agricultural University, Beijing 100193, China; (J.W.); (H.Z.); (T.Z.); (X.A.)
| | - Nanzhu Chen
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China;
| | - Tong Zeng
- College of Animal Science and Technology, China Agricultural University, Beijing 100193, China; (J.W.); (H.Z.); (T.Z.); (X.A.)
| | - Xiaohua Ai
- College of Animal Science and Technology, China Agricultural University, Beijing 100193, China; (J.W.); (H.Z.); (T.Z.); (X.A.)
| | - Keliang Wu
- College of Animal Science and Technology, China Agricultural University, Beijing 100193, China; (J.W.); (H.Z.); (T.Z.); (X.A.)
| |
|
25
|
Mikaeili H, Habib AM, Yeung CWL, Santana-Varela S, Luiz AP, Panteleeva K, Zuberi S, Athanasiou-Fragkouli A, Houlden H, Wood JN, Okorokov AL, Cox JJ. Molecular basis of FAAH-OUT-associated human pain insensitivity. Brain 2023; 146:3851-3865. [PMID: 37222214 PMCID: PMC10473560 DOI: 10.1093/brain/awad098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 03/03/2023] [Accepted: 03/10/2023] [Indexed: 05/25/2023] Open
Abstract
Chronic pain affects millions of people worldwide and new treatments are needed urgently. One way to identify novel analgesic strategies is to understand the biological dysfunctions that lead to human inherited pain insensitivity disorders. Here we report how the recently discovered brain and dorsal root ganglia-expressed FAAH-OUT long non-coding RNA (lncRNA) gene, which was found from studying a pain-insensitive patient with reduced anxiety and fast wound healing, regulates the adjacent key endocannabinoid system gene FAAH, which encodes the anandamide-degrading fatty acid amide hydrolase enzyme. We demonstrate that the disruption in FAAH-OUT lncRNA transcription leads to DNMT1-dependent DNA methylation within the FAAH promoter. In addition, FAAH-OUT contains a conserved regulatory element, FAAH-AMP, that acts as an enhancer for FAAH expression. Furthermore, using transcriptomic analyses in patient-derived cells we have uncovered a network of genes that are dysregulated from disruption of the FAAH-FAAH-OUT axis, thus providing a coherent mechanistic basis to understand the human phenotype observed. Given that FAAH is a potential target for the treatment of pain, anxiety, depression and other neurological disorders, this new understanding of the regulatory role of the FAAH-OUT gene provides a platform for the development of future gene and small molecule therapies.
Affiliation(s)
- Hajar Mikaeili
- Molecular Nociception Group, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
| | - Abdella M Habib
- College of Medicine, QU Health, Qatar University, Doha, Qatar
| | - Charlix Wai-Lok Yeung
- Molecular Nociception Group, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
| | - Sonia Santana-Varela
- Molecular Nociception Group, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
| | - Ana P Luiz
- Molecular Nociception Group, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
| | - Kseniia Panteleeva
- Molecular Nociception Group, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
| | - Sana Zuberi
- Molecular Nociception Group, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
| | | | - Henry Houlden
- Department of Molecular Neuroscience, Institute of Neurology, University College London, London WC1N 3BG, UK
| | - John N Wood
- Molecular Nociception Group, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
| | - Andrei L Okorokov
- Molecular Nociception Group, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
| | - James J Cox
- Molecular Nociception Group, Wolfson Institute for Biomedical Research, University College London, London WC1E 6BT, UK
| |
|
26
|
Nowling RJ, Njoya K, Peters JG, Riehle MM. Prediction accuracy of regulatory elements from sequence varies by functional sequencing technique. Front Cell Infect Microbiol 2023; 13:1182567. [PMID: 37600946 PMCID: PMC10433755 DOI: 10.3389/fcimb.2023.1182567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023] Open
Abstract
Introduction: Various sequencing-based approaches are used to identify and characterize the activities of cis-regulatory elements in a genome-wide fashion. Some of these techniques rely on indirect markers such as histone modifications (ChIP-seq with histone antibodies) or chromatin accessibility (ATAC-seq, DNase-seq, FAIRE-seq), while other techniques use direct measures such as episomal assays measuring the enhancer properties of DNA sequences (STARR-seq) and direct measurement of the binding of transcription factors (ChIP-seq with transcription factor-specific antibodies). The activities of cis-regulatory elements such as enhancers, promoters, and repressors are determined by their sequence and secondary processes such as chromatin accessibility, DNA methylation, and bound histone markers. Methods: Here, machine learning models are employed to evaluate the accuracy with which cis-regulatory elements identified by various commonly used sequencing techniques can be predicted by their underlying sequence alone to distinguish between cis-regulatory activity that is reflective of sequence content versus secondary processes. Results and discussion: Models trained and evaluated on D. melanogaster sequences identified through DNase-seq and STARR-seq are significantly more accurate than models trained on sequences identified by H3K4me1, H3K4me3, and H3K27ac ChIP-seq, FAIRE-seq, and ATAC-seq. These results suggest that the activity detected by DNase-seq and STARR-seq can be largely explained by underlying DNA sequence, independent of secondary processes. Experimentally, a subset of DNase-seq and H3K4me1 ChIP-seq sequences were tested for enhancer activity using luciferase assays and compared with previous tests performed on STARR-seq sequences. The experimental data indicated that STARR-seq sequences are substantially enriched for enhancer-specific activity, while the DNase-seq and H3K4me1 ChIP-seq sequences are not.
Taken together, these results indicate that the DNase-seq approach identifies a broad class of regulatory elements of which enhancers are a subset and the associated data are appropriate for training models for detecting regulatory activity from sequence alone, STARR-seq data are best for training enhancer-specific sequence models, and H3K4me1 ChIP-seq data are not well suited for training and evaluating sequence-based models for cis-regulatory element prediction.
Affiliation(s)
- Ronald J. Nowling
- Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, United States
| | - Kimani Njoya
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
| | - John G. Peters
- Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, United States
| | - Michelle M. Riehle
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
| |
|
27
|
Dincer TU, Ernst J. Integrative epigenomic and functional characterization assay based annotation of regulatory activity across diverse human cell types. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.14.549056. [PMID: 37503240 PMCID: PMC10369970 DOI: 10.1101/2023.07.14.549056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
We introduce ChromActivity, a computational framework for predicting and annotating regulatory activity across the genome through integration of multiple epigenomic maps and various functional characterization datasets. ChromActivity generates genome-wide predictions of regulatory activity associated with each functional characterization dataset across many cell types based on available epigenomic data. For each cell type, it then produces (1) ChromScoreHMM genome annotations based on the combinatorial and spatial patterns within these predictions and (2) ChromScore tracks of overall predicted regulatory activity. ChromActivity provides a resource for analyzing and interpreting the human regulatory genome across diverse cell types.
Affiliation(s)
- Tevfik Umut Dincer
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, CA, 90095, USA
| | - Jason Ernst
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, CA, 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, CA, 90095, USA
- Computer Science Department, University of California, Los Angeles, CA, 90095, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, 90095, USA
| |
|
28
|
Yong F, Yan M, Zhang L, Ji W, Zhao S, Gao Y. Analysis of Functional Promoter of Camel FGF21 Gene and Identification of Small Compounds Targeting FGF21 Protein. Vet Sci 2023; 10:452. [PMID: 37505857 PMCID: PMC10383868 DOI: 10.3390/vetsci10070452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 06/27/2023] [Accepted: 07/06/2023] [Indexed: 07/29/2023] Open
Abstract
The fibroblast growth factor 21 (FGF21) gene plays an important role in glucose and lipid metabolism and is a promising therapeutic target for metabolic disease. Camels display unique regulation of glucose and lipid metabolism, which enables them to survive drought and chronic hunger. However, knowledge of camel FGF21 gene regulation, and of how it differs from that in humans and mice, is still limited. In this study, the camel FGF21 gene promoter was obtained for ~2000 bp upstream of the transcriptional start site (TSS). Bioinformatics analysis showed that the proximal promoter region sequences near the TSS are highly similar between humans and camels. Two potential core active regions are located in the -445-612 bp region. In addition, the camel FGF21 promoter contains three CpG islands (CGIs), located in the -435~-1168 bp regions, significantly more and longer than in humans and mice. The transcription factor binding prediction showed that most transcription factors, including major functional transcription factors, are the same in different species, although the binding site positions in the promoter are different. These results indicated that the signaling pathways involved in FGF21 gene transcription regulation are conserved in mammals. Recombinant vectors carrying truncated promoter fragments, together with luciferase reporter assays, determined that the camel FGF21 core promoter is located within the 800 bp region upstream of the TSS and that an enhancer may exist between the -1000 and -2000 bp region. Combining molecular docking and in silico ADMET druggability prediction, two compounds were screened as the most promising candidate drugs specifically targeting FGF21. This study expanded the functions of these small molecules and provided a foundation for drug development targeting FGF21.
Affiliation(s)
- Fang Yong
- College of Life Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
| | - Meilin Yan
- College of Life Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
| | - Lili Zhang
- College of Life Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
| | - Wangye Ji
- College of Life Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
| | - Shuqin Zhao
- College of Life Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
- Gansu Key Laboratory of Animal Generational Physiology and Reproductive Regulation, Lanzhou 730070, China
| | - Yuan Gao
- College of Life Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
- Gansu Key Laboratory of Animal Generational Physiology and Reproductive Regulation, Lanzhou 730070, China
| |
|
29
|
Zhang Z, Feng F, Qiu Y, Liu J. A generalizable framework to comprehensively predict epigenome, chromatin organization, and transcriptome. Nucleic Acids Res 2023; 51:5931-5947. [PMID: 37224527 PMCID: PMC10325920 DOI: 10.1093/nar/gkad436] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/20/2022] [Revised: 03/31/2023] [Accepted: 05/09/2023] [Indexed: 05/26/2023] Open
Abstract
Many deep learning approaches have been proposed to predict epigenetic profiles, chromatin organization, and transcription activity. While these approaches achieve satisfactory performance in predicting one modality from another, the learned representations are not generalizable across predictive tasks or across cell types. In this paper, we propose a deep learning approach named EPCOT which employs a pre-training and fine-tuning framework, and is able to accurately and comprehensively predict multiple modalities including epigenome, chromatin organization, transcriptome, and enhancer activity for new cell types, by only requiring cell-type specific chromatin accessibility profiles. Many of these predicted modalities, such as Micro-C and ChIA-PET, are quite expensive to get in practice, and the in silico prediction from EPCOT should be quite helpful. Furthermore, this pre-training and fine-tuning framework allows EPCOT to identify generic representations generalizable across different predictive tasks. Interpreting EPCOT models also provides biological insights including mapping between different genomic modalities, identifying TF sequence binding patterns, and analyzing cell-type specific TF impacts on enhancer activity.
Affiliation(s)
- Zhenhao Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
| | - Fan Feng
- Department of Computational Medicine and Bioinformatics, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
| | - Yiyang Qiu
- Department of Computer Science and Engineering, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
| | - Jie Liu
- Department of Computational Medicine and Bioinformatics, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
- Department of Computer Science and Engineering, University of Michigan, 500 S. State St, Ann Arbor, MI 48109, USA
|
30
|
Stefan K, Barski A. Cis-regulatory atlas of primary human CD4+ T cells. BMC Genomics 2023; 24:253. [PMID: 37170195 PMCID: PMC10173520 DOI: 10.1186/s12864-023-09288-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 03/31/2023] [Indexed: 05/13/2023] Open
Abstract
Cis-regulatory elements (CRE) are critical for coordinating gene expression programs that dictate cell-specific differentiation and homeostasis. Recently developed self-transcribing active regulatory region sequencing (STARR-Seq) has allowed for genome-wide annotation of functional CREs. Despite this, STARR-Seq assays are only employed in cell lines, in part, due to difficulties in delivering reporter constructs. Herein, we implemented and validated a STARR-Seq-based screen in human CD4+ T cells using a non-integrating lentiviral transduction system. Lenti-STARR-Seq is the first example of a genome-wide assay of CRE function in human primary cells, identifying thousands of functional enhancers and negative regulatory elements (NREs) in human CD4+ T cells. We find an unexpected difference in nucleosome organization between enhancers and NRE: enhancers are located between nucleosomes, whereas NRE are occupied by nucleosomes in their endogenous locations. We also describe chromatin modification, eRNA production, and transcription factor binding at both enhancers and NREs. Our findings support the idea of silencer repurposing as enhancers in alternate cell types. Collectively, these data suggest that Lenti-STARR-Seq is a successful approach for CRE screening in primary human cell types, and provides an atlas of functional CREs in human CD4+ T cells.
Affiliation(s)
- Kurtis Stefan
- Division of Allergy & Immunology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7028, Cincinnati, OH, 45229-3026, USA
- Medical Scientist Training Program (MSTP), University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
| | - Artem Barski
- Division of Allergy & Immunology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7028, Cincinnati, OH, 45229-3026, USA.
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, 45229-3026, USA.
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA.
|
31
|
Di Giorgio E, Benetti R, Kerschbamer E, Xodo L, Brancolini C. Super-enhancer landscape rewiring in cancer: The epigenetic control at distal sites. International Review of Cell and Molecular Biology 2023; 380:97-148. [PMID: 37657861 DOI: 10.1016/bs.ircmb.2023.03.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/03/2023]
Abstract
Super-enhancers evolve as elements at the top of the hierarchical control of gene expression. They are important end-gatherers of signaling pathways that control stemness, differentiation or adaptive responses. Many epigenetic regulations focus on these regions, and not surprisingly, during the process of tumorigenesis, various alterations can account for their dysfunction. Super-enhancers are emerging as key drivers of the aberrant gene expression landscape that sustains the aggressiveness of cancer cells. In this review, we describe and discuss the structure of super-enhancers, their epigenetic regulation, and the major changes affecting their functionality in cancer.
Affiliation(s)
- Eros Di Giorgio
- Laboratory of Biochemistry, Department of Medicine, Università degli Studi di Udine, Udine, Italy
| | - Roberta Benetti
- Laboratory of Epigenomics, Department of Medicine, Università degli Studi di Udine, Udine, Italy
| | - Emanuela Kerschbamer
- Laboratory of Epigenomics, Department of Medicine, Università degli Studi di Udine, Udine, Italy
| | - Luigi Xodo
- Laboratory of Biochemistry, Department of Medicine, Università degli Studi di Udine, Udine, Italy
| | - Claudio Brancolini
- Laboratory of Epigenomics, Department of Medicine, Università degli Studi di Udine, Udine, Italy.
|
32
|
Rozowsky J, Gao J, Borsari B, Yang YT, Galeev T, Gürsoy G, Epstein CB, Xiong K, Xu J, Li T, Liu J, Yu K, Berthel A, Chen Z, Navarro F, Sun MS, Wright J, Chang J, Cameron CJF, Shoresh N, Gaskell E, Drenkow J, Adrian J, Aganezov S, Aguet F, Balderrama-Gutierrez G, Banskota S, Corona GB, Chee S, Chhetri SB, Cortez Martins GC, Danyko C, Davis CA, Farid D, Farrell NP, Gabdank I, Gofin Y, Gorkin DU, Gu M, Hecht V, Hitz BC, Issner R, Jiang Y, Kirsche M, Kong X, Lam BR, Li S, Li B, Li X, Lin KZ, Luo R, Mackiewicz M, Meng R, Moore JE, Mudge J, Nelson N, Nusbaum C, Popov I, Pratt HE, Qiu Y, Ramakrishnan S, Raymond J, Salichos L, Scavelli A, Schreiber JM, Sedlazeck FJ, See LH, Sherman RM, Shi X, Shi M, Sloan CA, Strattan JS, Tan Z, Tanaka FY, Vlasova A, Wang J, Werner J, Williams B, Xu M, Yan C, Yu L, Zaleski C, Zhang J, Ardlie K, Cherry JM, Mendenhall EM, Noble WS, Weng Z, Levine ME, Dobin A, Wold B, Mortazavi A, Ren B, Gillis J, Myers RM, Snyder MP, Choudhary J, Milosavljevic A, Schatz MC, Bernstein BE, Guigó R, Gingeras TR, Gerstein M. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell 2023; 186:1493-1511.e40. [PMID: 37001506 PMCID: PMC10074325 DOI: 10.1016/j.cell.2023.02.018] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 04/03/2022] [Revised: 10/16/2022] [Accepted: 02/10/2023] [Indexed: 04/03/2023]
Abstract
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
Affiliation(s)
- Joel Rozowsky
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jiahao Gao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Beatrice Borsari
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
| | - Yucheng T Yang
- Institute of Science and Technology for Brain-Inspired Intelligence; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence; MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Timur Galeev
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Gamze Gürsoy
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | | | - Kun Xiong
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jinrui Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Tianxiao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jason Liu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Keyang Yu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Ana Berthel
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Zhanlin Chen
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA
| | - Fabio Navarro
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Maxwell S Sun
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | | | - Justin Chang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Christopher J F Cameron
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Noam Shoresh
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Jorg Drenkow
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jessika Adrian
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Sergey Aganezov
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | | | | | | | | | - Sora Chee
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Surya B Chhetri
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Gabriel Conte Cortez Martins
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Cassidy Danyko
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Carrie A Davis
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Daniel Farid
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | | | - Idan Gabdank
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Yoel Gofin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - David U Gorkin
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Mengting Gu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Vivian Hecht
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Benjamin C Hitz
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Robbyn Issner
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Melanie Kirsche
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Xiangmeng Kong
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Bonita R Lam
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Shantao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Bian Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Xiqi Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Khine Zin Lin
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, CHN
| | - Mark Mackiewicz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Ran Meng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Jonathan Mudge
- European Bioinformatics Institute, Cambridge, Cambridgeshire, GB
| | | | - Chad Nusbaum
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ioann Popov
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Henry E Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Yunjiang Qiu
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Srividya Ramakrishnan
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Joe Raymond
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Leonidas Salichos
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Biological and Chemical Sciences, New York Institute of Technology, Old Westbury, NY, USA
| | - Alexandra Scavelli
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jacob M Schreiber
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Fritz J Sedlazeck
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Lei Hoon See
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Rachel M Sherman
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Xu Shi
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Minyi Shi
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Cricket Alicia Sloan
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - J Seth Strattan
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Zhen Tan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Forrest Y Tanaka
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Anna Vlasova
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain; Comparative Genomics Group, Life Science Programme, Barcelona Supercomputing Centre, Barcelona, Spain; Institute of Research in Biomedicine, Barcelona, Spain
| | - Jun Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jonathan Werner
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Min Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Chengfei Yan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Lu Yu
- Institute of Cancer Research, London, UK
| | - Christopher Zaleski
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, Irvine, CA, USA
| | | | - J Michael Cherry
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | | | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Morgan E Levine
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
| | - Alexander Dobin
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
| | - Bing Ren
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Jesse Gillis
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Department of Physiology, University of Toronto, Toronto, ON, Canada
| | - Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Michael P Snyder
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | | | | | - Michael C Schatz
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
| | - Bradley E Bernstein
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA.
| | - Roderic Guigó
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain; Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.
| | - Thomas R Gingeras
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
| | - Mark Gerstein
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Statistics and Data Science, Yale University, New Haven, CT, USA; Department of Computer Science, Yale University, New Haven, CT, USA.
|
33
|
Flury V, Reverón-Gómez N, Alcaraz N, Stewart-Morgan KR, Wenger A, Klose RJ, Groth A. Recycling of modified H2A-H2B provides short-term memory of chromatin states. Cell 2023; 186:1050-1065.e19. [PMID: 36750094 PMCID: PMC9994263 DOI: 10.1016/j.cell.2023.01.007] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 06/23/2022] [Revised: 11/11/2022] [Accepted: 01/06/2023] [Indexed: 02/08/2023]
Abstract
Chromatin landscapes are disrupted during DNA replication and must be restored faithfully to maintain genome regulation and cell identity. The histone H3-H4 modification landscape is restored by parental histone recycling and modification of new histones. How DNA replication impacts on histone H2A-H2B is currently unknown. Here, we measure H2A-H2B modifications and H2A.Z during DNA replication and across the cell cycle using quantitative genomics. We show that H2AK119ub1, H2BK120ub1, and H2A.Z are recycled accurately during DNA replication. Modified H2A-H2B are segregated symmetrically to daughter strands via POLA1 on the lagging strand, but independent of H3-H4 recycling. Post-replication, H2A-H2B modification and variant landscapes are quickly restored, and H2AK119ub1 guides accurate restoration of H3K27me3. This work reveals epigenetic transmission of parental H2A-H2B during DNA replication and identifies cross talk between H3-H4 and H2A-H2B modifications in epigenome propagation. We propose that rapid short-term memory of recycled H2A-H2B modifications facilitates restoration of stable H3-H4 chromatin states.
Affiliation(s)
- Valentin Flury
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark; Biotech Research and Innovation Centre, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Nazaret Reverón-Gómez
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark; Biotech Research and Innovation Centre, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Nicolas Alcaraz
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark; Biotech Research and Innovation Centre, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Kathleen R Stewart-Morgan
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark; Biotech Research and Innovation Centre, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Alice Wenger
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark; Biotech Research and Innovation Centre, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Robert J Klose
- Department of Biochemistry, University of Oxford, Oxford OX1 3QU, UK
| | - Anja Groth
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark; Biotech Research and Innovation Centre, University of Copenhagen, 2200 Copenhagen, Denmark.
|
34
|
Han D, Liu G, Oh Y, Oh S, Yang S, Mandjikian L, Rani N, Almeida MC, Kosik KS, Jang J. ZBTB12 is a molecular barrier to dedifferentiation in human pluripotent stem cells. Nat Commun 2023; 14:632. [PMID: 36759523 PMCID: PMC9911396 DOI: 10.1038/s41467-023-36178-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/10/2021] [Accepted: 01/18/2023] [Indexed: 02/11/2023] Open
Abstract
Development is generally viewed as one-way traffic of cell state transition from primitive to developmentally advanced states. However, molecular mechanisms that ensure the unidirectional transition of cell fates remain largely unknown. Through exact transcription start site mapping, we report an evolutionarily conserved BTB domain-containing zinc finger protein, ZBTB12, as a molecular barrier for dedifferentiation of human pluripotent stem cells (hPSCs). Single-cell RNA sequencing reveals that ZBTB12 is essential for three germ layer differentiation by blocking hPSC dedifferentiation. Mechanistically, ZBTB12 fine-tunes the expression of human endogenous retrovirus H (HERVH), a primate-specific retrotransposon, and targets specific transcripts that utilize HERVH as a regulatory element. In particular, the downregulation of HERVH-overlapping long non-coding RNAs (lncRNAs) by ZBTB12 is necessary for a successful exit from a pluripotent state and lineage derivation. Overall, we identify ZBTB12 as a molecular barrier that safeguards the unidirectional transition of metastable stem cell fates toward developmentally advanced states.
Affiliation(s)
- Dasol Han
- Neuroscience Research Institute, Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, CA, USA
| | - Guojing Liu
- Neuroscience Research Institute, Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, CA, USA
- Novogene Co., Ltd, Beijing, China
| | - Yujeong Oh
- Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang, Korea
| | - Seyoun Oh
- Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang, Korea
| | - Seungbok Yang
- Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang, Korea
| | - Lori Mandjikian
- Neuroscience Research Institute, Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, CA, USA
| | - Neha Rani
- Neuroscience Research Institute, Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, CA, USA
- Department of Biological Sciences & Bioengineering, Indian Institute of Technology, Kanpur, India
| | - Maria C Almeida
- Neuroscience Research Institute, Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, CA, USA
- Federal University of ABC, Center for Natural and Human Sciences São Bernardo do Campo, Santo André, Brazil
| | - Kenneth S Kosik
- Neuroscience Research Institute, Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, CA, USA.
| | - Jiwon Jang
- Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang, Korea.
|
35
|
Tan DS, Cheung SL, Gao Y, Weinbuch M, Hu H, Shi L, Ti SC, Hutchins AP, Cojocaru V, Jauch R. The homeodomain of Oct4 is a dimeric binder of methylated CpG elements. Nucleic Acids Res 2023; 51:1120-1138. [PMID: 36631980 PMCID: PMC9943670 DOI: 10.1093/nar/gkac1262] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/20/2022] [Revised: 12/14/2022] [Accepted: 12/19/2022] [Indexed: 01/13/2023] Open
Abstract
Oct4 is essential to maintain pluripotency and has a pivotal role in establishing the germline. Its DNA-binding POU domain was recently found to bind motifs with methylated CpG elements normally associated with epigenetic silencing. However, the mode of binding and the consequences of this capability have remained unclear. Here, we show that Oct4 binds to a compact palindromic DNA element with a methylated CpG core (CpGpal) in alternative states of pluripotency and during cellular reprogramming towards induced pluripotent stem cells (iPSCs). During cellular reprogramming, typical Oct4-bound enhancers are uniformly demethylated, with the prominent exception of the CpGpal sites where DNA methylation is often maintained. We demonstrate that Oct4 cooperatively binds the CpGpal element as a homodimer, which contrasts with the ectoderm-expressed POU factor Brn2. Indeed, binding to CpGpal is Oct4-specific as other POU factors expressed in somatic cells avoid this element. Binding assays combined with structural analyses and molecular dynamics simulations show that dimeric Oct4-binding to CpGpal is driven by the POU-homeodomain whilst the POU-specific domain is detached from DNA. Collectively, we report that Oct4 exerts parts of its regulatory function in the context of methylated DNA through a DNA recognition mechanism that solely relies on its homeodomain.
Affiliation(s)
- Daisylyn Senna Tan
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Shun Lai Cheung
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Ya Gao
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Maike Weinbuch
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Institute for Molecular Medicine, Ulm University, Ulm, Germany
| | - Haoqing Hu
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Liyang Shi
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Shih-Chieh Ti
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Andrew P Hutchins
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Vlad Cojocaru
- STAR-UBB Institute, Babeş-Bolyai University, Cluj-Napoca, Romania; Computational Structural Biology Group, Utrecht University, The Netherlands; Max Planck Institute for Molecular Biomedicine, Münster, Germany
| | - Ralf Jauch
- To whom correspondence should be addressed. Tel: +852 3917 9511; Fax: +852 28559730;
| |
|
36
|
Ni P, Moe J, Su Z. Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice. BMC Biol 2022; 20:221. [PMID: 36199141 PMCID: PMC9535988 DOI: 10.1186/s12915-022-01426-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/12/2022] [Accepted: 09/29/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Predicting cis-regulatory modules (CRMs) in a genome and their functional states in various cell/tissue types of the organism are two related challenging computational tasks. Most current methods attempt to simultaneously achieve both using data of multiple epigenetic marks in a cell/tissue type. Though conceptually attractive, they suffer from high false discovery rates and limited applications. To fill the gaps, we proposed a two-step strategy to first predict a map of CRMs in the genome, and then predict functional states of all the CRMs in various cell/tissue types of the organism. We have recently developed an algorithm for the first step that was able to more accurately and completely predict CRMs in a genome than existing methods by integrating numerous transcription factor ChIP-seq datasets in the organism. Here, we presented machine-learning methods for the second step. RESULTS We showed that functional states in a cell/tissue type of all the CRMs in the genome could be accurately predicted using data of only 1-4 epigenetic marks by a variety of machine-learning classifiers. Our predictions are substantially more accurate than the best achieved so far. Interestingly, a model trained on a cell/tissue type in humans can accurately predict functional states of CRMs in different cell/tissue types of humans as well as of mice, and vice versa. Therefore, the epigenetic code that defines functional states of CRMs in various cell/tissue types is universal at least in humans and mice. Moreover, we found that from tens to hundreds of thousands of CRMs were active in a human and mouse cell/tissue type, and up to 99.98% of them were reutilized in different cell/tissue types, while as few as 0.02% of them were unique to a cell/tissue type that might define the cell/tissue type. CONCLUSIONS Our two-step approach can accurately predict functional states in any cell/tissue type of all the CRMs in the genome using data of only 1-4 epigenetic marks. Our approach is also more cost-effective than existing methods that typically use data of more epigenetic marks. Our results suggest common epigenetic rules for defining functional states of CRMs in various cell/tissue types in humans and mice.
Affiliation(s)
- Pengyu Ni
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
| | - Joshua Moe
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
| | - Zhengchang Su
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, NC, 28223, USA.
| |
|
37
|
Zhang L, Zhang J, Nie Q. DIRECT-NET: An efficient method to discover cis-regulatory elements and construct regulatory networks from single-cell multiomics data. Sci Adv 2022; 8:eabl7393. [PMID: 35648859 PMCID: PMC9159696 DOI: 10.1126/sciadv.abl7393] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/03/2021] [Accepted: 04/14/2022] [Indexed: 05/13/2023]
Abstract
The emergence of single-cell multiomics data provides unprecedented opportunities to scrutinize the transcriptional regulatory mechanisms controlling cell identity. However, how to use those datasets to dissect the cis-regulatory element (CRE)-to-gene relationships at a single-cell level remains a major challenge. Here, we present DIRECT-NET, a machine-learning method based on gradient boosting, to identify genome-wide CREs and their relationship to target genes, either from parallel single-cell gene expression and chromatin accessibility data or from single-cell chromatin accessibility data alone. By extensively evaluating and characterizing DIRECT-NET's predicted CREs using independent functional genomics data, we find that DIRECT-NET substantially improves the accuracy of inferring CRE-to-gene relationships in comparison to existing methods. DIRECT-NET is also capable of revealing cell subpopulation-specific and dynamic regulatory linkages. Overall, DIRECT-NET provides an efficient tool for predicting transcriptional regulation codes from single-cell multiomics data.
Affiliation(s)
- Lihua Zhang
- School of Computer Science, Wuhan University, Wuhan 430072, China
- Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA
- NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, Irvine, CA 92697, USA
| | - Qing Nie
- Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA
- NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697, USA
| |
|
38
|
Gao Y, Chen Y, Feng H, Zhang Y, Yue Z. RicENN: Prediction of Rice Enhancers with Neural Network Based on DNA Sequences. Interdiscip Sci 2022; 14:555-565. [PMID: 35190950 DOI: 10.1007/s12539-022-00503-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/01/2021] [Revised: 01/07/2022] [Accepted: 01/18/2022] [Indexed: 01/22/2023]
Abstract
Enhancers are the primary cis-elements of transcriptional regulation and play a vital role in gene expression at different stages of plant growth and development. Because enhancers vary widely in location and are scattered freely across non-coding regions of the genome, identifying them is a crucial but challenging task in understanding the biological mechanisms of model plants. Recently, neural network models have gained increasing popularity for predicting the function of genomic elements. Although several computational models have shown great advantages in tackling this challenge, a further study of the identification of rice enhancers from DNA sequences is still lacking. We present RicENN, a novel deep learning framework capable of accurately identifying enhancers of rice, integrating convolutional neural networks (CNNs), bi-directional recurrent neural networks (RNNs), and attention mechanisms. A combined-feature representation method was designed to extract the sequence features from original DNA sequences using six types of autocorrelation encodings. Moreover, we verified by an ablation study that the integrated model achieves the best performance. Finally, our deep learning framework realized a reliable prediction of the rice enhancers. The results show RicENN outperforms available alternative approaches in rice species, achieving an area under the receiver operating characteristic curve (AUROC) and an area under the precision-recall curve (AUPRC) of 0.960 and 0.960 on cross-validation, and 0.879 and 0.877 during independent tests, respectively. This study develops a hybrid model to combine the merits of different neural network architectures, which shows the potential of applying deep learning to biological sequences and contributes to the acceleration of functional genomic studies of rice. RicENN and its code are freely accessible at http://bioinfor.aielab.cc/RicENN/ .
Affiliation(s)
- Yujia Gao
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Yiqiong Chen
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Haisong Feng
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Youhua Zhang
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
| | - Zhenyu Yue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
| |
|
39
|
Mulero Hernández J, Fernández-Breis JT. Analysis of the landscape of human enhancer sequences in biological databases. Comput Struct Biotechnol J 2022; 20:2728-2744. [PMID: 35685360 PMCID: PMC9168495 DOI: 10.1016/j.csbj.2022.05.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/20/2022] [Accepted: 05/21/2022] [Indexed: 12/01/2022]
Abstract
The process of gene regulation extends as a network in which both genetic sequences and proteins are involved. The levels of regulation and the mechanisms involved are multiple. Transcription is the main control mechanism for most genes, with the downstream steps responsible for refining the transcription patterns. In turn, gene transcription is mainly controlled by regulatory events that occur at promoters and enhancers. Several studies are focused on analyzing the contribution of enhancers to the development of diseases and their possible use as therapeutic targets. The study of regulatory elements has advanced rapidly in recent years with the development and use of next generation sequencing techniques. This work has generated a large volume of information that has been transferred to a growing number of public repositories. In this article, we analyze the content of those public repositories that contain information about human enhancers with the aim of detecting whether the knowledge generated by scientific research is contained in those databases in a way that could be computationally exploited. The analysis is based on three main aspects identified in the literature: types of enhancers, type of evidence about the enhancers, and methods for detecting enhancer-promoter interactions. Our results show that no single database facilitates the optimal exploitation of enhancer data, that most types of enhancers are not represented in the databases, and that there is a need for a standardized model for enhancers. We have identified major gaps and challenges for the computational exploitation of enhancer data.
Affiliation(s)
- Juan Mulero Hernández
- Dept. Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, Spain
| | | |
|
40
|
Baxter SM, Posey JE, Lake NJ, Sobreira N, Chong JX, Buyske S, Blue EE, Chadwick LH, Coban-Akdemir ZH, Doheny KF, Davis CP, Lek M, Wellington C, Jhangiani SN, Gerstein M, Gibbs RA, Lifton RP, MacArthur DG, Matise TC, Lupski JR, Valle D, Bamshad MJ, Hamosh A, Mane S, Nickerson DA, Rehm HL, O'Donnell-Luria A. Centers for Mendelian Genomics: A decade of facilitating gene discovery. Genet Med 2022; 24:784-797. [PMID: 35148959 PMCID: PMC9119004 DOI: 10.1016/j.gim.2021.12.005] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 09/02/2021] [Revised: 12/08/2021] [Accepted: 12/12/2021] [Indexed: 12/27/2022]
Abstract
PURPOSE Mendelian disease genomic research has undergone a massive transformation over the past decade. With increasing availability of exome and genome sequencing, the role of Mendelian research has expanded beyond data collection, sequencing, and analysis to worldwide data sharing and collaboration. METHODS Over the past 10 years, the National Institutes of Health-supported Centers for Mendelian Genomics (CMGs) have played a major role in this research and clinical evolution. RESULTS We highlight the cumulative gene discoveries facilitated by the program, biomedical research leveraged by the approach, and the larger impact on the research community. Beyond generating a list of gene-phenotype relationships and participating in widespread data sharing, the CMGs have created resources, tools, and training for the larger community to foster understanding of genes and genome variation. The CMGs have participated in a wide range of data sharing activities, including deposition of all eligible CMG data into the Analysis, Visualization, and Informatics Lab-space (AnVIL), sharing candidate genes through the Matchmaker Exchange and the CMG website, and sharing variants in Genotypes to Mendelian Phenotypes (Geno2MP) and VariantMatcher. CONCLUSION The work is far from complete; strengthening communication between research and clinical realms, continued development and sharing of knowledge and tools, and improving access to richly characterized data sets are all required to diagnose the remaining molecularly undiagnosed patients.
Affiliation(s)
- Samantha M Baxter
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA.
| | - Jennifer E Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | - Nicole J Lake
- Department of Genetics, Yale School of Medicine, New Haven, CT; Murdoch Children's Research Institute, Melbourne, Victoria, Australia
| | - Nara Sobreira
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
| | - Jessica X Chong
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA; Brotman Baty Institute for Precision Medicine, Seattle, WA
| | - Steven Buyske
- Department of Statistics, Rutgers University, Piscataway, NJ; Department of Genetics, Rutgers University, Piscataway, NJ
| | - Elizabeth E Blue
- Brotman Baty Institute for Precision Medicine, Seattle, WA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
| | - Lisa H Chadwick
- Division of Genome Sciences, National Human Genome Research Institute, Bethesda, MD
| | - Zeynep H Coban-Akdemir
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX
| | - Kimberly F Doheny
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
| | - Colleen P Davis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA
| | - Monkol Lek
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; Department of Genetics, Yale School of Medicine, New Haven, CT
| | | | | | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT
| | - Richard A Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX
| | - Richard P Lifton
- Department of Genetics, Yale School of Medicine, New Haven, CT; Laboratory of Human Genetics and Genomics, The Rockefeller University, New York, NY
| | - Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, New South Wales, Australia; Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
| | - Tara C Matise
- Department of Genetics, Rutgers University, Piscataway, NJ
| | - James R Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX; Department of Pediatrics, Baylor College of Medicine, Houston, TX
| | - David Valle
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
| | - Michael J Bamshad
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA; Brotman Baty Institute for Precision Medicine, Seattle, WA; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA
| | - Ada Hamosh
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
| | - Shrikant Mane
- Department of Genetics, Yale School of Medicine, New Haven, CT
| | - Deborah A Nickerson
- Brotman Baty Institute for Precision Medicine, Seattle, WA; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA.
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA; Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA.
| |
|
41
|
Shoombuatong W, Basith S, Pitti T, Lee G, Manavalan B. THRONE: a new approach for accurate prediction of human RNA N7-methylguanosine sites. J Mol Biol 2022; 434:167549. [DOI: 10.1016/j.jmb.2022.167549] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/16/2021] [Revised: 03/08/2022] [Accepted: 03/10/2022] [Indexed: 12/30/2022]
|
42
|
Jansen C, Paraiso KD, Zhou JJ, Blitz IL, Fish MB, Charney RM, Cho JS, Yasuoka Y, Sudou N, Bright AR, Wlizla M, Veenstra GJC, Taira M, Zorn AM, Mortazavi A, Cho KWY. Uncovering the mesendoderm gene regulatory network through multi-omic data integration. Cell Rep 2022; 38:110364. [PMID: 35172134 PMCID: PMC8917868 DOI: 10.1016/j.celrep.2022.110364] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/18/2020] [Revised: 10/30/2021] [Accepted: 01/19/2022] [Indexed: 01/01/2023]
Abstract
Mesendodermal specification is one of the earliest events in embryogenesis, where cells first acquire distinct identities. Cell differentiation is a highly regulated process that involves the function of numerous transcription factors (TFs) and signaling molecules, which can be described with gene regulatory networks (GRNs). Cell differentiation GRNs are difficult to build because existing mechanistic methods are low throughput, and high-throughput methods tend to be non-mechanistic. Additionally, integrating highly dimensional data composed of more than two data types is challenging. Here, we use linked self-organizing maps to combine chromatin immunoprecipitation sequencing (ChIP-seq)/ATAC-seq with temporal, spatial, and perturbation RNA sequencing (RNA-seq) data from Xenopus tropicalis mesendoderm development to build a high-resolution genome scale mechanistic GRN. We recover both known and previously unsuspected TF-DNA/TF-TF interactions validated through reporter assays. Our analysis provides insights into transcriptional regulation of early cell fate decisions and provides a general approach to building GRNs using highly dimensional multi-omic datasets.
Affiliation(s)
- Camden Jansen
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA
| | - Kitt D Paraiso
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA
| | - Jeff J Zhou
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
| | - Ira L Blitz
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
| | - Margaret B Fish
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
| | - Rebekah M Charney
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
| | - Jin Sun Cho
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
| | - Yuuri Yasuoka
- Laboratory for Comprehensive Genomic Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Norihiro Sudou
- Department of Anatomy, School of Medicine, Toho University, Tokyo, Japan
| | - Ann Rose Bright
- Department of Molecular Developmental Biology, Radboud University, Nijmegen, the Netherlands
| | - Marcin Wlizla
- Division of Developmental Biology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | - Gert Jan C Veenstra
- Department of Molecular Developmental Biology, Radboud University, Nijmegen, the Netherlands
| | - Masanori Taira
- Department of Biological Sciences, Chuo University, Tokyo, Japan
| | - Aaron M Zorn
- Division of Developmental Biology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA.
| | - Ken W Y Cho
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, CA, USA.
| |
|
43
|
Arslan E, Schulz J, Rai K. Machine Learning in Epigenomics: Insights into Cancer Biology and Medicine. Biochim Biophys Acta Rev Cancer 2021; 1876:188588. [PMID: 34245839 PMCID: PMC8595561 DOI: 10.1016/j.bbcan.2021.188588] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/09/2021] [Revised: 05/29/2021] [Accepted: 07/02/2021] [Indexed: 02/01/2023]
Abstract
The recent deluge of genome-wide technologies for mapping the epigenome, and the resulting data from cancer samples, has provided the opportunity to gain insight into the roles of epigenetic processes in cancer. However, the complexity, high-dimensionality, sparsity, and noise associated with these data pose challenges for extensive integrative analyses. Machine Learning (ML) algorithms are particularly suited for epigenomic data analyses due to their flexibility and ability to learn underlying hidden structures. We will discuss four overlapping but distinct major categories under ML: dimensionality reduction, unsupervised methods, supervised methods, and deep learning (DL). We review the preferred use cases of these algorithms in analyses of cancer epigenomics data in the hope of providing an overview of how ML approaches can be used to explore fundamental questions on the roles of the epigenome in cancer biology and medicine.
Affiliation(s)
- Emre Arslan
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Jonathan Schulz
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Kunal Rai
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America.
| |
|
44
|
Wolfe JC, Mikheeva LA, Hagras H, Zabet NR. An explainable artificial intelligence approach for decoding the enhancer histone modifications code and identification of novel enhancers in Drosophila. Genome Biol 2021; 22:308. [PMID: 34749786 PMCID: PMC8574042 DOI: 10.1186/s13059-021-02532-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/30/2020] [Accepted: 10/29/2021] [Indexed: 12/16/2022]
Abstract
BACKGROUND Enhancers are non-coding regions of the genome that control the activity of target genes. Recent efforts to identify active enhancers experimentally and in silico have proven effective. While these tools can predict the locations of enhancers with a high degree of accuracy, the mechanisms underpinning the activity of enhancers are often unclear. RESULTS Using machine learning (ML) and a rule-based explainable artificial intelligence (XAI) model, we demonstrate that we can predict the location of known enhancers in Drosophila with a high degree of accuracy. Most importantly, we use the rules of the XAI model to provide insight into the underlying combinatorial histone modifications code of enhancers. In addition, we identified a large set of putative enhancers that display the same epigenetic signature as enhancers identified experimentally. These putative enhancers are enriched in nascent transcription and divergent transcription, and have 3D contacts with promoters of transcribed genes. However, they display only intermediary enrichment of mediator and cohesin complexes compared to previously characterised active enhancers. We also found that 10-15% of the predicted enhancers display similar characteristics to super enhancers observed in other species. CONCLUSIONS Here, we applied an explainable AI model to predict enhancers with high accuracy. Most importantly, we identified that different combinations of epigenetic marks characterise different groups of enhancers. Finally, we discovered a large set of putative enhancers which display similar characteristics to previously characterised active enhancers.
Affiliation(s)
- Jareth C Wolfe
- School of Life Sciences, University of Essex, Colchester, CO4 3SQ, UK
- School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, UK
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, E1 2AT, London, UK
| | - Liudmila A Mikheeva
- School of Life Sciences, University of Essex, Colchester, CO4 3SQ, UK
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, E1 2AT, London, UK
- Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, UK
| | - Hani Hagras
- School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, UK.
| | - Nicolae Radu Zabet
- School of Life Sciences, University of Essex, Colchester, CO4 3SQ, UK.
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, E1 2AT, London, UK.
| |
|
45
|
Arenas-Mena C, Miljovska S, Rice EJ, Gurges J, Shashikant T, Wang Z, Ercan S, Danko CG. Identification and prediction of developmental enhancers in sea urchin embryos. BMC Genomics 2021; 22:751. [PMID: 34666684 PMCID: PMC8527612 DOI: 10.1186/s12864-021-07936-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/15/2021] [Accepted: 06/28/2021] [Indexed: 11/21/2022]
Abstract
Background The transcription of developmental regulatory genes is often controlled by multiple cis-regulatory elements. The identification and functional characterization of distal regulatory elements remains challenging, even in tractable model organisms like sea urchins. Results We evaluate the use of chromatin accessibility, transcription and RNA Polymerase II for their ability to predict enhancer activity of genomic regions in sea urchin embryos. ATAC-seq, PRO-seq, and Pol II ChIP-seq from early and late blastula embryos are manually contrasted with experimental cis-regulatory analyses available in sea urchin embryos, with particular attention to common developmental regulatory elements known to have enhancer and silencer functions differentially deployed among embryonic territories. Using the three functional genomic data types, machine learning models are trained and tested to classify and quantitatively predict the enhancer activity of several hundred genomic regions previously validated with reporter constructs in vivo. Conclusions Overall, chromatin accessibility and transcription have substantial power for predicting enhancer activity. For promoter-overlapping cis-regulatory elements in particular, the distribution of Pol II is the best predictor of enhancer activity in blastula embryos. Furthermore, ATAC- and PRO-seq predictive value is stage dependent for the promoter-overlapping subset. This suggests that the sequence of regulatory mechanisms leading to transcriptional activation has distinct relevance at different levels of the developmental gene regulatory hierarchy deployed during embryogenesis. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07936-0.
Affiliation(s)
- César Arenas-Mena
- College of Staten Island, The City University of New York (CUNY), Staten Island, NY, 10314, USA; Programs in Biology and Biochemistry, The Graduate Center, CUNY, New York, NY, 10016, USA.
| | - Sofija Miljovska
- Department of Biology, New York University, New York, NY, 10003, USA
| | - Edward J Rice
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, 14853, USA
| | - Justin Gurges
- College of Staten Island, The City University of New York (CUNY), Staten Island, NY, 10314, USA
| | - Tanvi Shashikant
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Zihe Wang
- College of Staten Island, The City University of New York (CUNY), Staten Island, NY, 10314, USA
| | - Sevinç Ercan
- Department of Biology, New York University, New York, NY, 10003, USA.,Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - Charles G Danko
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, 14853, USA.,Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, 14853, USA
| |
Collapse
|
46
|
Patel ZM, Hughes TR. Global properties of regulatory sequences are predicted by transcription factor recognition mechanisms. Genome Biol 2021; 22:285. [PMID: 34620190 PMCID: PMC8496038 DOI: 10.1186/s13059-021-02503-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/31/2020] [Accepted: 09/16/2021] [Indexed: 01/07/2023] Open
Abstract
Background: Mammalian genomes contain millions of putative regulatory sequences, which are delineated by binding of multiple transcription factors. The degree to which spacing and orientation constraints among transcription factor binding sites contribute to the recognition and identity of regulatory sequence is an unresolved but important question that impacts our understanding of genome function and evolution. Global mechanisms that underlie phenomena including the size of regulatory sequences, their uniqueness, and their evolutionary turnover remain poorly described.
Results: Here, we ask whether models incorporating different degrees of spacing and orientation constraints among transcription factor binding sites are broadly consistent with several global properties of regulatory sequence. These properties include length, sequence diversity, turnover rate, and dominance of specific TFs in regulatory site identity and cell type specification. Models with and without spacing and orientation constraints are generally consistent with all observed properties of regulatory sequence, and with regulatory sequences being fundamentally small (~ 1 nucleosome). Uniqueness of regulatory regions and their rapid evolutionary turnover are expected under all models examined. An intriguing issue we identify is that the complexity of eukaryotic regulatory sites must scale with the number of active transcription factors, in order to accomplish observed specificity.
Conclusions: Models of transcription factor binding with or without spacing and orientation constraints predict that regulatory sequences should be fundamentally short, unique, and turn over rapidly. We posit that the existence of master regulators may be, in part, a consequence of evolutionary pressure to limit the complexity and increase evolvability of regulatory sites.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-021-02503-y.
Affiliation(s)
- Zain M Patel
  - Donnelly Centre for Cellular and Biomolecular Research and Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Timothy R Hughes
  - Donnelly Centre for Cellular and Biomolecular Research and Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada

47
Ray-Jones H, Spivakov M. Transcriptional enhancers and their communication with gene promoters. Cell Mol Life Sci 2021; 78:6453-6485. [PMID: 34414474 PMCID: PMC8558291 DOI: 10.1007/s00018-021-03903-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 05/12/2021] [Revised: 07/08/2021] [Accepted: 07/19/2021] [Indexed: 12/13/2022]
Abstract
Transcriptional enhancers play a key role in the initiation and maintenance of gene expression programmes, particularly in metazoa. How these elements control their target genes in the right place and time is one of the most pertinent questions in functional genomics, with wide implications for most areas of biology. Here, we synthesise classic and recent evidence on the regulatory logic of enhancers, including the principles of enhancer organisation, factors that facilitate and delimit enhancer-promoter communication, and the joint effects of multiple enhancers. We show how modern approaches building on classic insights have begun to unravel the complexity of enhancer-promoter relationships, paving the way towards a quantitative understanding of gene control.
Affiliation(s)
- Helen Ray-Jones
  - MRC London Institute of Medical Sciences, London, W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College, London, W12 0NN, UK
- Mikhail Spivakov
  - MRC London Institute of Medical Sciences, London, W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College, London, W12 0NN, UK

48
Umarov R, Li Y, Arakawa T, Takizawa S, Gao X, Arner E. ReFeaFi: Genome-wide prediction of regulatory elements driving transcription initiation. PLoS Comput Biol 2021; 17:e1009376. [PMID: 34491989 PMCID: PMC8448322 DOI: 10.1371/journal.pcbi.1009376] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/22/2021] [Revised: 09/17/2021] [Accepted: 08/23/2021] [Indexed: 11/19/2022] Open
Abstract
Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. While there are many attempts to develop computational promoter and enhancer identification methods, reliable tools to analyze long genomic sequences are still lacking. Prediction methods often perform poorly on the genome-wide scale because the number of negatives is much higher than that in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model for scanning the genome and the other one for testing candidate positions. The developed method achieves good genome-level performance and maintains robust performance when applied to other vertebrate species, without re-training. Moreover, the unannotated predicted regulatory regions made on the human genome are enriched for disease-associated variants, suggesting them to be potentially true regulatory elements rather than false positives. We validated high scoring "false positive" predictions using reporter assay and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.
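The dynamic negative set updating idea described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the one-dimensional window scores, the midpoint-threshold "model", and the names `fit_threshold` and `update_negatives` are hypothetical and are not taken from ReFeaFi, which uses a two-model approach that this sketch does not reproduce. It only shows the loop of scanning background windows, collecting false positives, adding them to the negative set, and refitting.

```python
from statistics import mean


def fit_threshold(positives, negatives):
    """Toy 'model': the midpoint between the mean positive and mean negative score."""
    return (mean(positives) + mean(negatives)) / 2


def update_negatives(positives, negatives, background, max_rounds=10):
    """Refit repeatedly, adding background windows that the current model
    wrongly calls positive (hard negatives) to the negative set."""
    negatives = list(negatives)
    seen = set(negatives)
    for _ in range(max_rounds):
        threshold = fit_threshold(positives, negatives)
        # Background windows are assumed to be true negatives, so any window
        # scoring above the threshold is a false positive.
        false_positives = [x for x in background if x > threshold and x not in seen]
        if not false_positives:
            break
        negatives.extend(false_positives)
        seen.update(false_positives)
    return fit_threshold(positives, negatives), negatives


# Hypothetical scores for known positives, initial negatives, and genome background.
threshold, negatives = update_negatives(
    positives=[0.8, 0.9, 1.0],
    negatives=[0.0, 0.1],
    background=[0.05, 0.2, 0.5, 0.6],
)
```

In this toy run the two high-scoring background windows are absorbed into the negative set and the decision threshold moves upward, which is the intended effect of updating negatives from genome-wide false positives.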
Affiliation(s)
- Ramzan Umarov
  - Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
- Yu Li
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong, People’s Republic of China
- Takahiro Arakawa
  - Laboratory for Applied Regulatory Genomics Network Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Satoshi Takizawa
  - Laboratory for Applied Regulatory Genomics Network Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Xin Gao
  - King Abdullah University of Science and Technology, Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, Thuwal, Saudi Arabia
- Erik Arner
  - Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
  - Laboratory for Applied Regulatory Genomics Network Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Corresponding authors: RU, XG, EA

49
Friedman RZ, Granas DM, Myers CA, Corbo JC, Cohen BA, White MA. Information content differentiates enhancers from silencers in mouse photoreceptors. eLife 2021; 10:67403. [PMID: 34486522 PMCID: PMC8492058 DOI: 10.7554/elife.67403] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 02/09/2021] [Accepted: 09/03/2021] [Indexed: 12/12/2022] Open
Abstract
Enhancers and silencers often depend on the same transcription factors (TFs) and are conflated in genomic assays of TF binding or chromatin state. To identify sequence features that distinguish enhancers and silencers, we assayed massively parallel reporter libraries of genomic sequences targeted by the photoreceptor TF cone-rod homeobox (CRX) in mouse retinas. Both enhancers and silencers contain more TF motifs than inactive sequences, but relative to silencers, enhancers contain motifs from a more diverse collection of TFs. We developed a measure of information content that describes the number and diversity of motifs in a sequence and found that, while both enhancers and silencers depend on CRX motifs, enhancers have higher information content. The ability of information content to distinguish enhancers and silencers targeted by the same TF illustrates how motif context determines the activity of cis-regulatory sequences.
Different cell types are established by activating and repressing the activity of specific sets of genes, a process controlled by proteins called transcription factors. Transcription factors work by recognizing and binding short stretches of DNA in parts of the genome called cis-regulatory sequences. A cis-regulatory sequence that increases the activity of a gene when bound by transcription factors is called an enhancer, while a sequence that causes a decrease in gene activity is called a silencer. To establish a cell type, a particular transcription factor will act on both enhancers and silencers that control the activity of different genes. For example, the transcription factor cone-rod homeobox (CRX) is critical for specifying different types of cells in the retina, and it acts on both enhancers and silencers. In rod photoreceptors, CRX activates rod genes by binding their enhancers, while repressing cone photoreceptor genes by binding their silencers.
However, CRX always recognizes and binds to the same DNA sequence, known as its binding site, making it unclear why some cis-regulatory sequences bound to CRX act as silencers, while others act as enhancers. Friedman et al. sought to understand how enhancers and silencers, both bound by CRX, can have different effects on the genes they control. Since both enhancers and silencers contain CRX binding sites, the difference between the two must lie in the sequence of the DNA surrounding these binding sites. Using retinas that have been explanted from mice and kept alive in the laboratory, Friedman et al. tested the activity of thousands of CRX-binding sequences from the mouse genome. This showed that both enhancers and silencers have more copies of CRX-binding sites than sequences of the genome that are inactive. Additionally, the results revealed that enhancers have a diverse collection of binding sites for other transcription factors, while silencers do not. Friedman et al. developed a new metric they called information content, which captures the diverse combinations of different transcription factor binding sites that cis-regulatory sequences can have. Using this metric, Friedman et al. showed that it is possible to distinguish enhancers from silencers based on their information content. It is critical to understand how the DNA sequences of cis-regulatory regions determine their activity, because mutations in these regions of the genome can cause disease. However, since every person has thousands of benign mutations in cis-regulatory sequences, it is a challenge to identify specific disease-causing mutations, which are relatively rare. One long-term goal of models of enhancers and silencers, such as Friedman et al.’s information content model, is to understand how mutations can affect cis-regulatory sequences, and, in some cases, lead to disease.
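The notion of a score that reflects both the number and the diversity of motifs can be illustrated with a small sketch. This is not the information content measure defined by Friedman et al.; it is an assumed stand-in that reports the motif count and the Shannon entropy (in bits) of the TF identities among the motif hits. The function name `motif_summary` and the TF labels `TF_A` and `TF_B` are hypothetical.

```python
import math
from collections import Counter


def motif_summary(motif_hits):
    """Return (number of motifs, Shannon entropy in bits of their TF identities).

    motif_hits is a list of TF names, one entry per motif found in a sequence.
    """
    counts = Counter(motif_hits)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return total, entropy


# A sequence with only CRX motifs versus one with the same number of motifs
# drawn from several TFs (TF_A and TF_B are placeholder names).
crx_only = motif_summary(["CRX", "CRX", "CRX", "CRX"])
mixed = motif_summary(["CRX", "CRX", "TF_A", "TF_B"])
```

Both toy sequences carry four motifs, but only the mixed one has non-zero diversity, mirroring the abstract's observation that enhancers contain motifs from a more diverse collection of TFs than silencers.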
Affiliation(s)
- Ryan Z Friedman
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, United States
  - Department of Genetics, Washington University School of Medicine, St. Louis, United States
- David M Granas
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, United States
  - Department of Genetics, Washington University School of Medicine, St. Louis, United States
- Connie A Myers
  - Department of Pathology and Immunology, Washington University School of Medicine, St Louis, United States
- Joseph C Corbo
  - Department of Pathology and Immunology, Washington University School of Medicine, St Louis, United States
- Barak A Cohen
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, United States
  - Department of Genetics, Washington University School of Medicine, St. Louis, United States
- Michael A White
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, United States
  - Department of Genetics, Washington University School of Medicine, St. Louis, United States

50
ChIP-GSM: Inferring active transcription factor modules to predict functional regulatory elements. PLoS Comput Biol 2021; 17:e1009203. [PMID: 34292930 PMCID: PMC8330942 DOI: 10.1371/journal.pcbi.1009203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2020] [Revised: 08/03/2021] [Accepted: 06/20/2021] [Indexed: 11/19/2022] Open
Abstract
Transcription factors (TFs) often function as a module including both master factors and mediators binding at cis-regulatory regions to modulate nearby gene transcription. ChIP-seq profiling of multiple TFs makes it feasible to infer functional TF modules. However, when inferring TF modules based on co-localization of ChIP-seq peaks, often many weak binding events are missed, especially for mediators, resulting in incomplete identification of modules. To address this problem, we develop a ChIP-seq data-driven Gibbs Sampler to infer Modules (ChIP-GSM) using a Bayesian framework that integrates ChIP-seq profiles of multiple TFs. ChIP-GSM samples read counts of module TFs iteratively to estimate the binding potential of a module to each region and, across all regions, estimates the module abundance. Using inferred module-region probabilistic bindings as feature units, ChIP-GSM then employs logistic regression to predict active regulatory elements. Validation of ChIP-GSM predicted regulatory regions on multiple independent datasets sharing the same context confirms the advantage of using TF modules for predicting regulatory activity. In a case study of K562 cells, we demonstrate that the ChIP-GSM inferred modules form as groups, activate gene expression at different time points, and mediate diverse functional cellular processes. Hence, ChIP-GSM infers biologically meaningful TF modules and improves the prediction accuracy of regulatory region activities.