1. Comendul A, Ruf-Zamojski F, Ford CT, Agarwal P, Zaslavsky E, Nudelman G, Hariharan M, Rubenstein A, Pincas H, Nair VD, Michaleas AM, Fremont-Smith PD, Ricke DO, Sealfon SC, Woods CW, Claypool KT, Jaimes R. Comprehensive guide for epigenetics and transcriptomics data quality control. STAR Protoc 2025; 6:103607. [PMID: 39869481 PMCID: PMC11799959 DOI: 10.1016/j.xpro.2025.103607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Revised: 09/27/2024] [Accepted: 01/07/2025] [Indexed: 01/29/2025] Open
Abstract
Host response to environmental exposures such as pathogens and chemicals can include modifications to the epigenome and transcriptome. Improved signature discovery, including the identification of the agent and timing of exposure, has been enabled by advancements in assaying techniques to detect RNA expression, DNA base modifications, histone modifications, and chromatin accessibility. The interrogation of the epigenome and transcriptome cascade requires analyzing disparate datasets from multiple assay types, often at single-cell resolution, derived from the same biospecimen. However, there remains a paucity of rigorous quality control standards of those datasets that reflect quality assurance of the underlying assay. This guide outlines a comprehensive suite of metrics that can be used to ensure quality from 11 different epigenetics and transcriptomics assays. Recommended mitigative actions to address failed metrics are provided. The workflow presented aims to improve benchwork protocols and dataset quality to enable accurate discovery of exposure signatures.
Affiliation(s)
- Arianna Comendul
  - Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, USA
- Frederique Ruf-Zamojski
  - Cedars-Sinai Medical Center, Department of Medicine, Los Angeles, CA, USA; Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Colby T Ford
  - Tuple LLC, Charlotte, NC, USA; University of North Carolina at Charlotte, Department of Bioinformatics and Genomics, Charlotte, NC, USA; University of North Carolina at Charlotte, Center for Computational Intelligence to Predict Health and Environmental Risks (CIPHER), Charlotte, NC, USA
- Manoj Hariharan
  - Genomic Analysis Laboratory, Salk Institute, La Jolla, CA, USA
- Hanna Pincas
  - Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Adam M Michaleas
  - Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, USA
- Darrell O Ricke
  - Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, USA
- Kajal T Claypool
  - Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, USA
- Rafael Jaimes
  - Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, USA

2. Manghrani A, Rangadurai AK, Szekely O, Liu B, Guseva S, Al-Hashimi HM. Quantitative and Systematic NMR Measurements of Sequence-Dependent A-T Hoogsteen Dynamics in the DNA Double Helix. Biochemistry 2025; 64:1042-1054. [PMID: 39982856 DOI: 10.1021/acs.biochem.4c00820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2025]
Abstract
The dynamic properties of DNA depend on the sequence, providing an important source of sequence-specificity in biochemical reactions. However, comprehensively measuring how these dynamics vary with sequence is challenging, especially when they involve lowly populated and short-lived conformational states. Using 1H CEST supplemented by targeted 13C R1ρ NMR experiments, we quantitatively measured Watson-Crick to Hoogsteen dynamics for an A-T base pair in 13 trinucleotide sequence contexts. The Hoogsteen population and exchange rate varied 4-fold and 16-fold, respectively, and were dependent on both the 3'- and 5'-neighbors but only weakly dependent on monovalent ion concentration (25 versus 100 mM NaCl) and pH (6.8 versus 8.0). Flexible TA and CA dinucleotide steps exhibited the highest Hoogsteen populations, and their kinetic rates strongly depended on the 3'-neighbor. In contrast, the stiffer AA and GA steps had the lowest Hoogsteen population, and their kinetics were weakly dependent on the 3'-neighbor. The Hoogsteen lifetime was especially short when G-C neighbors flanked the A-T base pair. Our results uncover a unique conformational basis for sequence-specificity in the DNA double helix and establish the utility of NMR to quantitatively and comprehensively measure sequence-dependent DNA dynamics.
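The abstract reports a Hoogsteen population, an exchange rate, and a lifetime. Under the standard two-state exchange model (an assumption here; the abstract does not spell out its fitting model), these quantities are linked by k_ex = k_forward + k_backward and p_minor = k_forward / k_ex. The Python sketch below illustrates that conversion. The function name and the input values are hypothetical and are not taken from the paper.

```python
def two_state_rates(p_minor, k_ex):
    """Convert a minor-state population and exchange rate into rate constants.

    Assumes simple two-state exchange between a major state (e.g. Watson-Crick)
    and a minor state (e.g. Hoogsteen).

    p_minor: fractional population of the minor state, strictly between 0 and 1
    k_ex:    exchange rate, k_forward + k_backward, in s^-1
    """
    if not 0.0 < p_minor < 1.0:
        raise ValueError("p_minor must lie strictly between 0 and 1")
    if k_ex <= 0.0:
        raise ValueError("k_ex must be positive")
    k_forward = p_minor * k_ex            # major -> minor rate constant
    k_backward = (1.0 - p_minor) * k_ex   # minor -> major rate constant
    lifetime_minor = 1.0 / k_backward     # mean dwell time in the minor state, s
    return k_forward, k_backward, lifetime_minor


# Hypothetical inputs chosen only for illustration (not values from the paper).
k_f, k_b, tau = two_state_rates(p_minor=0.005, k_ex=2000.0)
print(f"k_forward={k_f:.1f} s^-1, k_backward={k_b:.1f} s^-1, lifetime={tau * 1e3:.3f} ms")
```

In this picture, a shorter minor-state lifetime corresponds to a larger backward rate constant, which is one way to read the abstract's statement about G-C neighbors.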
Affiliation(s)
- Akanksha Manghrani
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
- Atul Kaushik Rangadurai
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
  - Program in Molecular Medicine, Hospital for Sick Children Research Institute, Toronto, Ontario M5G 0A4, Canada
- Or Szekely
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
- Bei Liu
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
- Serafima Guseva
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, United States
- Hashim M Al-Hashimi
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, United States

3. Song X, Pan Z, Zhang Y, Yang W, Zhang T, Wang H, Chen Y, Yu X, Ding H, Li R, Ge P, Xu L, Dong G, Jiang F. Excessive MYC Orchestrates Macrophages induced Chromatin Remodeling to Sustain Micropapillary-Patterned Malignancy in Lung Adenocarcinoma. Adv Sci (Weinh) 2025; 12:e2403851. [PMID: 39899538 PMCID: PMC11948069 DOI: 10.1002/advs.202403851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 01/15/2025] [Indexed: 02/05/2025]
Abstract
Current understanding of micropapillary (MP)-subtype lung adenocarcinoma (LUAD) remains confined to biological activities and genomic landscapes. Unraveling the major regulatory programs underlying MP-patterned malignancy offers opportunities to identify more feasible therapeutic targets for patients with MP LUAD. This study shows that patients with MP-subtype LUAD have aberrant activation of the MYC pathway compared to patients with other subtypes. In vitro and xenograft mouse model studies reveal that the MP pattern in malignancy cannot be solely due to aberrant MYC expression but requires the involvement of M2-like macrophages. Excessively expressed MYC leads to the accumulation of M2-like macrophages from the bone marrow, which secrete TGFβ to induce the expression of FOSL2 in tumor cells, thereby remodeling chromatin accessibility at promoter regions of MP-pattern genes to promote the MYC-mediated de novo transcriptional regulation of these genes. Additionally, the MP pattern in malignancy can be effectively alleviated by disrupting the TGFβ-FOSL2 axis. These findings reveal new functions for the M2-like macrophage-TGFβ-FOSL2 axis in MYC-overexpressing MP-subtype LUAD, identifying targetable vulnerabilities in this pathway.
Affiliation(s)
- Xuming Song
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - The Fourth Clinical College of Nanjing Medical University, Nanjing 210000, P. R. China
- Zehao Pan
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - The Fourth Clinical College of Nanjing Medical University, Nanjing 210000, P. R. China
- Yi Zhang
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - Department of Pathology, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
- Wenmin Yang
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - Department of Pathology, Nanjing Drum Tower Hospital, Nanjing 210008, P. R. China
- Te Zhang
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60201, USA
- Hui Wang
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - The Fourth Clinical College of Nanjing Medical University, Nanjing 210000, P. R. China
- Yuzhong Chen
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - The Fourth Clinical College of Nanjing Medical University, Nanjing 210000, P. R. China
- Xinnian Yu
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - The Fourth Clinical College of Nanjing Medical University, Nanjing 210000, P. R. China
- Hanlin Ding
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - The Fourth Clinical College of Nanjing Medical University, Nanjing 210000, P. R. China
- Rutao Li
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - Department of Thoracic Surgery, The Fourth Affiliated Hospital of Soochow University, Nanjing 215000, P. R. China
- Pengfei Ge
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - The Fourth Clinical College of Nanjing Medical University, Nanjing 210000, P. R. China
- Lin Xu
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
  - Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing 211116, P. R. China
- Gaochao Dong
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China
- Feng Jiang
  - Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing 210009, P. R. China
  - Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing 210000, P. R. China

4. Li J, Zhang P, Xi X, Liu L, Wei L, Wang X. Modeling and designing enhancers by introducing and harnessing transcription factor binding units. Nat Commun 2025; 16:1469. [PMID: 39922842 PMCID: PMC11807178 DOI: 10.1038/s41467-025-56749-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Accepted: 01/24/2025] [Indexed: 02/10/2025] Open
Abstract
Enhancers serve as pivotal regulators of gene expression throughout various biological processes by interacting with transcription factors (TFs). While transcription factor binding sites (TFBSs) are widely acknowledged as key determinants of TF binding and enhancer activity, the significant role of their surrounding context sequences remains to be quantitatively characterized. Here we propose the concept of transcription factor binding unit (TFBU) to modularly model enhancers by quantifying the impact of context sequences surrounding TFBSs using deep learning models. Based on this concept, we develop DeepTFBU, a comprehensive toolkit for enhancer design. We demonstrate that designing TFBS context sequences can significantly modulate enhancer activities and produce cell type-specific responses. DeepTFBU is also highly efficient in the de novo design of enhancers containing multiple TFBSs. Furthermore, DeepTFBU enables flexible decoupling and optimization of generalized enhancers. We prove that TFBU is a crucial concept, and DeepTFBU is highly effective for rational enhancer design.
Affiliation(s)
- Jiaqi Li
  - Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
- Pengcheng Zhang
  - Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
- Xi Xi
  - Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
- Liyang Liu
  - Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
- Lei Wei
  - Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
- Xiaowo Wang
  - Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China

5. Schroeder JW, Wolfe MB, Freddolino L. ShapeME: A tool and web front-end for de novo discovery of structural motifs underpinning protein-DNA interactions. bioRxiv 2025:2025.01.28.635290. [PMID: 39975017 PMCID: PMC11838363 DOI: 10.1101/2025.01.28.635290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Determining where transcriptional regulators bind within a genome is paramount to understanding how gene expression is regulated. Historically, position weight matrices (PWMs) have been used to define the binding preferences of DNA binding proteins [1]. However, PWMs treat the identity of each base in a sequence as an independent and additive measure of binding preference, which can limit their utility [2]. Models that consider higher order interactions between nearby bases yield greater success in predicting proteins' binding to DNA, but for many proteins there is still substantial room for improvement in predicting and understanding the determinants of proteins' binding to DNA [3]. In addition to DNA sequence motifs, structural motifs (e.g., a narrow minor groove width) are important determinants of binding for some DNA-binding proteins [4]. Despite the initial success of algorithms using structural features of DNA to predict binding properties of proteins from either ChIP-seq or SELEX data [5-8], there remains a need for a de novo structural motif discovery framework which can be applied to data from a variety of experimental designs. Here, we present a unified workflow, capable of utilizing virtually any type of data representing sequence coverage or enrichment (e.g., ChIP-seq, RNA-seq, SELEX), to discover short structural motifs with explanatory power for a protein's DNA binding preference. We couple the DNAshapeR algorithm [9] with our own information-theoretic approach to de novo motif discovery, and wrap shape and sequence motif inference and model selection into a single tool called ShapeME. Application of our structural motif discovery algorithm to proteins with ChIP-seq data in ENCODE datasets reveals a subset of proteins where short structural motifs outperform the best PWM for that protein as determined from the JASPAR database, or as identified by the sequence motif elicitation tool STREME.
Our approach offers a powerful and versatile framework for inferring structural DNA binding motifs, and will complement current sequence-based motif elicitation tools in discovery of protein-DNA interaction principles. A web-based interface to ShapeME is available at https://seq2fun.dcmb.med.umich.edu/shapeme, with full source code available at https://github.com/freddolino-lab/ShapeME.
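The abstract's point that PWMs treat each base as an independent, additive contribution can be made concrete with a small sketch. The Python below builds log-odds weights from per-position base counts and scores a site by summing one weight per position. It is a generic illustration of PWM scoring, not code from ShapeME, JASPAR, or STREME; the function names, the pseudocount, the uniform background, and the toy counts are all assumptions made for this example.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds weights.

    counts: list of dicts, one per motif position, mapping each base to a count.
    A uniform background frequency is assumed for simplicity.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score_site(pwm, site):
    """Additive PWM score: every position contributes independently."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(column[base] for column, base in zip(pwm, site))


def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (score_site(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy counts for a hypothetical 3-bp motif that prefers "ACG".
toy_counts = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 0},
]
toy_pwm = pwm_from_counts(toy_counts)
top_score, top_offset = best_hit(toy_pwm, "TTACGTT")
print(top_offset, round(top_score, 3))
```

Because `score_site` is a plain sum over positions, it cannot represent a preference that depends on two bases jointly, which is the limitation the abstract attributes to PWMs.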
Affiliation(s)
- Jeremy W. Schroeder
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Michael B. Wolfe
  - Department of Biochemistry, University of Wisconsin - Madison, Madison, WI 53706, USA
- Lydia Freddolino
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

6. Awdeh A, Turcotte M, Perkins TJ. Identifying transcription factors with cell-type specific DNA binding signatures. BMC Genomics 2024; 25:957. [PMID: 39402535 PMCID: PMC11472444 DOI: 10.1186/s12864-024-10859-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Accepted: 10/02/2024] [Indexed: 10/19/2024] Open
Abstract
BACKGROUND: Transcription factors (TFs) bind to different parts of the genome in different types of cells, but it is usually assumed that the inherent DNA-binding preferences of a TF are invariant to cell type. Yet, there are several known examples of TFs that switch their DNA-binding preferences in different cell types, and yet more examples of other mechanisms, such as steric hindrance or cooperative binding, that may result in a "DNA signature" of differential binding. RESULTS: To survey this phenomenon systematically, we developed a deep learning method we call SigTFB (Signatures of TF Binding) to detect and quantify cell-type specificity in a TF's known genomic binding sites. We used ENCODE ChIP-seq data to conduct a wide-scale investigation of 169 distinct TFs in up to 14 distinct cell types. SigTFB detected statistically significant DNA binding signatures in approximately two-thirds of TFs, far more than might have been expected from the relatively sparse evidence in prior literature. We found that the presence or absence of a cell-type specific DNA binding signature is distinct from, and indeed largely uncorrelated to, the degree of overlap between ChIP-seq peaks in different cell types, and tended to arise by two mechanisms: using established motifs at different frequencies, and selective inclusion of motifs for distinct TFs. CONCLUSIONS: While recent results have highlighted cell state features such as chromatin accessibility and gene expression in predicting TF binding, our results emphasize that, for some TFs, the DNA sequences of the binding sites contain substantial cell-type specific motifs.
Affiliation(s)
- Aseel Awdeh
  - School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Ave., Ottawa, K1N 6N5, Ontario, Canada
  - Regenerative Medicine Program, Ottawa Hospital Research Institute, 501 Smyth Rd., Ottawa, K1H 8L6, Ontario, Canada
- Marcel Turcotte
  - School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Ave., Ottawa, K1N 6N5, Ontario, Canada
- Theodore J Perkins
  - School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Ave., Ottawa, K1N 6N5, Ontario, Canada
  - Regenerative Medicine Program, Ottawa Hospital Research Institute, 501 Smyth Rd., Ottawa, K1H 8L6, Ontario, Canada
  - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, K1H 8M5, Ontario, Canada

7. Xu C, Kleinschmidt H, Yang J, Leith EM, Johnson J, Tan S, Mahony S, Bai L. Systematic dissection of sequence features affecting binding specificity of a pioneer factor reveals binding synergy between FOXA1 and AP-1. Mol Cell 2024; 84:2838-2855.e10. [PMID: 39019045 PMCID: PMC11334613 DOI: 10.1016/j.molcel.2024.06.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 04/23/2024] [Accepted: 06/21/2024] [Indexed: 07/19/2024]
Abstract
Despite the unique ability of pioneer factors (PFs) to target nucleosomal sites in closed chromatin, they only bind a small fraction of their genomic motifs. The underlying mechanism of this selectivity is not well understood. Here, we design a high-throughput assay called chromatin immunoprecipitation with integrated synthetic oligonucleotides (ChIP-ISO) to systematically dissect sequence features affecting the binding specificity of a classic PF, FOXA1, in human A549 cells. Combining ChIP-ISO with in vitro and neural network analyses, we find that (1) FOXA1 binding is strongly affected by co-binding transcription factors (TFs) AP-1 and CEBPB; (2) FOXA1 and AP-1 show binding cooperativity in vitro; (3) FOXA1's binding is determined more by local sequences than chromatin context, including eu-/heterochromatin; and (4) AP-1 is partially responsible for differential binding of FOXA1 in different cell types. Our study presents a framework for elucidating genetic rules underlying PF binding specificity and reveals a mechanism for context-specific regulation of its binding.
Affiliation(s)
- Cheng Xu
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Holly Kleinschmidt
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Jianyu Yang
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Erik M Leith
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Jenna Johnson
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Song Tan
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Shaun Mahony
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Lu Bai
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA; Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA; Department of Physics, The Pennsylvania State University, University Park, PA 16802, USA

8. Stoeber S, Godin H, Xu C, Bai L. Pioneer factors: nature or nurture? Crit Rev Biochem Mol Biol 2024; 59:139-153. [PMID: 38778580 PMCID: PMC11444900 DOI: 10.1080/10409238.2024.2355885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 04/30/2024] [Accepted: 05/13/2024] [Indexed: 05/25/2024]
Abstract
Chromatin is densely packed with nucleosomes, which limits the accessibility of many chromatin-associated proteins. Pioneer factors (PFs) are usually viewed as a special group of sequence-specific transcription factors (TFs) that can recognize nucleosome-embedded motifs, invade compact chromatin, and generate open chromatin regions. Through this process, PFs initiate a cascade of events that play key roles in gene regulation and cell differentiation. A current debate in the field is whether PFs belong to a unique subset of TFs with intrinsic "pioneering activity", or whether all TFs have the potential to function as PFs within certain cellular contexts. There are also different views regarding the key feature(s) that define pioneering activity. In this review, we present evidence from the literature related to these alternative views and discuss how to potentially reconcile them. It is possible that both intrinsic properties, like tight nucleosome binding and structural compatibility, and cellular conditions, like concentration and co-factor availability, are important for PF function.
Affiliation(s)
- Shane Stoeber
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Holly Godin
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Cheng Xu
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Lu Bai
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
  - Department of Physics, The Pennsylvania State University, University Park, PA 16802, USA

9. Manghrani A, Rangadurai AK, Szekely O, Liu B, Guseva S, Al-Hashimi HM. Quantitative and systematic NMR measurements of sequence-dependent A-T Hoogsteen dynamics uncovers unique conformational specificity in the DNA double helix. bioRxiv 2024:2024.05.15.594415. [PMID: 38798635 PMCID: PMC11118333 DOI: 10.1101/2024.05.15.594415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
The propensities to form lowly populated, short-lived conformations of DNA could vary with sequence, providing an important source of sequence-specificity in biochemical reactions. However, comprehensively measuring how these dynamics vary with sequence is challenging. Using 1H CEST and 13C R1ρ NMR, we measured Watson-Crick to Hoogsteen dynamics for an A-T base pair in thirteen trinucleotide sequence contexts. The Hoogsteen population and exchange rate varied 4-fold and 16-fold, respectively, and were dependent on both the 3'- and 5'-neighbors but only weakly dependent on monovalent ion concentration (25 versus 100 mM NaCl) and pH (6.8 versus 8.0). Flexible TA and CA dinucleotide steps exhibited the highest Hoogsteen populations, and their kinetic rates strongly depended on the 3'-neighbor. In contrast, the stiffer AA and GA steps had the lowest Hoogsteen population, and their kinetics were weakly dependent on the 3'-neighbor. The Hoogsteen lifetime was especially short when G-C neighbors flanked the A-T base pair. The Hoogsteen dynamics had a distinct sequence-dependence compared to duplex stability and minor groove width. Thus, our results uncover a unique source of sequence-specificity hidden within the DNA double helix in the form of A-T Hoogsteen dynamics and establish the utility of 1H CEST to quantitatively measure sequence-dependent DNA dynamics.
Affiliation(s)
- Akanksha Manghrani
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
- Atul Kaushik Rangadurai
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
  - Program in Molecular Medicine, Hospital for Sick Children Research Institute, Toronto, ON, M5G 0A4, Canada
- Or Szekely
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
- Bei Liu
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
- Serafima Guseva
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, United States
- Hashim M. Al-Hashimi
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, United States

10. Schultheis H, Bentsen M, Heger V, Looso M. Uncovering uncharacterized binding of transcription factors from ATAC-seq footprinting data. Sci Rep 2024; 14:9275. [PMID: 38654130 DOI: 10.1038/s41598-024-59989-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 04/17/2024] [Indexed: 04/25/2024] Open
Abstract
Transcription factors (TFs) are crucial epigenetic regulators, which enable cells to dynamically adjust gene expression in response to environmental signals. Computational procedures like digital genomic footprinting on chromatin accessibility assays such as ATAC-seq can be used to identify bound TFs on a genome-wide scale. This method utilizes short regions of low accessibility signals due to steric hindrance of DNA-bound proteins, called footprints (FPs), which are combined with motif databases for TF identification. However, while over 1600 TFs have been described in the human genome, only ~700 of these have a known binding motif. Thus, a substantial number of FPs without overlap to a known DNA motif are normally discarded from FP analysis. In addition, the FP method is restricted to organisms with a substantial number of known TF motifs. Here we present DENIS (DE Novo motIf diScovery), a framework to generate and systematically investigate the potential of de novo TF motif discovery from FPs. DENIS includes functionality (1) to isolate FPs without binding motifs, (2) to perform de novo motif generation and (3) to characterize novel motifs. Here, we show that the framework rediscovers artificially removed TF motifs, quantifies de novo motif usage during an early embryonic development example dataset, and is able to analyze and uncover TF activity in organisms lacking canonical motifs. The latter task is exemplified by an investigation of a scATAC-seq dataset in zebrafish which covers different cell types during hematopoiesis.
Affiliation(s)
- Hendrik Schultheis
- Bioinformatics Core Unit (BCU), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Mette Bentsen
- Bioinformatics Core Unit (BCU), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Vanessa Heger
- Bioinformatics Core Unit (BCU), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Mario Looso
- Bioinformatics Core Unit (BCU), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Cardio-Pulmonary Institute (CPI), Bad Nauheim, Germany
| |
|
11
|
Liao Z, Tang S, Nozawa K, Shimada K, Ikawa M, Monsivais D, Matzuk M. Affinity-tagged SMAD1 and SMAD5 mouse lines reveal transcriptional reprogramming mechanisms during early pregnancy. eLife 2024; 12:RP91434. [PMID: 38536963 PMCID: PMC10972565 DOI: 10.7554/elife.91434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024] Open
Abstract
Endometrial decidualization, a prerequisite for successful pregnancies, relies on transcriptional reprogramming driven by progesterone receptor (PR) and bone morphogenetic protein (BMP)-SMAD1/SMAD5 signaling pathways. Despite their critical roles in early pregnancy, how these pathways intersect in reprogramming the endometrium into a receptive state remains unclear. To define how SMAD1 and/or SMAD5 integrate BMP signaling in the uterus during early pregnancy, we generated two novel transgenic mouse lines with affinity tags inserted into the endogenous SMAD1 and SMAD5 loci (Smad1HA/HA and Smad5PA/PA). By profiling the genome-wide distribution of SMAD1, SMAD5, and PR in the mouse uterus, we demonstrated the unique and shared roles of SMAD1 and SMAD5 during the window of implantation. We also showed the presence of a conserved SMAD1, SMAD5, and PR genomic binding signature in the uterus during early pregnancy. To functionally characterize the translational aspects of our findings, we demonstrated that SMAD1/5 knockdown in human endometrial stromal cells suppressed expression of canonical decidual markers (IGFBP1, PRL, FOXO1) and PR-responsive genes (RORB, KLF15). Here, our studies provide novel tools to study BMP signaling pathways and highlight the fundamental roles of SMAD1/5 in mediating both BMP signaling pathways and the transcriptional response to progesterone (P4) during early pregnancy.
Affiliation(s)
- Zian Liao
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, United States
- Graduate Program of Genetics and Genomics, Baylor College of Medicine, Houston, United States
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, United States
- Center for Drug Discovery, Baylor College of Medicine, Houston, United States
| | - Suni Tang
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, United States
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, United States
| | - Kaori Nozawa
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, United States
- Center for Drug Discovery, Baylor College of Medicine, Houston, United States
| | - Keisuke Shimada
- Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
| | - Masahito Ikawa
- Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
| | - Diana Monsivais
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, United States
- Center for Drug Discovery, Baylor College of Medicine, Houston, United States
| | - Martin Matzuk
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, United States
- Graduate Program of Genetics and Genomics, Baylor College of Medicine, Houston, United States
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, United States
- Center for Drug Discovery, Baylor College of Medicine, Houston, United States
| |
|
12
|
Liao Z, Tang S, Nozawa K, Shimada K, Ikawa M, Monsivais D, Matzuk MM. Affinity-tagged SMAD1 and SMAD5 mouse lines reveal transcriptional reprogramming mechanisms during early pregnancy. bioRxiv 2024:2023.09.25.559321. [PMID: 38106095 PMCID: PMC10723262 DOI: 10.1101/2023.09.25.559321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Endometrial decidualization, a prerequisite for successful pregnancies, relies on transcriptional reprogramming driven by progesterone receptor (PR) and bone morphogenetic protein (BMP)-SMAD1/SMAD5 signaling pathways. Despite their critical roles in early pregnancy, how these pathways intersect in reprogramming the endometrium into a receptive state remains unclear. To define how SMAD1 and/or SMAD5 integrate BMP signaling in the uterus during early pregnancy, we generated two novel transgenic mouse lines with affinity tags inserted into the endogenous SMAD1 and SMAD5 loci (Smad1HA/HA and Smad5PA/PA). By profiling the genome-wide distribution of SMAD1, SMAD5, and PR in the mouse uterus, we demonstrated the unique and shared roles of SMAD1 and SMAD5 during the window of implantation. We also showed the presence of a conserved SMAD1, SMAD5, and PR genomic binding signature in the uterus during early pregnancy. To functionally characterize the translational aspects of our findings, we demonstrated that SMAD1/5 knockdown in human endometrial stromal cells suppressed expression of canonical decidual markers (IGFBP1, PRL, FOXO1) and PR-responsive genes (RORB, KLF15). Here, our studies provide novel tools to study BMP signaling pathways and highlight the fundamental roles of SMAD1/5 in mediating both BMP signaling pathways and the transcriptional response to progesterone (P4) during early pregnancy.
Affiliation(s)
- Zian Liao
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Graduate Program of Genetics and Genomics, Baylor College of Medicine, Houston, TX, 77030, USA
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Suni Tang
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Kaori Nozawa
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Keisuke Shimada
- Research Institute for Microbial Diseases, Osaka University, Osaka, 565-0871, Japan
| | - Masahito Ikawa
- Research Institute for Microbial Diseases, Osaka University, Osaka, 565-0871, Japan
| | - Diana Monsivais
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Martin M. Matzuk
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Graduate Program of Genetics and Genomics, Baylor College of Medicine, Houston, TX, 77030, USA
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, 77030, USA
| |
|
13
|
Mancheno-Ferris A, Immarigeon C, Rivero A, Depierre D, Schickele N, Fosseprez O, Chanard N, Aughey G, Lhoumaud P, Anglade J, Southall T, Plaza S, Payre F, Cuvier O, Polesello C. Crosstalk between chromatin and Shavenbaby defines transcriptional output along the Drosophila intestinal stem cell lineage. iScience 2024; 27:108624. [PMID: 38174321 PMCID: PMC10762455 DOI: 10.1016/j.isci.2023.108624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 07/05/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024] Open
Abstract
The transcription factor Shavenbaby (Svb), the only member of the OvoL family in Drosophila, controls the fate of various epithelial embryonic cells and adult stem cells. Post-translational modification of Svb produces two protein isoforms, Svb-ACT and Svb-REP, which promote adult intestinal stem cell renewal or differentiation, respectively. To define the Svb mode of action, we used engineered cell lines and developed an unbiased method to identify Svb target genes across different contexts. Within a given cell type, Svb-ACT and Svb-REP antagonistically regulate the expression of a set of target genes, binding specific enhancers whose accessibility is constrained by the chromatin landscape. Reciprocally, Svb-REP can influence local chromatin marks of active enhancers to help repress target genes. Along the intestinal lineage, the set of Svb target genes progressively changes, together with chromatin accessibility. We propose that the Svb-ACT-to-REP transition promotes enterocyte differentiation of intestinal stem cells through direct gene regulation and chromatin remodeling.
Affiliation(s)
- Alexandra Mancheno-Ferris
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Clément Immarigeon
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Alexia Rivero
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - David Depierre
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Naomi Schickele
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Olivier Fosseprez
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Nicolas Chanard
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Gabriel Aughey
- Imperial College London, Sir Ernst Chain Building, South Kensington Campus, London SW7 2AZ, UK
| | - Priscilla Lhoumaud
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
- Institut Jacques Monod, Université Paris Cité/CNRS, 15 rue Hélène Brion, 75205 Paris Cedex 13, France
| | - Julien Anglade
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Tony Southall
- Imperial College London, Sir Ernst Chain Building, South Kensington Campus, London SW7 2AZ, UK
| | - Serge Plaza
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Laboratoire de Recherche en Sciences Végétales, CNRS/UPS/INPT, 31320 Auzeville-Tolosane, France
| | - François Payre
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Olivier Cuvier
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
| | - Cédric Polesello
- Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
- Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
| |
|
14
|
Yang Z, Li X, Sheng L, Zhu M, Lan X, Gu F. Multiomics-integrated deep language model enables in silico genome-wide detection of transcription factor binding site in unexplored biosamples. Bioinformatics 2024; 40:btae013. [PMID: 38216534 PMCID: PMC10812877 DOI: 10.1093/bioinformatics/btae013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 12/07/2023] [Accepted: 01/11/2024] [Indexed: 01/14/2024] Open
Abstract
MOTIVATION: Transcription factor binding sites (TFBS) are regulatory elements that have significant impact on transcription regulation and cell fate determination. Canonical motifs, biological experiments, and computational methods have made it possible to discover TFBS. However, most existing in silico TFBS prediction models are solely DNA-based and are trained and utilized within the same biosample, so they fail to infer TFBS in experimentally unexplored biosamples. RESULTS: Here, we propose TFBS prediction by modified TransFormer (TFTF), a multimodal deep language architecture which integrates multiomics information in epigenetic studies. In comparison to existing computational techniques, TFTF has state-of-the-art accuracy, and is also the first approach to accurately perform genome-wide detection for cell-type and species-specific TFBS in experimentally unexplored biosamples. Compared to peak calling methods, TFTF consistently discovers true TFBS in a threshold-tuning-free way, with higher recall rates. The underlying mechanism of TFTF reveals greater attention to the targeted TF's motif region in TFBS, and general attention to the entire peak region in non-TFBS. TFTF can benefit from the integration of broader and more diverse data for improvement and can be applied to multiple epigenetic scenarios. AVAILABILITY AND IMPLEMENTATION: We provide a web server (https://tftf.ibreed.cn/) for users to utilize the TFTF model. Users can train the TFTF model and discover TFBS with their own data.
Affiliation(s)
- Zikun Yang
- Damo Academy, Alibaba Group, Hangzhou 310023, China
- Hupan Lab, Hangzhou 310023, China
| | - Xin Li
- Damo Academy, Alibaba Group, Hangzhou 310023, China
- Hupan Lab, Hangzhou 310023, China
| | - Lele Sheng
- Damo Academy, Alibaba Group, Hangzhou 310023, China
- Hupan Lab, Hangzhou 310023, China
| | - Ming Zhu
- Department of Basic Medical Science, School of Medicine, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Joint Center for Life Sciences, Tsinghua University, Beijing 100084, China
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
| | - Xun Lan
- Department of Basic Medical Science, School of Medicine, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Joint Center for Life Sciences, Tsinghua University, Beijing 100084, China
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
| | - Fei Gu
- Damo Academy, Alibaba Group, Hangzhou 310023, China
- Hupan Lab, Hangzhou 310023, China
| |
|
15
|
He B, Kram V, Furusawa T, Duverger O, Chu E, Nanduri R, Ishikawa M, Zhang P, Amendt B, Lee J, Bustin M. Epigenetic Regulation of Ameloblast Differentiation by HMGN Proteins. J Dent Res 2024; 103:51-61. [PMID: 37950483 PMCID: PMC10850876 DOI: 10.1177/00220345231202468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2023] Open
Abstract
Dental enamel formation is coordinated by ameloblast differentiation, production of enamel matrix proteins, and crystal growth. The factors regulating ameloblast differentiation are not fully understood. Here we show that the high mobility group N (HMGN) nucleosomal binding proteins modulate the rate of ameloblast differentiation and enamel formation. We found that HMGN1 and HMGN2 proteins are downregulated during mouse ameloblast differentiation. Genetically altered mice lacking HMGN1 and HMGN2 proteins show faster ameloblast differentiation and a higher rate of enamel deposition in molars and incisors. In vitro differentiation of induced pluripotent stem cells to dental epithelium cells showed that HMGN proteins modulate the expression and chromatin accessibility of ameloblast-specific genes and affect the binding of transcription factors epiprofin and PITX2 to ameloblast-specific genes. Our results suggest that HMGN proteins regulate ameloblast differentiation and enamel mineralization by modulating lineage-specific chromatin accessibility and transcription factor binding to ameloblast regulatory sites.
Affiliation(s)
- B. He
- Protein Section, Laboratory of Metabolism, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Craniofacial Anomalies and Regeneration Section, National Institute of Dental and Craniofacial Research, Bethesda, MD, USA
| | - V. Kram
- Molecular Biology of Bones & Teeth Section, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
| | - T. Furusawa
- Protein Section, Laboratory of Metabolism, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - O. Duverger
- Craniofacial Anomalies and Regeneration Section, National Institute of Dental and Craniofacial Research, Bethesda, MD, USA
| | - E.Y. Chu
- Department of General Dentistry, Operative Division, University of Maryland, School of Dentistry, Baltimore, MD, USA
| | - R. Nanduri
- Protein Section, Laboratory of Metabolism, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - M. Ishikawa
- Department of Pathology and Laboratory Medicine and Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
| | - P. Zhang
- Molecular Biology Section, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
| | - B.A. Amendt
- Department of Anatomy and Cell Biology, and the Craniofacial Anomalies Research Center, Carver College of Medicine, the University of Iowa, Iowa City, IA, USA
| | - J.S. Lee
- Craniofacial Anomalies and Regeneration Section, National Institute of Dental and Craniofacial Research, Bethesda, MD, USA
| | - M. Bustin
- Protein Section, Laboratory of Metabolism, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| |
|
16
|
Xu C, Kleinschmidt H, Yang J, Leith E, Johnson J, Tan S, Mahony S, Bai L. Systematic Dissection of Sequence Features Affecting the Binding Specificity of a Pioneer Factor Reveals Binding Synergy Between FOXA1 and AP-1. bioRxiv 2023:2023.11.08.566246. [PMID: 37986839 PMCID: PMC10659273 DOI: 10.1101/2023.11.08.566246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Despite the unique ability of pioneer transcription factors (PFs) to target nucleosomal sites in closed chromatin, they only bind a small fraction of their genomic motifs. The underlying mechanism of this selectivity is not well understood. Here, we design a high-throughput assay called ChIP-ISO to systematically dissect sequence features affecting the binding specificity of a classic PF, FOXA1. Combining ChIP-ISO with in vitro and neural network analyses, we find that 1) FOXA1 binding is strongly affected by co-binding TFs AP-1 and CEBPB, 2) FOXA1 and AP-1 show binding cooperativity in vitro, 3) FOXA1's binding is determined more by local sequences than chromatin context, including eu-/heterochromatin, and 4) AP-1 is partially responsible for differential binding of FOXA1 in different cell types. Our study presents a framework for elucidating genetic rules underlying PF binding specificity and reveals a mechanism for context-specific regulation of its binding.
Affiliation(s)
- Cheng Xu
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
| | - Holly Kleinschmidt
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
| | - Jianyu Yang
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
| | - Erik Leith
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
| | - Jenna Johnson
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Song Tan
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
| | - Shaun Mahony
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
| | - Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Physics, The Pennsylvania State University, University Park, PA 16802, USA
| |
|
17
|
Zhu I, Landsman D. Clustered and diverse transcription factor binding underlies cell type specificity of enhancers for housekeeping genes. Genome Res 2023; 33:1662-1672. [PMID: 37884340 PMCID: PMC10691539 DOI: 10.1101/gr.278130.123] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/26/2023] [Accepted: 09/12/2023] [Indexed: 10/28/2023]
Abstract
Housekeeping genes are considered to be regulated by common enhancers across different tissues. Here we report that most of the commonly expressed mouse or human genes across different cell types, including more than half of the previously identified housekeeping genes, are associated with cell type-specific enhancers. Furthermore, the binding of most transcription factors (TFs) is cell type-specific. We reason that these cell type specificities are causally related to the collective TF recruitment at regulatory sites, as TFs tend to bind to regions associated with many other TFs and each cell type has a unique repertoire of expressed TFs. Based on binding profiles of hundreds of TFs from HepG2, K562, and GM12878 cells, we show that 80% of all TF peaks overlapping H3K27ac signals are in the top 20,000-23,000 most TF-enriched H3K27ac peak regions, and approximately 12,000-15,000 of these peaks are enhancers (nonpromoters). Those enhancers are mainly cell type-specific and include those linked to the majority of commonly expressed genes. Moreover, we show that the top 15,000 most TF-enriched regulatory sites in HepG2 cells, associated with about 200 TFs, can be predicted largely from the binding profile of as few as 30 TFs. Through motif analysis, we show that major enhancers harbor diverse and clustered motifs from a combination of available TFs uniquely present in each cell type. We propose a mechanism that explains how the highly focused TF binding at regulatory sites results in cell type specificity of enhancers for housekeeping and commonly expressed genes.
Affiliation(s)
- Iris Zhu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - David Landsman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| |
|
18
|
Tognon M, Giugno R, Pinello L. A survey on algorithms to characterize transcription factor binding sites. Brief Bioinform 2023; 24:bbad156. [PMID: 37099664 PMCID: PMC10422928 DOI: 10.1093/bib/bbad156] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/13/2023] [Revised: 03/27/2023] [Accepted: 04/01/2023] [Indexed: 04/28/2023] Open
Abstract
Transcription factors (TFs) are key regulatory proteins that control the transcriptional rate of cells by binding short DNA sequences called transcription factor binding sites (TFBS) or motifs. Identifying and characterizing TFBS is fundamental to understanding the regulatory mechanisms governing the transcriptional state of cells. During the last decades, several experimental methods have been developed to recover DNA sequences containing TFBS. In parallel, computational methods have been proposed to discover and identify TFBS motifs based on these DNA sequences. This is one of the most widely investigated problems in bioinformatics and is referred to as the motif discovery problem. In this manuscript, we review classical and novel experimental and computational methods developed to discover and characterize TFBS motifs in DNA sequences, highlighting their advantages and drawbacks. We also discuss open challenges and future perspectives that could fill the remaining gaps in the field.
Affiliation(s)
- Manuel Tognon
- Computer Science Department, University of Verona, Verona, Italy
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Rosalba Giugno
- Computer Science Department, University of Verona, Verona, Italy
| | - Luca Pinello
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Department of Pathology, Harvard Medical School, Boston, Massachusetts, United States of America
| |
|
19
|
Marri D, Filipovic D, Kana O, Tischkau S, Bhattacharya S. Prediction of mammalian tissue-specific CLOCK-BMAL1 binding to E-box DNA motifs. Sci Rep 2023; 13:7742. [PMID: 37173345 PMCID: PMC10182026 DOI: 10.1038/s41598-023-34115-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Accepted: 04/25/2023] [Indexed: 05/15/2023] Open
Abstract
The Brain and Muscle ARNTL-Like 1 protein (BMAL1) forms a heterodimer with either Circadian Locomotor Output Cycles Kaput (CLOCK) or Neuronal PAS domain protein 2 (NPAS2) to act as a master regulator of the mammalian circadian clock gene network. The dimer binds to E-box gene regulatory elements on DNA, activating downstream transcription of clock genes. Identification of transcription factor binding sites and genomic features that correlate with DNA binding by BMAL1 is a challenging problem, given that CLOCK-BMAL1 or NPAS2-BMAL1 bind to several distinct binding motifs (CANNTG) on DNA. Using three different types of tissue-specific machine learning models with features based on (1) DNA sequence, (2) DNA sequence plus DNA shape, and (3) DNA sequence and shape plus histone modifications, we developed an interpretable predictive model of genome-wide BMAL1 binding to E-box motifs and dissected the mechanisms underlying BMAL1-DNA binding. Our results indicated that histone modifications, the local shape of the DNA, and the flanking sequence of the E-box motif are sufficient predictive features for BMAL1-DNA binding. Our models also provide mechanistic insights into tissue specificity of DNA binding by BMAL1.
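To make the E-box consensus in the abstract above concrete, here is a minimal sketch of locating CANNTG matches in a DNA string with a regular expression. It is not the authors' pipeline; the function name and the example sequence are hypothetical. Because the reverse complement of CANNTG is itself CANNTG, a match on one strand is also a match at the same position on the other, so a single-strand scan suffices for locating sites.

```python
import re

# E-box consensus CANNTG, where N is any of A, C, G, T.
# The lookahead reports every start position, including overlapping sites.
EBOX_PATTERN = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(sequence):
    """Return (0-based start, matched 6-mer) for each CANNTG site in sequence."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in EBOX_PATTERN.finditer(sequence)]


# Hypothetical example sequence containing two E-boxes.
for start, site in find_eboxes("ttCACGTGaaCAGCTGcc"):
    print(start, site)
```

A scan like this only enumerates candidate sites; the abstract's point is that which candidates are actually bound depends on further features such as flanking sequence, DNA shape, and histone modifications.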
Affiliation(s)
- Daniel Marri
- Department of Biomedical Engineering, Michigan State University, East Lansing, MI, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
| | - David Filipovic
- Department of Biomedical Engineering, Michigan State University, East Lansing, MI, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA
| | - Omar Kana
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
- Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI, USA
- Institute for Integrative Toxicology, Michigan State University, East Lansing, MI, USA
| | - Shelley Tischkau
- Department of Pharmacology, Southern Illinois University School of Medicine, Springfield, IL, USA
| | - Sudin Bhattacharya
- Department of Biomedical Engineering, Michigan State University, East Lansing, MI, USA.
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA.
- Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI, USA.
- Institute for Integrative Toxicology, Michigan State University, East Lansing, MI, USA.
| |
Collapse
|
20
|
Zhang Q, Teng P, Wang S, He Y, Cui Z, Guo Z, Liu Y, Yuan C, Liu Q, Huang DS. Computational prediction and characterization of cell-type-specific and shared binding sites. Bioinformatics 2022; 39:6885447. [PMID: 36484687 PMCID: PMC9825777 DOI: 10.1093/bioinformatics/btac798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Revised: 11/24/2022] [Accepted: 12/08/2022] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: Cell-type-specific gene expression is maintained in large part by transcription factors (TFs) selectively binding to distinct sets of sites in different cell types. Recent studies have provided evidence that such cell-type-specific binding is determined by TFs' intrinsic sequence preferences, cooperative interactions with co-factors, cell-type-specific chromatin landscapes and 3D chromatin interactions. However, computational prediction and characterization of cell-type-specific and shared binding sites has rarely been studied. RESULTS: In this article, we propose two computational approaches for predicting and characterizing cell-type-specific and shared binding sites by integrating multiple types of features, one based on XGBoost and the other on a convolutional neural network (CNN). To validate the performance of our proposed approaches, ChIP-seq datasets of 10 binding factors were collected from the GM12878 (lymphoblastoid) and K562 (erythroleukemic) human hematopoietic cell lines, each of which was further categorized into cell-type-specific (GM12878- and K562-specific) and shared binding sites. Then, multiple types of features for these binding sites were integrated to train the XGBoost- and CNN-based models. Experimental results show that our proposed approaches significantly outperform other competing methods on three classification tasks. Moreover, we identified independent feature contributions for cell-type-specific and shared sites through SHAP values and explored the ability of the CNN-based model to predict cell-type-specific and shared binding sites by excluding or including DNase signals. Furthermore, we investigated the generalization ability of our proposed approaches to different binding factors in the same cellular environment. AVAILABILITY AND IMPLEMENTATION: The source code is available at: https://github.com/turningpoint1988/CSSBS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qinhu Zhang
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Pengrui Teng
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Siguo Wang
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Ying He
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Zhen Cui
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Zhenghao Guo
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Yixin Liu
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Changan Yuan
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning 530007, China
- Qi Liu
  - To whom correspondence should be addressed.
|
21
|
CLIMB: High-dimensional association detection in large scale genomic data. Nat Commun 2022; 13:6874. [PMID: 36371401 PMCID: PMC9653391 DOI: 10.1038/s41467-022-34360-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/05/2020] [Accepted: 10/21/2022] [Indexed: 11/14/2022] Open
Abstract
Joint analyses of genomic datasets obtained in multiple different conditions are essential for understanding the biological mechanism that drives tissue-specificity and cell differentiation, but they still remain computationally challenging. To address this we introduce CLIMB (Composite LIkelihood eMpirical Bayes), a statistical methodology that learns patterns of condition-specificity present in genomic data. CLIMB provides a generic framework facilitating a host of analyses, such as clustering genomic features sharing similar condition-specific patterns and identifying which of these features are involved in cell fate commitment. We apply CLIMB to three sets of hematopoietic data, which examine CTCF ChIP-seq measured in 17 different cell populations, RNA-seq measured across constituent cell populations in three committed lineages, and DNase-seq in 38 cell populations. Our results show that CLIMB improves upon existing alternatives in statistical precision, while capturing interpretable and biologically relevant clusters in the data.
|
22
|
Kshirsagar M, Yuan H, Ferres JL, Leslie C. BindVAE: Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin. Genome Biol 2022; 23:174. [PMID: 35971180 PMCID: PMC9380350 DOI: 10.1186/s13059-022-02723-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/07/2021] [Accepted: 06/28/2022] [Indexed: 11/10/2022] Open
Abstract
We present a novel unsupervised deep learning approach called BindVAE, based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. BindVAE can disentangle an input DNA sequence into distinct latent factors that encode cell-type specific in vivo binding signals for individual TFs, composite patterns for TFs involved in cooperative binding, and genomic context surrounding the binding sites. On the task of retrieving the motifs of expressed TFs in a given cell type, BindVAE is competitive with existing motif discovery approaches.
Affiliation(s)
- Han Yuan
  - Calico Life Sciences, South San Francisco, CA, USA
|
23
|
Lawler AJ, Ramamurthy E, Brown AR, Shin N, Kim Y, Toong N, Kaplow IM, Wirthlin M, Zhang X, Phan BN, Fox GA, Wade K, He J, Ozturk BE, Byrne LC, Stauffer WR, Fish KN, Pfenning AR. Machine learning sequence prioritization for cell type-specific enhancer design. eLife 2022; 11:e69571. [PMID: 35576146 PMCID: PMC9110026 DOI: 10.7554/elife.69571] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/20/2021] [Accepted: 04/25/2022] [Indexed: 11/22/2022] Open
Abstract
Recent discoveries of extreme cellular diversity in the brain warrant rapid development of technologies to access specific cell populations within heterogeneous tissue. Available approaches for engineering-targeted technologies for new neuron subtypes are low yield, involving intensive transgenic strain or virus screening. Here, we present Specific Nuclear-Anchored Independent Labeling (SNAIL), an improved virus-based strategy for cell labeling and nuclear isolation from heterogeneous tissue. SNAIL works by leveraging machine learning and other computational approaches to identify DNA sequence features that confer cell type-specific gene activation and then make a probe that drives an affinity purification-compatible reporter gene. As a proof of concept, we designed and validated two novel SNAIL probes that target parvalbumin-expressing (PV+) neurons. Nuclear isolation using SNAIL in wild-type mice is sufficient to capture characteristic open chromatin features of PV+ neurons in the cortex, striatum, and external globus pallidus. The SNAIL framework also has high utility for multispecies cell probe engineering; expression from a mouse PV+ SNAIL enhancer sequence was enriched in PV+ neurons of the macaque cortex. Expansion of this technology has broad applications in cell type-specific observation, manipulation, and therapeutics across species and disease models.
Affiliation(s)
- Alyssa J Lawler
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Biological Sciences Department, Mellon College of Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- Easwaran Ramamurthy
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- Ashley R Brown
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- Naomi Shin
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- Yeonju Kim
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- Noelle Toong
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- Irene M Kaplow
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- Morgan Wirthlin
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- Xiaoyu Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- BaDoi N Phan
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
  - Medical Scientist Training Program, University of Pittsburgh, Pittsburgh, United States
- Grant A Fox
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
- Kirsten Wade
  - Department of Psychiatry, Translational Neuroscience Program, University of Pittsburgh, Pittsburgh, United States
- Jing He
  - Department of Neurobiology, University of Pittsburgh, Pittsburgh, United States
  - Systems Neuroscience Center, Brain Institute, Center for Neuroscience, Center for the Neural Basis of Cognition, Pittsburgh, United States
- Bilge Esin Ozturk
  - Department of Ophthalmology, University of Pittsburgh, Pittsburgh, United States
- Leah C Byrne
  - Department of Neurobiology, University of Pittsburgh, Pittsburgh, United States
  - Department of Ophthalmology, University of Pittsburgh, Pittsburgh, United States
  - Division of Experimental Retinal Therapies, Department of Clinical Sciences & Advanced Medicine, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, United States
  - Department of Bioengineering, University of Pittsburgh, Pittsburgh, United States
- William R Stauffer
  - Department of Neurobiology, University of Pittsburgh, Pittsburgh, United States
- Kenneth N Fish
  - Department of Psychiatry, Translational Neuroscience Program, University of Pittsburgh, Pittsburgh, United States
- Andreas R Pfenning
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
|
24
|
Kaplow IM, Banerjee A, Foo CS. Neural network modeling of differential binding between wild-type and mutant CTCF reveals putative binding preferences for zinc fingers 1-2. BMC Genomics 2022; 23:295. [PMID: 35410161 PMCID: PMC9004084 DOI: 10.1186/s12864-022-08486-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2021] [Accepted: 03/21/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Many transcription factors (TFs), such as multi zinc-finger (ZF) TFs, have multiple DNA binding domains (DBDs), and deciphering the DNA binding motifs of individual DBDs is a major challenge. One example of such a TF is CCCTC-binding factor (CTCF), a TF with eleven ZFs that plays a variety of roles in transcriptional regulation, most notably anchoring DNA loops. Previous studies found that CTCF ZFs 3-7 bind CTCF's core motif and ZFs 9-11 bind a specific upstream motif, but the motifs of ZFs 1-2 have yet to be identified. RESULTS: We developed a new approach to identifying the binding motifs of individual DBDs of a TF through analyzing chromatin immunoprecipitation sequencing (ChIP-seq) experiments in which a single DBD is mutated: we train a deep convolutional neural network to predict whether wild-type TF binding sites are preserved in the mutant TF dataset and interpret the model. We applied this approach to mouse CTCF ChIP-seq data and identified the known binding preferences of CTCF ZFs 3-11 as well as a putative GAG binding motif for ZF 1. We analyzed other CTCF datasets to provide additional evidence that ZF 1 is associated with binding at the motif we identified, and we found that the presence of the motif for ZF 1 is associated with CTCF ChIP-seq peak strength. CONCLUSIONS: Our approach can be applied to any TF for which in vivo binding data from both the wild-type and mutated versions of the TF are available, and our findings provide potential new insights into the binding preferences of CTCF's DBDs.
Affiliation(s)
- Irene M Kaplow
  - Departments of Computer Science, Stanford University, 240 Pasteur Drive, Stanford, California, 94305, USA
  - Present address: Department of Computational Biology, Carnegie Mellon University, 5000 Forbes Avenue, Gates-Hillman Building Room 7703, Pittsburgh, PA, 15213, USA
- Abhimanyu Banerjee
  - Departments of Physics, Stanford University, 240 Pasteur Drive, Stanford, California, 94305, USA
- Chuan Sheng Foo
  - Departments of Computer Science, Stanford University, 240 Pasteur Drive, Stanford, California, 94305, USA
  - Present address: Machine Intellection Department, Institute for Infocomm Research, 1 Fusionopolis Way, #21-01 Connexis South Tower, Singapore, 138632, Singapore
|
25
|
Kaplow IM, Schäffer DE, Wirthlin ME, Lawler AJ, Brown AR, Kleyman M, Pfenning AR. Inferring mammalian tissue-specific regulatory conservation by predicting tissue-specific differences in open chromatin. BMC Genomics 2022; 23:291. [PMID: 35410163 PMCID: PMC8996547 DOI: 10.1186/s12864-022-08450-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/17/2021] [Accepted: 03/07/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Evolutionary conservation is an invaluable tool for inferring functional significance in the genome, including regions that are crucial across many species and those that have undergone convergent evolution. Computational methods to test for sequence conservation are dominated by algorithms that examine the ability of one or more nucleotides to align across large evolutionary distances. While these nucleotide alignment-based approaches have proven powerful for protein-coding genes and some non-coding elements, they fail to capture conservation of many enhancers, distal regulatory elements that control spatial and temporal patterns of gene expression. The function of enhancers is governed by a complex, often tissue- and cell type-specific code that links combinations of transcription factor binding sites and other regulation-related sequence patterns to regulatory activity. Thus, function of orthologous enhancer regions can be conserved across large evolutionary distances, even when nucleotide turnover is high. RESULTS: We present a new machine learning-based approach for evaluating enhancer conservation that leverages the combinatorial sequence code of enhancer activity rather than relying on the alignment of individual nucleotides. We first train a convolutional neural network model that can predict tissue-specific open chromatin, a proxy for enhancer activity, across mammals. Next, we apply that model to distinguish instances where the genome sequence would predict conserved function versus a loss of regulatory activity in that tissue. We present criteria for systematically evaluating model performance for this task and use them to demonstrate that our models accurately predict tissue-specific conservation and divergence in open chromatin between primate and rodent species, vastly out-performing leading nucleotide alignment-based approaches. We then apply our models to predict open chromatin at orthologs of brain and liver open chromatin regions across hundreds of mammals and find that brain enhancers associated with neuron activity have a stronger tendency than the general population to have predicted lineage-specific open chromatin. CONCLUSION: The framework presented here provides a mechanism to annotate tissue-specific regulatory function across hundreds of genomes and to study enhancer evolution using predicted regulatory differences rather than nucleotide-level conservation measurements.
Affiliation(s)
- Irene M Kaplow
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Daniel E Schäffer
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Morgan E Wirthlin
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Alyssa J Lawler
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
  - Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Ashley R Brown
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Michael Kleyman
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Andreas R Pfenning
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
  - Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
|
26
|
Zhou W, Hongkai J. Genome-wide Prediction of Chromatin Accessibility Based on Gene Expression. Wiley Interdisciplinary Reviews. Computational Statistics 2021; 13:e1544. [PMID: 39391743 PMCID: PMC11466374 DOI: 10.1002/wics.1544] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/19/2019] [Accepted: 11/28/2020] [Indexed: 10/12/2024]
Abstract
Decoding gene regulation in a biological system requires information from both transcriptome and regulome. While multiple high-throughput transcriptome and regulome mapping technologies are available, transcriptome profiling is more widely used. Today, over a million bulk and single-cell gene expression samples are stored in public databases. This number is orders of magnitude larger than the number of available regulome samples. Most of the gene expression samples do not have corresponding regulome data. However, it is possible to obtain regulome information via prediction. Open chromatin is a hallmark of active regulatory elements. This mini-review discusses recent advances in predicting chromatin accessibility using gene expression data, including both the development of prediction methods and their applications in expanding the regulome catalog, improving regulome analysis, integrating transcriptome and regulome data, and facilitating single-cell analysis of gene regulation.
Affiliation(s)
- Weiqiang Zhou
  - Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA
- Ji Hongkai
  - Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA
|
27
|
Zhang Q, Wang D, Han K, Huang DS. Predicting TF-DNA Binding Motifs from ChIP-seq Datasets Using the Bag-Based Classifier Combined With a Multi-Fold Learning Scheme. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1743-1751. [PMID: 32946398 DOI: 10.1109/tcbb.2020.3025007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
The rapid development of high-throughput sequencing technology provides unique opportunities for studying transcription factor binding sites, but also brings new computational challenges. Recently, a series of discriminative motif discovery (DMD) methods have been proposed and offer promising solutions for addressing these challenges. However, because of the huge computation cost, most of them have to choose approximate schemes that either sacrifice the accuracy of motif representation or tune motif parameters indirectly. In this paper, we propose a bag-based classifier combined with a multi-fold learning scheme (BCMF) to discover motifs from ChIP-seq datasets. First, BCMF formulates input sequences as a labeled bag naturally. Then, a bag-based classifier, combined with a bag feature extracting strategy, is applied to construct the objective function, and a multi-fold learning scheme is used to solve it. Compared with the existing DMD tools, BCMF features three improvements: 1) learning the position weight matrix (PWM) directly in a continuous space; 2) representing a positive bag with a feature fused from its k "most positive" patterns; 3) applying a more advanced learning scheme. The experimental results on 134 ChIP-seq datasets show that BCMF substantially outperforms existing DMD methods (including DREME, HOMER, XXmotif, motifRG, EDCOD and our previous work).
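For readers unfamiliar with position weight matrices, the sketch below shows generic log-odds PWM scoring and a simple "mean of the top k window scores" summary of a sequence. It is only loosely inspired by the "k most positive patterns" idea in this abstract and is not BCMF itself: the pseudocount, the uniform background, the function names, and the toy sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites (A/C/G/T only)."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def window_scores(pwm, seq):
    """Score every window of len(pwm) bases along seq."""
    w = len(pwm)
    return [sum(pwm[i][seq[j + i]] for i in range(w))
            for j in range(len(seq) - w + 1)]

def top_k_mean(pwm, seq, k=2):
    """Summarize a sequence by the mean of its k best window scores."""
    scores = sorted(window_scores(pwm, seq), reverse=True)[:k]
    if not scores:
        raise ValueError("sequence is shorter than the motif")
    return sum(scores) / len(scores)

toy_sites = ["ACGT", "ACGT", "ACGA"]  # made-up aligned sites
pwm = build_pwm(toy_sites)
scores = window_scores(pwm, "TTACGTTT")
print(scores.index(max(scores)), top_k_mean(pwm, "TTACGTTT", k=2))
```

Here the PWM is estimated from counts; the abstract instead describes learning the PWM directly in a continuous space, which this sketch does not attempt.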
|
28
|
Chumpitaz-Diaz L, Samee MAH, Pollard KS. Systematic identification of non-canonical transcription factor motifs. BMC Mol Cell Biol 2021; 22:44. [PMID: 34465294 PMCID: PMC8408965 DOI: 10.1186/s12860-021-00382-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/05/2021] [Accepted: 07/21/2021] [Indexed: 11/20/2022] Open
Abstract
Sequence-specific transcription factors (TFs) recognize motifs of related nucleotide sequences at their DNA binding sites. Upon binding at these sites, TFs regulate critical molecular processes such as gene expression. It is widely assumed that a TF recognizes a single “canonical” motif, although recent studies have identified additional “non-canonical” motifs for some TFs. A comprehensive approach to identify non-canonical DNA binding motifs and the functional importance of those motifs’ matches in the human genome is necessary for fully understanding the mechanisms of TF-regulated molecular processes in human cells. To address this need, we developed a statistical pipeline for in vitro HT-SELEX data that identifies and characterizes the distributions of non-canonical TF motifs in a stringent manner. Analyzing ~170 human TFs’ HT-SELEX data, we found non-canonical motifs for 19 TFs (11%). These non-canonical motifs occur independently of the TFs’ canonical motifs. Non-canonical motif occurrences in the human genome show similar evolutionary conservation to canonical motif occurrences, explain TF binding in locations without canonical motifs, and occur within gene promoters and epigenetically marked regulatory sequences in human cell lines and tissues. Our approach and collection of non-canonical motifs expand current understanding of functionally relevant DNA binding sites for human TFs.
Affiliation(s)
- Md Abul Hassan Samee
  - Department of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, TX, USA
- Katherine S Pollard
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
  - Department of Epidemiology & Biostatistics, Institute for Human Genetics, Quantitative Biology Institute, and Institute for Computational Health Sciences, University of California, San Francisco, CA, USA
  - Chan-Zuckerberg Biohub, San Francisco, CA, USA
|
29
|
Schreiber J, Singh R. Machine learning for profile prediction in genomics. Curr Opin Chem Biol 2021; 65:35-41. [PMID: 34107341 DOI: 10.1016/j.cbpa.2021.04.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/31/2021] [Revised: 04/21/2021] [Accepted: 04/24/2021] [Indexed: 02/08/2023]
Abstract
A recent deluge of publicly available multi-omics data has fueled the development of machine learning methods aimed at investigating important questions in genomics. Although the motivations for these methods vary, a task that is commonly adopted is that of profile prediction, where predictions are made for one or more forms of biochemical activity along the genome, for example, histone modification, chromatin accessibility, or protein binding. In this review, we give an overview of the research works performing profile prediction, define two broad categories of profile prediction tasks, and discuss the types of scientific questions that can be answered in each.
Affiliation(s)
- Ritambhara Singh
  - Department of Computer Science, Center for Computational Molecular Biology, Brown University, United States
|
30
|
Menzel M, Hurka S, Glasenhardt S, Gogol-Döring A. NoPeak: k-mer-based motif discovery in ChIP-Seq data without peak calling. Bioinformatics 2021; 37:596-602. [PMID: 32991679 DOI: 10.1093/bioinformatics/btaa845] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/05/2020] [Accepted: 09/14/2020] [Indexed: 01/30/2023] Open
Abstract
MOTIVATION: The discovery of sequence motifs mediating DNA-protein binding usually implies the determination of binding sites using high-throughput sequencing and peak calling. The determination of peaks, however, depends strongly on data quality and is susceptible to noise. RESULTS: Here, we present a novel approach to reliably identify transcription factor-binding motifs from ChIP-Seq data without peak detection. By evaluating the distributions of sequencing reads around the different k-mers in the genome, we are able to identify binding motifs in ChIP-Seq data that yield no results in traditional pipelines. AVAILABILITY AND IMPLEMENTATION: NoPeak is published under the GNU General Public License and available as a standalone console-based Java application at https://github.com/menzel/nopeak. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michael Menzel
  - MNI, Technische Hochschule Mittelhessen, University of Applied Sciences, Giessen 35390, Germany
- Sabine Hurka
  - Institute for Insect Biotechnology, Justus Liebig University, Giessen 35392, Germany
- Stefan Glasenhardt
  - MNI, Technische Hochschule Mittelhessen, University of Applied Sciences, Giessen 35390, Germany
- Andreas Gogol-Döring
  - MNI, Technische Hochschule Mittelhessen, University of Applied Sciences, Giessen 35390, Germany
|
31
|
Lee SA, Lee KH, Kim H, Cho JY. METTL8 mRNA Methyltransferase Enhances Cancer Cell Migration via Direct Binding to ARID1A. Int J Mol Sci 2021; 22:ijms22115432. [PMID: 34063990 PMCID: PMC8196784 DOI: 10.3390/ijms22115432] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/12/2021] [Revised: 05/16/2021] [Accepted: 05/19/2021] [Indexed: 12/24/2022] Open
Abstract
The association of RNA modification with cancer has recently been highlighted. Methyltransferase like 8 (METTL8) is an enzyme whose role in mRNA m3C modification has barely been studied. In this study, we found that METTL8 expression was significantly up-regulated in canine mammary tumors and investigated its functional roles in the tumor process, including cancer cell proliferation and migration. METTL8 expression was up-regulated in most human breast cancer cell lines tested and decreased by Yin Yang 1 (YY1) transcription factor knockdown, suggesting that YY1 is a transcription factor that regulates METTL8. The knockdown of METTL8 attenuated tumor cell growth and strongly blocked tumor cell migration. AT-rich interactive domain-containing protein 1A (ARID1A) was identified as a candidate target mRNA of METTL8. ARID1A mRNA binds to METTL8 protein. ARID1A mRNA expression was not changed by METTL8 knockdown, but ARID1A protein level was significantly increased. Collectively, our study indicates that METTL8, up-regulated by YY1 in breast cancer, plays an important role in cancer cell migration through the mRNA modification of ARID1A, resulting in the attenuation of its translation.
Affiliation(s)
- Je-Yoel Cho
  - Correspondence: ; Tel.: +82-02-880-1268; Fax: +82-02-886-1268
|
32
|
Palmateer CM, Moseley SC, Ray S, Brovero SG, Arbeitman MN. Analysis of cell-type-specific chromatin modifications and gene expression in Drosophila neurons that direct reproductive behavior. PLoS Genet 2021; 17:e1009240. [PMID: 33901168 PMCID: PMC8102012 DOI: 10.1371/journal.pgen.1009240] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/11/2020] [Revised: 05/06/2021] [Accepted: 04/05/2021] [Indexed: 02/06/2023] Open
Abstract
Examining the role of chromatin modifications and gene expression in neurons is critical for understanding how the potential for behaviors is established and maintained. We investigate this question by examining Drosophila melanogaster fru P1 neurons that underlie reproductive behaviors in both sexes. We developed a method to purify cell-type-specific chromatin (Chromatag), using a tagged histone H2B variant that is expressed using the versatile Gal4/UAS gene expression system. Here, we use Chromatag to evaluate five chromatin modifications, at three life stages in both sexes. We find substantial changes in chromatin modification profiles across development and fewer differences between males and females. Additionally, we find chromatin modifications that persist in different sets of genes from pupal to adult stages, which may point to genes important for cell fate determination in fru P1 neurons. We generated cell-type-specific RNA-seq data sets, using translating ribosome affinity purification (TRAP). We identify actively translated genes in fru P1 neurons, revealing novel stage- and sex-differences in gene expression. We also find chromatin modification enrichment patterns that are associated with gene expression. Next, we use the chromatin modification data to identify cell-type-specific super-enhancer-containing genes. We show that genes with super-enhancers in fru P1 neurons differ across development and between the sexes. We validated the expression in fru P1 neurons of a set of genes chosen for having both a super-enhancer and TRAP-enriched expression in these neurons.
Affiliation(s)
- Colleen M. Palmateer
- Department of Biomedical Sciences, Florida State University, College of Medicine, Tallahassee, Florida, United States of America
- Shawn C. Moseley
- Department of Biomedical Sciences, Florida State University, College of Medicine, Tallahassee, Florida, United States of America
- Surjyendu Ray
- Department of Biomedical Sciences, Florida State University, College of Medicine, Tallahassee, Florida, United States of America
- Savannah G. Brovero
- Department of Biomedical Sciences, Florida State University, College of Medicine, Tallahassee, Florida, United States of America
- Michelle N. Arbeitman
- Department of Biomedical Sciences, Florida State University, College of Medicine, Tallahassee, Florida, United States of America
- Program of Neuroscience, Florida State University, Tallahassee, Florida, United States of America
|
33
|
E2A-regulated epigenetic landscape promotes memory CD8 T cell differentiation. Proc Natl Acad Sci U S A 2021; 118:2013452118. [PMID: 33859041 DOI: 10.1073/pnas.2013452118] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/15/2022] Open
Abstract
During an acute viral infection, CD8 T cells encounter a myriad of antigenic and inflammatory signals of variable strength, which sets off individual T cells on their own differentiation trajectories. However, the developmental path for each of these cells will ultimately lead to one of only two potential outcomes after clearance of the infection: death, or survival and development into memory CD8 T cells. How this cell fate decision is made remains incompletely understood. In this study, we explore the transcriptional changes during effector and memory CD8 T cell differentiation at the single-cell level. Using single-cell, transcriptome-derived gene regulatory network analysis, we identified two main groups of regulons that govern this differentiation process. These regulons function in concert with changes in the enhancer landscape to confer the establishment of the regulatory modules underlying the cell fate decision of CD8 T cells. Furthermore, we found that memory precursor effector cells maintain chromatin accessibility at enhancers for key memory-related genes and that these enhancers are highly enriched for E2A binding sites. Finally, we show that E2A directly regulates accessibility of enhancers of many memory-related genes and that its overexpression increases the frequency of memory precursor effector cells and accelerates memory cell formation while decreasing the frequency of short-lived effector cells. Overall, our results suggest that effector and memory CD8 T cell differentiation is largely regulated by two transcriptional circuits, with E2A serving as an important epigenetic regulator of the memory circuit.
|
34
|
Hackett SR, Baltz EA, Coram M, Wranik BJ, Kim G, Baker A, Fan M, Hendrickson DG, Berndl M, McIsaac RS. Learning causal networks using inducible transcription factors and transcriptome-wide time series. Mol Syst Biol 2021; 16:e9174. [PMID: 32181581 PMCID: PMC7076914 DOI: 10.15252/msb.20199174] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 08/13/2019] [Revised: 02/13/2020] [Accepted: 02/19/2020] [Indexed: 11/27/2022] Open
Abstract
We present IDEA (the Induction Dynamics gene Expression Atlas), a dataset constructed by independently inducing hundreds of transcription factors (TFs) and measuring timecourses of the resulting gene expression responses in budding yeast. Each experiment captures a regulatory cascade connecting a single induced regulator to the genes it causally regulates. We discuss the regulatory cascade of a single TF, Aft1, in detail; however, IDEA contains > 200 TF induction experiments with 20 million individual observations and 100,000 signal‐containing dynamic responses. As an application of IDEA, we integrate all timecourses into a whole‐cell transcriptional model, which is used to predict and validate multiple new and underappreciated transcriptional regulators. We also find that the magnitudes of coefficients in this model are predictive of genetic interaction profile similarities. In addition to being a resource for exploring regulatory connectivity between TFs and their target genes, our modeling approach shows that combining rapid perturbations of individual genes with genome‐scale time‐series measurements is an effective strategy for elucidating gene regulatory networks.
Affiliation(s)
- Griffin Kim
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Adam Baker
- Calico Life Sciences LLC, South San Francisco, CA, USA
|
35
|
Srivastava D, Aydin B, Mazzoni EO, Mahony S. An interpretable bimodal neural network characterizes the sequence and preexisting chromatin predictors of induced transcription factor binding. Genome Biol 2021; 22:20. [PMID: 33413545 PMCID: PMC7788824 DOI: 10.1186/s13059-020-02218-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/18/2019] [Accepted: 12/03/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Transcription factor (TF) binding specificity is determined via a complex interplay between the transcription factor's DNA binding preference and cell type-specific chromatin environments. The chromatin features that correlate with transcription factor binding in a given cell type have been well characterized. For instance, the binding sites for a majority of transcription factors display concurrent chromatin accessibility. However, concurrent chromatin features reflect the binding activities of the transcription factor itself and thus provide limited insight into how genome-wide TF-DNA binding patterns became established in the first place. To understand the determinants of transcription factor binding specificity, we therefore need to examine how newly activated transcription factors interact with sequence and preexisting chromatin landscapes. RESULTS: Here, we investigate the sequence and preexisting chromatin predictors of TF-DNA binding by examining the genome-wide occupancy of transcription factors that have been induced in well-characterized chromatin environments. We develop Bichrom, a bimodal neural network that jointly models sequence and preexisting chromatin data to interpret the genome-wide binding patterns of induced transcription factors. We find that the preexisting chromatin landscape is a differential global predictor of TF-DNA binding; incorporating preexisting chromatin features improves our ability to explain the binding specificity of some transcription factors substantially, but not others. Furthermore, by analyzing site-level predictors, we show that transcription factor binding in previously inaccessible chromatin tends to correspond to the presence of more favorable cognate DNA sequences. CONCLUSIONS: Bichrom thus provides a framework for modeling, interpreting, and visualizing the joint sequence and chromatin landscapes that determine TF-DNA binding dynamics.
Affiliation(s)
- Divyanshi Srivastava
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Begüm Aydin
- Department of Biology, New York University, New York, NY, USA
- Shaun Mahony
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, Pennsylvania State University, University Park, PA, USA
|
36
|
Wu L, Han L, Li Q, Wang G, Zhang H, Li L. Using Interactome Big Data to Crack Genetic Mysteries and Enhance Future Crop Breeding. MOLECULAR PLANT 2021; 14:77-94. [PMID: 33340690 DOI: 10.1016/j.molp.2020.12.012] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 09/14/2020] [Revised: 12/11/2020] [Accepted: 12/14/2020] [Indexed: 05/27/2023]
Abstract
The functional genes underlying phenotypic variation and their interactions represent "genetic mysteries". Understanding and utilizing these genetic mysteries are key solutions for mitigating the current threats to agriculture posed by population growth and individual food preferences. Due to advances in high-throughput multi-omics technologies, we are stepping into an Interactome Big Data era that is certain to revolutionize genetic research. In this article, we provide a brief overview of current strategies to explore genetic mysteries. We then introduce the methods for constructing and analyzing the Interactome Big Data and summarize currently available interactome resources. Next, we discuss how Interactome Big Data can be used as a versatile tool to dissect genetic mysteries. We propose an integrated strategy that could revolutionize genetic research by combining Interactome Big Data with machine learning, which involves mining information hidden in Big Data to identify the genetic models or networks that control various traits, and we also provide a detailed procedure for the systematic dissection of genetic mysteries. Finally, we discuss three promising future breeding strategies utilizing the Interactome Big Data to improve crop yields and quality.
Affiliation(s)
- Leiming Wu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Linqian Han
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Qing Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Guoying Wang
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hongwei Zhang
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
|
37
|
Identification of Cis-Regulatory Sequences Controlling Pollen-Specific Expression of Hydroxyproline-Rich Glycoprotein Genes in Arabidopsis thaliana. PLANTS 2020; 9:plants9121751. [PMID: 33322028 PMCID: PMC7763877 DOI: 10.3390/plants9121751] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/31/2020] [Revised: 11/26/2020] [Accepted: 12/07/2020] [Indexed: 02/06/2023]
Abstract
Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall structural proteins that function in various aspects of plant growth and development, including pollen tube growth. We have previously characterized protein sequence signatures for three family members in the HRGP superfamily: the hyperglycosylated arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs), and the lightly glycosylated proline-rich proteins (PRPs). However, the mechanism of pollen-specific HRGP gene expression remains unexplored. To this end, we developed an integrative analysis pipeline combining RNA-seq gene expression and promoter sequences to identify cis-regulatory motifs responsible for pollen-specific expression of HRGP genes in Arabidopsis thaliana. Specifically, we mined the public RNA-seq datasets and identified 13 pollen-specific HRGP genes. Ensemble motif discovery identified 15 conserved promoter elements between A. thaliana and A. lyrata. Motif scanning revealed two pollen-related transcription factors: GATA12 and brassinosteroid (BR) signaling pathway regulator BZR1. Finally, we performed a regression analysis and demonstrated that the 15 motifs provided a good model of HRGP gene expression in pollen (R = 0.61). In conclusion, we performed the first integrative analysis of cis-regulatory motifs in pollen-specific HRGP genes, revealing important insights into transcriptional regulation in pollen tissue.
|
38
|
Krützfeldt LM, Schubach M, Kircher M. The impact of different negative training data on regulatory sequence predictions. PLoS One 2020; 15:e0237412. [PMID: 33259518 PMCID: PMC7707526 DOI: 10.1371/journal.pone.0237412] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/24/2020] [Accepted: 11/12/2020] [Indexed: 01/08/2023] Open
Abstract
Regulatory regions, like promoters and enhancers, cover an estimated 5–15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding, and gapped k-mer support vector machines (gkm-SVMs) or convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. We train gkm-SVM and CNN models on open chromatin data and corresponding negative training datasets, comparing both learners and two approaches for generating negative training data. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell-type, predicting cell-type specific elements, and predicting elements' relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing overall best results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs achieved and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and best results for typical training dataset sizes without the need of hyperparameter optimization.
Affiliation(s)
- Louisa-Marie Krützfeldt
- Charité–Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
- Max Schubach
- Charité–Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
- Martin Kircher
- Charité–Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
|
39
|
Chen L, Capra JA. Learning and interpreting the gene regulatory grammar in a deep learning framework. PLoS Comput Biol 2020; 16:e1008334. [PMID: 33137083 PMCID: PMC7660921 DOI: 10.1371/journal.pcbi.1008334] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/16/2019] [Revised: 11/12/2020] [Accepted: 09/12/2020] [Indexed: 12/12/2022] Open
Abstract
Deep neural networks (DNNs) have achieved state-of-the-art performance in identifying gene regulatory sequences, but they have provided limited insight into the biology of regulatory elements due to the difficulty of interpreting the complex features they learn. Several models of how combinatorial binding of transcription factors, i.e. the regulatory grammar, drives enhancer activity have been proposed, ranging from the flexible TF billboard model to the stringent enhanceosome model. However, there is limited knowledge of the prevalence of these (or other) sequence architectures across enhancers. Here we perform several hypothesis-driven analyses to explore the ability of DNNs to learn the regulatory grammar of enhancers. We created synthetic datasets based on existing hypotheses about combinatorial transcription factor binding site (TFBS) patterns, including homotypic clusters, heterotypic clusters, and enhanceosomes, from real TF binding motifs from diverse TF families. We then trained deep residual neural networks (ResNets) to model the sequences under a range of scenarios that reflect real-world multi-label regulatory sequence prediction tasks. We developed a gradient-based unsupervised clustering method to extract the patterns learned by the ResNet models. We demonstrated that simulated regulatory grammars are best learned in the penultimate layer of the ResNets, and the proposed method can accurately retrieve the regulatory grammar even when there is heterogeneity in the enhancer categories and a large fraction of TFBS outside of the regulatory grammar. However, we also identify common scenarios where ResNets fail to learn simulated regulatory grammars. Finally, we applied the proposed method to mouse developmental enhancers and were able to identify the components of a known heterotypic TF cluster. 
Our results provide a framework for interpreting the regulatory rules learned by ResNets, and they demonstrate that the ability and efficiency of ResNets in learning the regulatory grammar depends on the nature of the prediction task.
Affiliation(s)
- Ling Chen
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States of America
- John A. Capra
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States of America
- Vanderbilt Genetics Institute and Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Department of Computer Science, Vanderbilt University, Nashville, TN, United States of America
|
40
|
Tobias IC, Abatti LE, Moorthy SD, Mullany S, Taylor T, Khader N, Filice MA, Mitchell JA. Transcriptional enhancers: from prediction to functional assessment on a genome-wide scale. Genome 2020; 64:426-448. [PMID: 32961076 DOI: 10.1139/gen-2020-0104] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]
Abstract
Enhancers are cis-regulatory sequences located distally to target genes. These sequences consolidate developmental and environmental cues to coordinate gene expression in a tissue-specific manner. Enhancer function and tissue specificity depend on the expressed set of transcription factors, which recognize binding sites and recruit cofactors that regulate local chromatin organization and gene transcription. Unlike other genomic elements, enhancers are challenging to identify because they function independently of orientation, are often distant from their promoters, have poorly defined boundaries, and display no reading frame. In addition, there are no defined genetic or epigenetic features that are unambiguously associated with enhancer activity. Over recent years there have been developments in both empirical assays and computational methods for enhancer prediction. We review genome-wide tools, CRISPR advancements, and high-throughput screening approaches that have improved our ability to both observe and manipulate enhancers in vitro at the level of primary genetic sequences, chromatin states, and spatial interactions. We also highlight contemporary animal models and their importance to enhancer validation. Together, these experimental systems and techniques complement one another and broaden our understanding of enhancer function in development, evolution, and disease.
Affiliation(s)
- Ian C Tobias
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Luis E Abatti
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Sakthi D Moorthy
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Shanelle Mullany
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Tiegh Taylor
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Nawrah Khader
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Mario A Filice
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Jennifer A Mitchell
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
|
41
|
Abstract
Spatiotemporal control of gene expression during development requires orchestrated activities of numerous enhancers, which are cis-regulatory DNA sequences that, when bound by transcription factors, support selective activation or repression of associated genes. Proper activation of enhancers is critical during embryonic development, adult tissue homeostasis, and regeneration, and inappropriate enhancer activity is often associated with pathological conditions such as cancer. Multiple consortia [e.g., the Encyclopedia of DNA Elements (ENCODE) Consortium and National Institutes of Health Roadmap Epigenomics Mapping Consortium] and independent investigators have mapped putative regulatory regions in a large number of cell types and tissues, but the sequence determinants of cell-specific enhancers are not yet fully understood. Machine learning approaches trained on large sets of these regulatory regions can identify core transcription factor binding sites and generate quantitative predictions of enhancer activity and the impact of sequence variants on activity. Here, we review these computational methods in the context of enhancer prediction and gene regulatory network models specifying cell fate.
Affiliation(s)
- Michael A Beer
- Department of Biomedical Engineering and McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21205, USA
- Dustin Shigaki
- Department of Biomedical Engineering and McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21205, USA
|
42
|
A Comparative Study of Supervised Machine Learning Algorithms for the Prediction of Long-Range Chromatin Interactions. Genes (Basel) 2020; 11:genes11090985. [PMID: 32847102 PMCID: PMC7563616 DOI: 10.3390/genes11090985] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/08/2020] [Revised: 08/18/2020] [Accepted: 08/20/2020] [Indexed: 02/07/2023] Open
Abstract
The role of three-dimensional genome organization as a critical regulator of gene expression has become increasingly clear over the last decade. Most of our understanding of this association comes from the study of long range chromatin interaction maps provided by Chromatin Conformation Capture-based techniques, which have greatly improved in recent years. Since these procedures are experimentally laborious and expensive, in silico prediction has emerged as an alternative strategy to generate virtual maps in cell types and conditions for which experimental data of chromatin interactions is not available. Several methods have been based on predictive models trained on one-dimensional (1D) sequencing features, yielding promising results. However, different approaches vary both in the way they model chromatin interactions and in the machine learning-based strategy they rely on, making it challenging to carry out performance comparison of existing methods. In this study, we use publicly available 1D sequencing signals to model cohesin-mediated chromatin interactions in two human cell lines and evaluate the prediction performance of six popular machine learning algorithms: decision trees, random forests, gradient boosting, support vector machines, multi-layer perceptron and deep learning. Our approach accurately predicts long-range interactions and reveals that gradient boosting significantly outperforms the other five methods, yielding accuracies of about 95%. We show that chromatin features in close genomic proximity to the anchors cover most of the predictive information, as has been previously reported. Moreover, we demonstrate that gradient boosting models trained with different subsets of chromatin features, unlike the other methods tested, are able to produce accurate predictions. In this regard, and besides architectural proteins, transcription factors are shown to be highly informative. 
Our study provides a framework for the systematic prediction of long-range chromatin interactions, identifies gradient boosting as the best suited algorithm for this task and highlights cell-type specific binding of transcription factors at the anchors as important determinants of chromatin wiring mediated by cohesin.
|
43
|
Goulet DR, Foster JP, Zawistowski JS, Bevill SM, Noël MP, Olivares-Quintero JF, Sciaky N, Singh D, Santos C, Pattenden SG, Davis IJ, Johnson GL. Discrete Adaptive Responses to MEK Inhibitor in Subpopulations of Triple-Negative Breast Cancer. Mol Cancer Res 2020; 18:1685-1698. [PMID: 32753473 DOI: 10.1158/1541-7786.mcr-19-1011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/10/2019] [Revised: 06/18/2020] [Accepted: 07/31/2020] [Indexed: 12/31/2022]
Abstract
Triple-negative breast cancers contain a spectrum of epithelial and mesenchymal phenotypes. SUM-229PE cells represent a model for this heterogeneity, maintaining both epithelial and mesenchymal subpopulations that are genomically similar but distinct in gene expression profiles. We identified differential regions of open chromatin in epithelial and mesenchymal cells that were strongly correlated with regions of H3K27ac. Motif analysis of these regions identified consensus sequences for transcription factors that regulate cell identity. Treatment with the MEK inhibitor trametinib induced enhancer remodeling that is associated with transcriptional regulation of genes in epithelial and mesenchymal cells. Motif analysis of enhancer peaks downregulated in response to chronic treatment with trametinib identified AP-1 motif enrichment in both epithelial and mesenchymal subpopulations. Chromatin immunoprecipitation sequencing (ChIP-seq) of JUNB identified subpopulation-specific localization, which was significantly enriched at regions of open chromatin. These results indicate that cell identity controls localization of transcription factors and chromatin-modifying enzymes to enhancers for differential control of gene expression. We identified increased H3K27ac at an enhancer region proximal to CXCR7, a G-protein-coupled receptor that increased 15-fold in expression in the epithelial subpopulation during chronic treatment. RNAi knockdown of CXCR7 inhibited proliferation in trametinib-resistant cells. Thus, adaptive resistance to chronic trametinib treatment contributes to proliferation in the presence of the drug. Acquired amplification of KRAS following trametinib dose escalation further contributed to POS cell proliferation. Adaptive followed by acquired gene expression changes contributed to proliferation in trametinib-resistant cells, suggesting inhibition of early transcriptional reprogramming could prevent resistance and the bypass of targeted therapy. 
IMPLICATIONS: We defined the differential responses to trametinib in subpopulations of a clinically relevant in vitro model of TNBC, and identified both adaptive and acquired elements that contribute to the emergence of drug resistance mediated by increased expression of CXCR7 and amplification of KRAS.
Affiliation(s)
- Daniel R Goulet
- Department of Pharmacology, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina
| | - Joseph P Foster
- Curriculum in Bioinformatics and Computational Biology, Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina
| | - Jon S Zawistowski
- Department of Pharmacology, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina
| | - Samantha M Bevill
- Department of Pharmacology, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina
| | - Mélodie P Noël
- Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina
| | - José F Olivares-Quintero
- Department of Pharmacology, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina
| | - Noah Sciaky
- Department of Pharmacology, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina
| | - Darshan Singh
- Department of Pharmacology, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina
| | - Charlene Santos
- Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina
| | - Samantha G Pattenden
- Eshelman School of Pharmacy, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina
| | - Ian J Davis
- Curriculum in Bioinformatics and Computational Biology, Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina; Department of Pediatrics, University of North Carolina School of Medicine, Chapel Hill, North Carolina
| | - Gary L Johnson
- Department of Pharmacology, Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina.
|
44
|
Azodi CB, Lloyd JP, Shiu SH. The cis-regulatory codes of response to combined heat and drought stress in Arabidopsis thaliana. NAR Genom Bioinform 2020; 2:lqaa049. [PMID: 33575601 PMCID: PMC7671360 DOI: 10.1093/nargab/lqaa049] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/15/2020] [Revised: 05/22/2020] [Accepted: 07/06/2020] [Indexed: 11/24/2022] Open
Abstract
Plants respond to their environment by dynamically modulating gene expression. A powerful approach for understanding how these responses are regulated is to integrate information about cis-regulatory elements (CREs) into models called cis-regulatory codes. Transcriptional response to combined stress is typically not the sum of the responses to the individual stresses. However, cis-regulatory codes underlying combined stress response have not been established. Here we modeled transcriptional response to single and combined heat and drought stress in Arabidopsis thaliana. We grouped genes by their pattern of response (independent, antagonistic and synergistic) and trained machine learning models to predict their response using putative CREs (pCREs) as features (median F-measure = 0.64). We then developed a deep learning approach to integrate additional omics information (sequence conservation, chromatin accessibility and histone modification) into our models, improving performance by 6.2%. While pCREs important for predicting independent and antagonistic responses tended to resemble binding motifs of transcription factors associated with heat and/or drought stress, important synergistic pCREs resembled binding motifs of transcription factors not known to be associated with stress. These findings demonstrate how in silico approaches can improve our understanding of the complex codes regulating response to combined stress and help us identify prime targets for future characterization.
Affiliation(s)
- Christina B Azodi
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
| | - John P Lloyd
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
|
45
|
Srivastava D, Mahony S. Sequence and chromatin determinants of transcription factor binding and the establishment of cell type-specific binding patterns. Biochim Biophys Acta Gene Regul Mech 2020; 1863:194443. [PMID: 31639474 PMCID: PMC7166147 DOI: 10.1016/j.bbagrm.2019.194443] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/30/2019] [Revised: 09/21/2019] [Accepted: 10/06/2019] [Indexed: 12/14/2022]
Abstract
Transcription factors (TFs) selectively bind distinct sets of sites in different cell types. Such cell type-specific binding specificity is expected to result from interplay between the TF's intrinsic sequence preferences, cooperative interactions with other regulatory proteins, and cell type-specific chromatin landscapes. Cell type-specific TF binding events are highly correlated with patterns of chromatin accessibility and active histone modifications in the same cell type. However, since concurrent chromatin may itself be a consequence of TF binding, chromatin landscapes measured prior to TF activation provide more useful insights into how cell type-specific TF binding events became established in the first place. Here, we review the various sequence and chromatin determinants of cell type-specific TF binding specificity. We identify the current challenges and opportunities associated with computational approaches to characterizing, imputing, and predicting cell type-specific TF binding patterns. We further focus on studies that characterize TF binding in dynamic regulatory settings, and we discuss how these studies are leading to a more complex and nuanced understanding of dynamic protein-DNA binding activities. We propose that TF binding activities at individual sites can be viewed along a two-dimensional continuum of local sequence and chromatin context. Under this view, cell type-specific TF binding activities may result from either strongly favorable sequence features or strongly favorable chromatin context.
Affiliation(s)
- Divyanshi Srivastava
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA, United States of America
| | - Shaun Mahony
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA, United States of America.
|
46
|
Beisaw A, Kuenne C, Guenther S, Dallmann J, Wu CC, Bentsen M, Looso M, Stainier DYR. AP-1 Contributes to Chromatin Accessibility to Promote Sarcomere Disassembly and Cardiomyocyte Protrusion During Zebrafish Heart Regeneration. Circ Res 2020; 126:1760-1778. [PMID: 32312172 DOI: 10.1161/circresaha.119.316167] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Indexed: 01/16/2023]
Abstract
RATIONALE: The adult human heart is an organ with low regenerative potential. Heart failure following acute myocardial infarction is a leading cause of death due to the inability of cardiomyocytes to proliferate and replenish lost cardiac muscle. While the zebrafish has emerged as a powerful model to study endogenous cardiac regeneration, the molecular mechanisms by which cardiomyocytes respond to damage by disassembling sarcomeres, proliferating, and repopulating the injured area remain unclear. Furthermore, we are far from understanding the regulation of the chromatin landscape and epigenetic barriers that must be overcome for cardiac regeneration to occur.
OBJECTIVE: To identify transcription factor regulators of the chromatin landscape, which promote cardiomyocyte regeneration in zebrafish, and investigate their function.
METHODS AND RESULTS: Using the Assay for Transposase-Accessible Chromatin coupled to high-throughput sequencing (ATAC-Seq), we first find that the regenerating cardiomyocyte chromatin accessibility landscape undergoes extensive changes following cryoinjury, and that activator protein-1 (AP-1) binding sites are the most highly enriched motifs in regions that gain accessibility during cardiac regeneration. Furthermore, using bioinformatic and gene expression analyses, we find that the AP-1 response in regenerating adult zebrafish cardiomyocytes is largely different from the response in adult mammalian cardiomyocytes. Using a cardiomyocyte-specific dominant negative approach, we show that blocking AP-1 function leads to defects in cardiomyocyte proliferation as well as decreased chromatin accessibility at the fbxl22 and ilk loci, which regulate sarcomere disassembly and cardiomyocyte protrusion into the injured area, respectively. We further show that overexpression of the AP-1 family members Junb and Fosl1 can promote changes in mammalian cardiomyocyte behavior in vitro.
CONCLUSIONS: AP-1 transcription factors play an essential role in the cardiomyocyte response to injury by regulating chromatin accessibility changes, thereby allowing the activation of gene expression programs that promote cardiomyocyte dedifferentiation, proliferation, and protrusion into the injured area.
Affiliation(s)
- Arica Beisaw
- Department of Developmental Genetics (A.B., J.D., C.-C.W., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany; German Centre for Cardiovascular Research (DZHK) Partner Site Rhine-Main (A.B., S.G., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Carsten Kuenne
- ECCPS Bioinformatics and Deep Sequencing Platform (C.K., S.G., M.B., M.L.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Stefan Guenther
- ECCPS Bioinformatics and Deep Sequencing Platform (C.K., S.G., M.B., M.L.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany; German Centre for Cardiovascular Research (DZHK) Partner Site Rhine-Main (A.B., S.G., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Julia Dallmann
- Department of Developmental Genetics (A.B., J.D., C.-C.W., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Chi-Chung Wu
- Department of Developmental Genetics (A.B., J.D., C.-C.W., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Mette Bentsen
- ECCPS Bioinformatics and Deep Sequencing Platform (C.K., S.G., M.B., M.L.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Mario Looso
- ECCPS Bioinformatics and Deep Sequencing Platform (C.K., S.G., M.B., M.L.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Didier Y R Stainier
- Department of Developmental Genetics (A.B., J.D., C.-C.W., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany; German Centre for Cardiovascular Research (DZHK) Partner Site Rhine-Main (A.B., S.G., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
|
47
|
Boudra R, Ramsey MR. Understanding Transcriptional Networks Regulating Initiation of Cutaneous Wound Healing. Yale J Biol Med 2020; 93:161-173. [PMID: 32226345 PMCID: PMC7087049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
The epidermis has an essential function in creating a barrier against the external environment to retain proper fluid balance and block the entry of pathogens. When damage occurs to this barrier, the wound must quickly be sealed to avoid fluid loss, cleared of invading pathogens, and then keratinocytes must re-form an intact barrier. This requires complex integration of temporally and spatially distinct signals to execute orderly closure of the wound, and failure of this process can lead to chronic ulceration. Transcription factors serve as a key integration point for the myriad of information coming from the external environment, allowing for an orderly process of re-epithelialization. Importantly, transcription factors engage with and alter the chromatin structure around key target genes through association with different chromatin-modifying complexes. In this review, we will discuss the current understanding of how transcription is regulated during the initiation of re-epithelialization, and the exciting technological advances that will allow for a more refined mechanistic understanding of the re-epithelialization process.
Affiliation(s)
- Rafik Boudra
- Brigham and Women’s Hospital Department of Dermatology, Boston, MA; Harvard Medical School, Boston, MA
| | - Matthew R. Ramsey
- Brigham and Women’s Hospital Department of Dermatology, Boston, MA; Harvard Medical School, Boston, MA. To whom all correspondence should be addressed: Matthew R. Ramsey, PhD, Brigham and Women’s Hospital, 77 Ave Louis Pasteur, HIM 668, Boston, MA 02115; Tel: (617) 525-5775, Fax: (617) 525-5571
|
48
|
Hu S, Huo D, Yu Z, Chen Y, Liu J, Liu L, Wu X, Zhang Y. ncHMR detector: a computational framework to systematically reveal non-classical functions of histone modification regulators. Genome Biol 2020; 21:48. [PMID: 32093739 PMCID: PMC7038559 DOI: 10.1186/s13059-020-01953-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/03/2019] [Accepted: 02/06/2020] [Indexed: 01/02/2023] Open
Abstract
Recently, several non-classical functions of histone modification regulators (HMRs), independent of their known histone modification substrates and products, have been reported to be essential for specific cellular processes. However, there is no framework designed for identifying such functions systematically. Here, we develop ncHMR detector, the first computational framework to predict non-classical functions and cofactors of a given HMR, based on ChIP-seq data mining. We apply ncHMR detector in ChIP-seq data-rich cell types and predict non-classical functions of HMRs. Finally, we experimentally reveal that the predicted non-classical function of CBX7 is biologically significant for the maintenance of pluripotency.
Affiliation(s)
- Shengen Hu
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Sciences and Technology, Tongji University, Shanghai, 200092 China
| | - Dawei Huo
- Department of Cell Biology, Tianjin Medical University, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, Qixiangtai Road 22, Tianjin, China
- Department of Neurosurgery, Tianjin Medical University General Hospital, Tianjin, China
| | - Zhaowei Yu
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Sciences and Technology, Tongji University, Shanghai, 200092 China
| | - Yujie Chen
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Sciences and Technology, Tongji University, Shanghai, 200092 China
| | - Jing Liu
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Sciences and Technology, Tongji University, Shanghai, 200092 China
- Present address: Key Laboratory of Forensic Genetics, National Engineering Laboratory for Forensic Science, Institute of Forensic Science, Beijing, China
| | - Lin Liu
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA USA
| | - Xudong Wu
- Department of Cell Biology, Tianjin Medical University, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, Qixiangtai Road 22, Tianjin, China
- Department of Neurosurgery, Tianjin Medical University General Hospital, Tianjin, China
- State Key Laboratory of Experimental Hematology, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300020 China
| | - Yong Zhang
- Institute for Regenerative Medicine, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Sciences and Technology, Tongji University, Shanghai, 200092 China
|
49
|
Xu T, Zheng X, Li B, Jin P, Qin Z, Wu H. A comprehensive review of computational prediction of genome-wide features. Brief Bioinform 2020; 21:120-134. [PMID: 30462144 PMCID: PMC10233247 DOI: 10.1093/bib/bby110] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/10/2018] [Revised: 10/15/2018] [Accepted: 10/16/2018] [Indexed: 12/15/2022] Open
Abstract
There are significant correlations among different types of genetic, genomic and epigenomic features within the genome. These correlations make the in silico feature prediction possible through statistical or machine learning models. With the accumulation of a vast amount of high-throughput data, feature prediction has gained significant interest lately, and a plethora of papers have been published in the past few years. Here we provide a comprehensive review on these published works, categorized by the prediction targets, including protein binding site, enhancer, DNA methylation, chromatin structure and gene expression. We also provide discussions on some important points and possible future directions.
Affiliation(s)
- Tianlei Xu
- Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
| | - Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai, China
| | - Ben Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Peng Jin
- Department of Human Genetics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Zhaohui Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Hao Wu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
|
50
|
Yang J, Ma A, Hoppe AD, Wang C, Li Y, Zhang C, Wang Y, Liu B, Ma Q. Prediction of regulatory motifs from human Chip-sequencing data using a deep learning framework. Nucleic Acids Res 2019; 47:7809-7824. [PMID: 31372637 PMCID: PMC6735894 DOI: 10.1093/nar/gkz672] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/12/2019] [Accepted: 07/23/2019] [Indexed: 11/24/2022] Open
Abstract
The identification of transcription factor binding sites and cis-regulatory motifs is a frontier whereupon the rules governing protein–DNA binding are being revealed. Here, we developed a new method (DEep Sequence and Shape mOtif or DESSO) for cis-regulatory motif prediction using deep neural networks and the binomial distribution model. DESSO outperformed existing tools, including DeepBind, in predicting motifs in 690 human ENCODE ChIP-sequencing datasets. Furthermore, the deep-learning framework of DESSO expanded motif discovery beyond the state-of-the-art by allowing the identification of known and new protein–protein–DNA tethering interactions in human transcription factors (TFs). Specifically, 61 putative tethering interactions were identified among the 100 TFs expressed in the K562 cell line. In this work, the power of DESSO was further expanded by integrating the detection of DNA shape features. We found that shape information has strong predictive power for TF–DNA binding and provides new putative shape motif information for human TFs. Thus, DESSO improves in the identification and structural analysis of TF binding sites, by integrating the complexities of DNA binding into a deep-learning framework.
Affiliation(s)
- Jinyu Yang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA; Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX 76010, USA
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Adam D Hoppe
- Department of Chemistry and Biochemistry, South Dakota State University, Brookings, SD 57007, USA; BioSNTR, Brookings, SD 57007, USA
| | - Cankun Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Yang Li
- School of Mathematics, Shandong University, Jinan 250100, China
| | - Chi Zhang
- Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, IN 46202, USA
| | - Yan Wang
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan 250100, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
|